Details
Type: Bug
Resolution: Fixed
Priority: Minor
Versions: Lustre 2.1.6, Lustre 2.5.3
Environment: kernel 2.6.32-279.5.2
3
16855
Description
General Protection Fault at echo_session_key_fini+0xa9
A crash with the following trace was encountered on an OSS running Lustre 2.1.6 and kernel 2.6.32-279.5.2:
2013-10-18 07:57:21 general protection fault: 0000 [#1] SMP
2013-10-18 07:57:21 last sysfs file: /sys/devices/pci0000:80/0000:80:02.2/0000:84:00.0/host10/port-10:3/end_device-10:3/target10:0:3/10:0:3:4/state
2013-10-18 07:57:21 CPU 0
2013-10-18 07:57:21 Modules linked in: nfs fscache obdecho(U) obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) lustre(U) lov(U) osc(U) mdc(U) lquota(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) ib_umad(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) dm_mirror dm_region_hash dm_log dm_round_robin scsi_dh_rdac dm_multipath dm_mod uinput usbhid hid sg mpt2sas scsi_transport_sas raid_class sb_edac edac_core i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma igb dca ext4 mbcache jbd2 ehci_hcd sd_mod crc_t10dif ahci megaraid_sas [last unloaded: scsi_wait_scan]
2013-10-18 07:57:21
2013-10-18 07:57:21 Pid: 14573, comm: obd_zombid Not tainted 2.6.32-279.5.2.bl6.Bull.35_restricted.x86_64 #1 Bull SAS bullx R/X9DRH-7TF/7F/iTF/iF
2013-10-18 07:57:21 RIP: 0010:[<ffffffffa0a2dc49>] [<ffffffffa0a2dc49>] echo_session_key_fini+0xa9/0x100 [obdecho]
2013-10-18 07:57:21 RSP: 0018:ffff88105451bce0 EFLAGS: 00010246
2013-10-18 07:57:21 RAX: 5a5a5a5a5a5a5a5a RBX: 5a5a5a5a5a5a5a5a RCX: 0000000000000002
2013-10-18 07:57:21 RDX: ffff8808587c7540 RSI: 5a5a5a5a5a5a5a5a RDI: 0000000000000000
2013-10-18 07:57:21 RBP: ffff88105451bcf0 R08: ffffffff81ac5ee8 R09: 0000000000000140
2013-10-18 07:57:21 R10: 0000000000000000 R11: 000000000000000c R12: ffff880f1ad9a2f8
2013-10-18 07:57:21 R13: 0000000000000010 R14: ffff88105451be20 R15: ffff8802f65a46d0
2013-10-18 07:57:21 FS: 00007f14eea7f700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
2013-10-18 07:57:21 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
2013-10-18 07:57:21 CR2: 00007f5cb4fc8000 CR3: 0000000001a06000 CR4: 00000000000406f0
2013-10-18 07:57:21 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2013-10-18 07:57:21 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2013-10-18 07:57:21 Process obd_zombid (pid: 14573, threadinfo ffff881054518000, task ffff881068e0c080)
2013-10-18 07:57:21 Stack:
2013-10-18 07:57:21 0000000000000010 ffffffffa0a3cf40 ffff88105451bd20 ffffffffa0554d90
2013-10-18 07:57:21 <d> ffff88105451bd20 ffffffffa0a3cf40 ffff880f1ad9a2f8 ffff88071cd302c0
2013-10-18 07:57:21 <d> ffff88105451bd40 ffffffffa055514b 0000000000000000 ffff8802f65a46c8
2013-10-18 07:57:21 Call Trace:
2013-10-18 07:57:21 [<ffffffffa0554d90>] key_fini+0x60/0x1e0 [obdclass]
2013-10-18 07:57:21 [<ffffffffa055514b>] lu_context_key_quiesce+0x5b/0x90 [obdclass]
2013-10-18 07:57:21 [<ffffffffa05551d9>] lu_context_key_quiesce_many+0x59/0x80 [obdclass]
2013-10-18 07:57:21 [<ffffffffa0a2d910>] echo_type_stop+0x20/0x30 [obdecho]
2013-10-18 07:57:21 [<ffffffffa0554322>] lu_device_fini+0x52/0xd0 [obdclass]
2013-10-18 07:57:21 [<ffffffffa0a2fed7>] echo_device_free+0x247/0x510 [obdecho]
2013-10-18 07:57:22 [<ffffffffa053871d>] class_decref+0x46d/0x590 [obdclass]
2013-10-18 07:57:22 [<ffffffffa0523bae>] obd_zombie_impexp_cull+0x31e/0x620 [obdclass]
2013-10-18 07:57:22 [<ffffffffa0523f75>] obd_zombie_impexp_thread+0xc5/0x1c0 [obdclass]
2013-10-18 07:57:22 [<ffffffff81048df0>] ? default_wake_function+0x0/0x20
2013-10-18 07:57:22 [<ffffffffa0523eb0>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
2013-10-18 07:57:22 [<ffffffff8100412a>] child_rip+0xa/0x20
2013-10-18 07:57:22 [<ffffffffa0523eb0>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
2013-10-18 07:57:22 [<ffffffffa0523eb0>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
2013-10-18 07:57:22 [<ffffffff81004120>] ? child_rip+0x0/0x20
2013-10-18 07:57:22 Code: 99 02 00 00 48 c7 05 d3 49 01 00 00 00 00 00 c7 05 c1 49 01 00 10 00 00 00 e8 14 74 a2 ff 48 b8 5a 5a 5a 5a 5a 5a 5a 5a 48 89 de <48> 89 03 48 8b 3d d5 69 01 00 e8 98 bb a1 ff 48 83 c4 08 5b c9
2013-10-18 07:57:22 RIP [<ffffffffa0a2dc49>] echo_session_key_fini+0xa9/0x100 [obdecho]
2013-10-18 07:57:22 RSP <ffff88105451bce0>
2013-10-18 07:57:22 Initializing cgroup subsys cpuset
2013-10-18 07:57:22 Initializing cgroup subsys cpu
2013-10-18 07:57:22 Linux version 2.6.32-279.5.2.bl6.Bull.35_restricted.x86_64 (derr@atlas.frec.bull.fr) (gcc version 4.4.6 20120305 (Bull 4.4.6-4) (GCC) ) #1 SMP Wed Apr 24 14:29:54 CEST 2013
The crash occurred in echo_session_key_fini().
key_fini() called key->lct_fini(), which points to echo_session_key_fini(), passing ctx->lc_value[index] as the third argument, but this array contained POISON'ed values (0x5a5a5a5a5a5a5a5a, the pattern written over memory when it is freed).
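To make the failure mode concrete, here is a userspace model of that dispatch. It is a sketch under assumptions: model_ctx, model_key, model_key_fini(), session_fini() and run_poisoned_teardown() are hypothetical names, only lc_value and lct_fini come from this report, and the real obdclass structures and key_fini() carry more state. Assuming, as in the model, that only NULL slots are skipped, it shows why the poison reaches echo_session_key_fini() unchecked: 0x5a5a5a5a5a5a5a5a is not NULL, so it is handed over as if it were a valid data pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for lu_context and
 * lu_context_key. Only the names lc_value and lct_fini come from the
 * report; the layout and signatures are assumptions. */
#define NR_KEYS ((size_t)4)

struct model_ctx {
    void **lc_value;                  /* one private value per key index */
};

struct model_key {
    size_t lct_index;
    void (*lct_fini)(struct model_ctx *ctx, struct model_key *key, void *data);
};

static void *fini_received;           /* what the fini callback was given */

/* Stands in for echo_session_key_fini(); the real one frees 'data'. */
static void session_fini(struct model_ctx *ctx, struct model_key *key, void *data)
{
    (void)ctx;
    (void)key;
    fini_received = data;             /* recorded here instead of freed */
}

/* Models key_fini(): pass the slot's value to the key's fini callback.
 * The NULL test only skips empty slots; it cannot detect poisoned ones. */
static void model_key_fini(struct model_ctx *ctx, struct model_key *key)
{
    assert(key->lct_index < NR_KEYS);
    void *data = ctx->lc_value[key->lct_index];
    if (data != NULL) {
        key->lct_fini(ctx, key, data);
        ctx->lc_value[key->lct_index] = NULL;
    }
}

/* Reproduces the reported state: the lc_value array still referenced by
 * the context holds the 0x5a free-poison pattern. Live memory is poisoned
 * (rather than reading freed memory) to keep the model free of undefined
 * behaviour. Returns the pointer that was handed to the fini callback. */
static void *run_poisoned_teardown(void)
{
    void **values = calloc(NR_KEYS, sizeof(*values));
    if (values == NULL)
        return NULL;
    struct model_ctx ctx = { .lc_value = values };
    struct model_key key = { .lct_index = 2, .lct_fini = session_fini };

    memset(values, 0x5a, NR_KEYS * sizeof(*values)); /* mimic free poisoning */
    fini_received = NULL;
    model_key_fini(&ctx, &key);
    free(values);
    return fini_received;
}
```

run_poisoned_teardown() returns a pointer with 0x5a in every byte, which is the value a real fini callback would then try to free.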
static void echo_session_key_fini(...)
{
        struct echo_session_info *session = data;
        OBD_SLAB_FREE_PTR(session, echo_session_kmem);   <================ GPF here, session is 0x5a5a5a5a5a5a5a5a
}

After OBD_SLAB_FREE_PTR() expansion:
------------------------------------
static void echo_session_key_fini(...)
{
        struct echo_session_info *session = data;
        LASSERT(session);   /* passes: 0x5a5a5a5a5a5a5a5a is non-NULL */
        lprocfs_counter_sub(obd_memory, OBD_MEMORY_STAT, (long)(sizeof *(session)));
        CDEBUG(D_MALLOC, name " '" #session "': %d at %p.\n", (int)(sizeof *(session)), session);
        memset(session, 0x5a, sizeof *(session));   <================ GPF here, session is 0x5a5a5a5a5a5a5a5a
        cfs_mem_cache_free(echo_session_kmem, session);
        (session) = (void *)0xdeadbeef;
}
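A minimal sketch of why this shows up as a general protection fault rather than a paging-request oops, assuming x86-64 with 48-bit virtual addresses (4-level paging, as on this 2.6.32 kernel): 0x5a5a5a5a5a5a5a5a is non-canonical, and any access through a non-canonical address raises #GP. The Code: bytes around the marker decode to movabs $0x5a5a5a5a5a5a5a5a,%rax; mov %rbx,%rsi; mov %rax,(%rbx), so the inlined memset() stores the poison through RBX, which already holds the poison. The helper names is_canonical_48() and fill_pattern() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumes x86-64 with 48-bit virtual addresses (4-level paging).
 * An address is canonical when bits 63..47 are all copies of bit 47.
 * A load or store through a non-canonical address raises #GP, not a
 * page fault. */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;            /* the 17 bits 63..47 */
    return top == 0 || top == 0x1ffff;
}

/* Builds the 64-bit value that memset(&v, byte, 8) leaves behind,
 * e.g. 0x5a5a5a5a5a5a5a5a for the poison byte used on free. */
static uint64_t fill_pattern(unsigned char byte)
{
    uint64_t v;
    memset(&v, byte, sizeof(v));
    return v;
}
```

Fed the register values from the oops, is_canonical_48() rejects RBX (0x5a5a5a5a5a5a5a5a) and accepts RSP (0xffff88105451bce0) and RIP (0xffffffffa0a2dc49).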