Details
-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
Lustre 2.4.0
-
3
-
5394
Description
Our Grove-Test MDS crashed due to the following:
2012-10-30 23:11:40 BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
2012-10-30 23:11:40 IP: [<ffffffffa05c393e>] seq_server_alloc_meta+0x51e/0x700 [fid]
2012-10-30 23:11:40 PGD 20168e8067 PUD 2010af0067 PMD 0
2012-10-30 23:11:40 Oops: 0000 [#1] SMP
2012-10-30 23:11:40 last sysfs file: /sys/devices/pci0000:80/0000:80:02.2/0000:83:00.0/host7/port-7:1/expander-7:1/port-7:1:18/end_device-7:1:18/target7:0:42/7:0:42:0/timeout
2012-10-30 23:11:40 CPU 13
2012-10-30 23:11:40 Modules linked in: osp(U) mdt(U) mdd(U) lod(U) mgs(U) mgc(U) osd_zfs(U) lquota(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) ko2iblnd(U) lnet(U) libcfs(U) acpi_cpufreq freq_table mperf sha512_generic sha256_generic ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath dm_mod vhost_net macvtap macvlan tun kvm zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate sg ses enclosure sd_mod crc_t10dif isci libsas wmi mpt2sas scsi_transport_sas raid_class sb_edac edac_core i2c_i801 i2c_core ahci iTCO_wdt iTCO_vendor_support ioatdma shpchp ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc mlx4_en mlx4_core igb dca [last unloaded: libcfs]
2012-10-30 23:11:40
2012-10-30 23:11:40 Pid: 33877, comm: mdt_mdss_0001 Tainted: P W ---------------- 2.6.32-220.23.1.1chaos.ch5.x86_64 #1 appro 2620x-in/S2600GZ
2012-10-30 23:11:40 RIP: 0010:[<ffffffffa05c393e>] [<ffffffffa05c393e>] seq_server_alloc_meta+0x51e/0x700 [fid]
2012-10-30 23:11:40 RSP: 0018:ffff881fc88f9ca0 EFLAGS: 00010246
2012-10-30 23:11:40 RAX: 0000000000000000 RBX: 0000000200005dd8 RCX: 00000002000061c0
2012-10-30 23:11:40 RDX: 00000000000003e8 RSI: ffff880f6f574cc0 RDI: ffff880fc9589f40
2012-10-30 23:11:40 RBP: ffff881fc88f9ce0 R08: 0000000000000000 R09: ffff881b34578600
2012-10-30 23:11:40 R10: 0000000000000009 R11: ffffffffa0c86db0 R12: ffff881b345787e8
2012-10-30 23:11:40 R13: ffff880f6f574d30 R14: ffff880f6f574cc0 R15: ffff880fc9589f40
2012-10-30 23:11:40 FS: 00002aaaab47e700(0000) GS:ffff8810788a0000(0000) knlGS:0000000000000000
2012-10-30 23:11:40 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
2012-10-30 23:11:40 CR2: 0000000000000010 CR3: 00000020110da000 CR4: 00000000000406e0
2012-10-30 23:11:40 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2012-10-30 23:11:40 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2012-10-30 23:11:40 Process mdt_mdss_0001 (pid: 33877, threadinfo ffff881fc88f8000, task ffff881fc88f2aa0)
2012-10-30 23:11:40 Stack:
2012-10-30 23:11:40 ffff881fc88f9cb0 ffff8809c74e3000 ffff880fc9589940 ffff8809c74e3000
2012-10-30 23:11:40 <0> ffff880fc9589940 ffff880fc9589f40 ffff881b345787e8 00000000ffffffea
2012-10-30 23:11:40 <0> ffff881fc88f9d30 ffffffffa05c3e9f ffff881fc88f9d10 ffffc900c5ea4988
2012-10-30 23:11:40 Call Trace:
2012-10-30 23:11:40 [<ffffffffa05c3e9f>] seq_query+0x37f/0x6d0 [fid]
2012-10-30 23:11:40 [<ffffffffa0eef322>] mdt_handle_common+0x932/0x1760 [mdt]
2012-10-30 23:11:40 [<ffffffffa0ef01c5>] mdt_mdss_handle+0x15/0x20 [mdt]
2012-10-30 23:11:40 [<ffffffffa0bee8cc>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
2012-10-30 23:11:40 [<ffffffffa088a6be>] ? cfs_timer_arm+0xe/0x10 [libcfs]
2012-10-30 23:11:40 [<ffffffffa089c14f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
2012-10-30 23:11:40 [<ffffffffa0be5c79>] ? ptlrpc_wait_event+0xa9/0x2a0 [ptlrpc]
2012-10-30 23:11:40 [<ffffffff81051ba3>] ? __wake_up+0x53/0x70
2012-10-30 23:11:40 [<ffffffffa0befebc>] ptlrpc_main+0xc0c/0x19f0 [ptlrpc]
2012-10-30 23:11:40 [<ffffffffa0bef2b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
2012-10-30 23:11:40 [<ffffffff8100c14a>] child_rip+0xa/0x20
2012-10-30 23:11:40 [<ffffffffa0bef2b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
2012-10-30 23:11:40 [<ffffffffa0bef2b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
2012-10-30 23:11:40 [<ffffffff8100c140>] ? child_rip+0x0/0x20
2012-10-30 23:11:40 Code: d1 fc ff ff 66 0f 1f 84 00 00 00 00 00 49 8b 86 f8 00 00 00 49 8b 96 e8 00 00 00 4c 89 f6 49 8b 5e 30 49 8b 0e 4c 89 ff 48 8b 00 <48> 8b 40 10 48 8b 40 28 48 63 80 40 01 00 00 49 89 5e 18 49 0f
2012-10-30 23:11:40 RIP [<ffffffffa05c393e>] seq_server_alloc_meta+0x51e/0x700 [fid]
2012-10-30 23:11:40 RSP <ffff881fc88f9ca0>
2012-10-30 23:11:40 CR2: 0000000000000010
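As a triage aid, the oops above can be sanity-checked mechanically. The following is a minimal sketch in Python (standard library only), not a general disassembler: the helper names are hypothetical, the input strings are copied from the log, and the decoder handles only the single instruction form that appears at the fault marker. Under those assumptions it suggests that the faulting instruction is a 64-bit load from offset 0x10 of the pointer in RAX, which was NULL, consistent with the reported CR2.

```python
import re

# Excerpts copied verbatim from the oops in this ticket.
RIP_LINE = ("RIP: 0010:[<ffffffffa05c393e>] [<ffffffffa05c393e>] "
            "seq_server_alloc_meta+0x51e/0x700 [fid]")
CODE_LINE = ("Code: d1 fc ff ff 66 0f 1f 84 00 00 00 00 00 49 8b 86 f8 00 00 00 "
             "49 8b 96 e8 00 00 00 4c 89 f6 49 8b 5e 30 49 8b 0e 4c 89 ff 48 8b 00 "
             "<48> 8b 40 10 48 8b 40 28 48 63 80 40 01 00 00 49 89 5e 18 49 0f")
RAX = 0x0000000000000000  # from the register dump
CR2 = 0x0000000000000010  # faulting address


def parse_rip(line):
    """Return (symbol, offset, size, module) from an oops RIP line."""
    m = re.search(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+) \[(\w+)\]", line)
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16), m.group(4)


def faulting_bytes(line):
    """Return the byte values starting at the <xx> marker in a Code: line."""
    tokens = line.split()[1:]  # drop the leading "Code:"
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]


def decode_mov_disp8(b):
    """Hand-decode only 'REX.W 8B /r' with mod=01 (base register + disp8).

    Returns (base_register_index, displacement), or None for any other form.
    Register index 0 is RAX when the REX prefix is exactly 0x48.
    """
    if len(b) < 4 or b[0] != 0x48 or b[1] != 0x8B:
        return None
    mod, rm = b[2] >> 6, b[2] & 7
    if mod != 0b01 or rm == 0b100:  # rm=100 means a SIB byte follows
        return None
    disp = b[3] - 256 if b[3] >= 128 else b[3]  # sign-extend the disp8
    return rm, disp


if __name__ == "__main__":
    symbol, offset, size, module = parse_rip(RIP_LINE)
    base, disp = decode_mov_disp8(faulting_bytes(CODE_LINE))
    print(f"fault in {symbol}+{offset:#x}/{size:#x} [{module}]")
    print(f"load from base register {base} + {disp:#x}")
    print("RAX + disp == CR2:", RAX + disp == CR2)
```

The three bytes just before the marker (48 8b 00) decode to a load of RAX from the address RAX previously held, so the NULL value appears to have been read from memory rather than passed in directly. Mapping the offset 0x51e back to a source line would still require the matching debuginfo or the crash dump.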
The specific Lustre version installed:
2012-10-29 16:37:23 Lustre: Lustre: Build Version: 2.3.54-1chaos-2surya1-2surya1--PRISTINE-2.6.32-220.23.1.1chaos.ch5.x86_64
This is the 2.3.54 tag plus our local LLNL patches that haven't landed yet and a couple of patches for LU-2139 (which shouldn't affect MDS stability).
We captured a crash dump, in case it is needed:
2012-10-30 23:27:02 The dumpfile is saved to /var/crash/vmcore-grove-mds2-2012-10-31-06:11:59.
Issue Links
- duplicates: LU-2186 seq_server_alloc_meta() NULL deref (Resolved)