[LU-2256] NULL pointer dereference in seq_server_alloc_meta+0x51e/0x700 Created: 31/Oct/12  Updated: 31/Oct/12  Resolved: 31/Oct/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Prakash Surya (Inactive)
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: topsequoia

Issue Links:
Duplicate
duplicates LU-2186 seq_server_alloc_meta() NULL deref Resolved
Severity: 3
Rank (Obsolete): 5394

 Description   

Our Grove-Test MDS crashed with the following kernel oops:

2012-10-30 23:11:40 BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
2012-10-30 23:11:40 IP: [<ffffffffa05c393e>] seq_server_alloc_meta+0x51e/0x700 [fid]
2012-10-30 23:11:40 PGD 20168e8067 PUD 2010af0067 PMD 0 
2012-10-30 23:11:40 Oops: 0000 [#1] SMP 
2012-10-30 23:11:40 last sysfs file: /sys/devices/pci0000:80/0000:80:02.2/0000:83:00.0/host7/port-7:1/expander-7:1/port-7:1:18/end_device-7:1:18/target7:0:42/7:0:42:0/timeout
2012-10-30 23:11:40 CPU 13 
2012-10-30 23:11:40 Modules linked in: osp(U) mdt(U) mdd(U) lod(U) mgs(U) mgc(U) osd_zfs(U) lquota(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) ko2iblnd(U) lnet(U) libcfs(U) acpi_cpufreq freq_table mperf sha512_generic sha256_generic ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath dm_mod vhost_net macvtap macvlan tun kvm zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate sg ses enclosure sd_mod crc_t10dif isci libsas wmi mpt2sas scsi_transport_sas raid_class sb_edac edac_core i2c_i801 i2c_core ahci iTCO_wdt iTCO_vendor_support ioatdma shpchp ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc mlx4_en mlx4_core igb dca [last unloaded: libcfs]
2012-10-30 23:11:40 
2012-10-30 23:11:40 Pid: 33877, comm: mdt_mdss_0001 Tainted: P        W  ----------------   2.6.32-220.23.1.1chaos.ch5.x86_64 #1 appro 2620x-in/S2600GZ
2012-10-30 23:11:40 RIP: 0010:[<ffffffffa05c393e>]  [<ffffffffa05c393e>] seq_server_alloc_meta+0x51e/0x700 [fid]
2012-10-30 23:11:40 RSP: 0018:ffff881fc88f9ca0  EFLAGS: 00010246
2012-10-30 23:11:40 RAX: 0000000000000000 RBX: 0000000200005dd8 RCX: 00000002000061c0
2012-10-30 23:11:40 RDX: 00000000000003e8 RSI: ffff880f6f574cc0 RDI: ffff880fc9589f40
2012-10-30 23:11:40 RBP: ffff881fc88f9ce0 R08: 0000000000000000 R09: ffff881b34578600
2012-10-30 23:11:40 R10: 0000000000000009 R11: ffffffffa0c86db0 R12: ffff881b345787e8
2012-10-30 23:11:40 R13: ffff880f6f574d30 R14: ffff880f6f574cc0 R15: ffff880fc9589f40
2012-10-30 23:11:40 FS:  00002aaaab47e700(0000) GS:ffff8810788a0000(0000) knlGS:0000000000000000
2012-10-30 23:11:40 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
2012-10-30 23:11:40 CR2: 0000000000000010 CR3: 00000020110da000 CR4: 00000000000406e0
2012-10-30 23:11:40 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2012-10-30 23:11:40 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2012-10-30 23:11:40 Process mdt_mdss_0001 (pid: 33877, threadinfo ffff881fc88f8000, task ffff881fc88f2aa0)
2012-10-30 23:11:40 Stack:
2012-10-30 23:11:40  ffff881fc88f9cb0 ffff8809c74e3000 ffff880fc9589940 ffff8809c74e3000
2012-10-30 23:11:40 <0> ffff880fc9589940 ffff880fc9589f40 ffff881b345787e8 00000000ffffffea
2012-10-30 23:11:40 <0> ffff881fc88f9d30 ffffffffa05c3e9f ffff881fc88f9d10 ffffc900c5ea4988
2012-10-30 23:11:40 Call Trace:
2012-10-30 23:11:40  [<ffffffffa05c3e9f>] seq_query+0x37f/0x6d0 [fid]
2012-10-30 23:11:40  [<ffffffffa0eef322>] mdt_handle_common+0x932/0x1760 [mdt]
2012-10-30 23:11:40  [<ffffffffa0ef01c5>] mdt_mdss_handle+0x15/0x20 [mdt]
2012-10-30 23:11:40  [<ffffffffa0bee8cc>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
2012-10-30 23:11:40  [<ffffffffa088a6be>] ? cfs_timer_arm+0xe/0x10 [libcfs]
2012-10-30 23:11:40  [<ffffffffa089c14f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
2012-10-30 23:11:40  [<ffffffffa0be5c79>] ? ptlrpc_wait_event+0xa9/0x2a0 [ptlrpc]
2012-10-30 23:11:40  [<ffffffff81051ba3>] ? __wake_up+0x53/0x70
2012-10-30 23:11:40  [<ffffffffa0befebc>] ptlrpc_main+0xc0c/0x19f0 [ptlrpc]
2012-10-30 23:11:40  [<ffffffffa0bef2b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
2012-10-30 23:11:40  [<ffffffff8100c14a>] child_rip+0xa/0x20
2012-10-30 23:11:40  [<ffffffffa0bef2b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
2012-10-30 23:11:40  [<ffffffffa0bef2b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
2012-10-30 23:11:40  [<ffffffff8100c140>] ? child_rip+0x0/0x20
2012-10-30 23:11:40 Code: d1 fc ff ff 66 0f 1f 84 00 00 00 00 00 49 8b 86 f8 00 00 00 49 8b 96 e8 00 00 00 4c 89 f6 49 8b 5e 30 49 8b 0e 4c 89 ff 48 8b 00 <48> 8b 40 10 48 8b 40 28 48 63 80 40 01 00 00 49 89 5e 18 49 0f 
2012-10-30 23:11:40 RIP  [<ffffffffa05c393e>] seq_server_alloc_meta+0x51e/0x700 [fid]
2012-10-30 23:11:40  RSP <ffff881fc88f9ca0>
2012-10-30 23:11:40 CR2: 0000000000000010
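
As a cross-check on the oops above, the byte marked <48> in the Code: line can be decoded by hand. The following is a minimal Python sketch, not a general disassembler: it handles only the REX.W 8B /r load form with an 8-bit displacement, and the helper names (faulting_bytes, decode_mov_disp8) are made up for this note. Under those assumptions the faulting instruction is a load from 0x10 past RAX, and with RAX = 0 that matches CR2 = 0x10.

```python
# Values copied from the oops (timestamps dropped).
CODE_LINE = (
    "Code: d1 fc ff ff 66 0f 1f 84 00 00 00 00 00 49 8b 86 f8 00 00 00 "
    "49 8b 96 e8 00 00 00 4c 89 f6 49 8b 5e 30 49 8b 0e 4c 89 ff 48 8b 00 "
    "<48> 8b 40 10 48 8b 40 28 48 63 80 40 01 00 00 49 89 5e 18 49 0f"
)
RAX = 0x0000000000000000
CR2 = 0x0000000000000010

# x86-64 general-purpose registers in hardware encoding order.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def faulting_bytes(code_line, count=4):
    """Return `count` bytes starting at the byte the kernel marked with <..>."""
    tokens = code_line.split()[1:]  # drop the leading "Code:" label
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:start + count]]


def decode_mov_disp8(insn):
    """Decode only 'REX.W 8B /r' with mod=01 (disp8) and no SIB byte.

    Returns (dest_reg, base_reg, displacement); raises ValueError for
    any other encoding, since this is a hypothetical helper and not a
    full decoder.
    """
    rex, opcode, modrm, disp = insn
    if (rex & 0xF0) != 0x40 or not (rex & 0x08) or opcode != 0x8B:
        raise ValueError("not a REX.W 8B /r instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the disp8, no-SIB form is handled")
    reg |= ((rex >> 2) & 1) << 3  # REX.R extends the destination register
    rm |= (rex & 1) << 3          # REX.B extends the base register
    if disp >= 0x80:              # disp8 is sign-extended
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


insn = faulting_bytes(CODE_LINE)
dest, base, disp = decode_mov_disp8(insn)
print(f"faulting load: {dest} <- [{base} + {disp:#x}]")
print(f"fault address with RAX=0: {RAX + disp:#x}, CR2: {CR2:#x}")
```

The three bytes just before the marked one, 48 8b 00, are a load of RAX through RAX with no displacement, which suggests that a pointer fetched immediately before the fault was NULL. Which structure field that corresponds to cannot be determined from the log alone.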

The specific Lustre version installed:

2012-10-29 16:37:23 Lustre: Lustre: Build Version: 2.3.54-1chaos-2surya1-2surya1--PRISTINE-2.6.32-220.23.1.1chaos.ch5.x86_64

This is the 2.3.54 tag plus our local LLNL patches that have not yet landed and a couple of patches for LU-2139 (which should not affect MDS stability).

We successfully captured a crash dump, in case it is needed:

2012-10-30 23:27:02 The dumpfile is saved to /var/crash/vmcore-grove-mds2-2012-10-31-06:11:59.


 Comments   
Comment by Alex Zhuravlev [ 31/Oct/12 ]

Looks like a dup of http://jira.whamcloud.com/browse/LU-2186? It should be fixed by http://review.whamcloud.com/#change,4280

Comment by Christopher Morrone [ 31/Oct/12 ]

Agreed, duplicate of LU-2186.

Comment by Prakash Surya (Inactive) [ 31/Oct/12 ]

Sorry for the noise. I should have searched before posting.

Generated at Sat Feb 10 01:23:41 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.