[LU-4618] NULL pointer dereference in ll_lookup_nd Created: 12/Feb/14 Updated: 09/Jan/20 Resolved: 09/Jan/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.1, Lustre 2.5.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Ann Koehler (Inactive) | Assignee: | WC Triage |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
SLES11 SP1 and SLES11 SP3 |
||
| Severity: | 3 |
| Rank (Obsolete): | 12639 |
| Description |
|
When statahead is disabled, the client crashes referencing a NULL pointer.
c0-0c0s3n1 Lustre: Lustre: Build Version: 2.4.1-trunk-1.0501.14514.14.1-abuild-RB-5.1UP01_2.4.1-2013-12-06-21:46
c0-0c0s3n1 BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
c0-0c0s3n1 IP: [<ffffffffa0994ef5>] ll_lookup_it+0x605/0xb00 [lustre]
c0-0c0s3n1 PGD 838401067 PUD 827df6067 PMD 0
c0-0c0s3n1 Oops: 0000 [#1] SMP
c0-0c0s3n1 CPU 0
c0-0c0s3n1 Modules linked in: lmv mgc lustre lov osc mdc fid fld ptlrpc obdclass lvfs ipt_MASQUERADE ipt_LOG xt_state iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables binfmt_misc dvspn(P) dvsof(P) kgnilnd dvsutil(P) dvsipc(P) dvsipc_lnet(P) lnet libcfs dvsproc(P) bpmcdmod nic_compat dm_mod ahci libahci libata ehci_hcd usbcore scsi_mod usb_common igb kdreg gpcd_ari ipogif_ari kgni_ari hwerr(P) rca hss_os(P) heartbeat simplex(P) ghal_ari craytrace
c0-0c0s3n1 Pid: 7762, comm: rm Tainted: P 3.0.80-0.5.1_1.0501.7664-cray_ari_s #1 Cray Inc. Cascade/Cascade
c0-0c0s3n1 RIP: 0010:[<ffffffffa0994ef5>] [<ffffffffa0994ef5>] ll_lookup_it+0x605/0xb00 [lustre]
c0-0c0s3n1 RSP: 0018:ffff88082b68dd08 EFLAGS: 00010282
c0-0c0s3n1 RAX: 0000000000001e52 RBX: 0000000000001e52 RCX: ffff88082bcd1800
c0-0c0s3n1 RDX: ffff88082a516500 RSI: ffff880825601140 RDI: ffff88082bcd1800
c0-0c0s3n1 RBP: ffff88082b68dde8 R08: 0000000000000002 R09: 0000000000000000
c0-0c0s3n1 R10: 5a5a5a5a5a5a5a5a R11: 5a5a5a5a5a5a5a5a R12: 0000000000000000
c0-0c0s3n1 R13: ffff88082509b640 R14: 0000000000000000 R15: ffff880827e376c0
c0-0c0s3n1 FS: 00007f259e978700(0000) GS:ffff88087fa00000(0000) knlGS:0000000000000000
c0-0c0s3n1 FS: 00007f259e978700(0000) GS:ffff88087fa00000(0000) knlGS:0000000000000000
c0-0c0s3n1 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
c0-0c0s3n1 CR2: 000000000000000c CR3: 000000082b58d000 CR4: 00000000000406f0
c0-0c0s3n1 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
c0-0c0s3n1 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
c0-0c0s3n1 Process rm (pid: 7762, threadinfo ffff88082b68c000, task ffff8808257dc7a0)
c0-0c0s3n1 Stack:
c0-0c0s3n1 ffff88082b68ddb0 ffffffffa0990090 0000000000000000 0000000000000048
c0-0c0s3n1 0000000020000280 ffff88080064b8c0 ffff880827e376c0 ffff88080064b8c0
c0-0c0s3n1 0000000000000010 0000000000000000 0000000000000000 0000000000000000
c0-0c0s3n1 Call Trace:
c0-0c0s3n1 [<ffffffffa099547c>] ll_lookup_nd+0x8c/0x3e0 [lustre]
c0-0c0s3n1 [<ffffffff8114d83c>] d_alloc_and_lookup+0x4c/0x80
c0-0c0s3n1 [<ffffffff8114d96e>] __lookup_hash+0xfe/0x180
c0-0c0s3n1 [<ffffffff81152296>] do_unlinkat+0xa6/0x1c0
c0-0c0s3n1 [<ffffffff81152522>] sys_unlinkat+0x22/0x40
c0-0c0s3n1 [<ffffffff814d9aab>] system_call_fastpath+0x16/0x1b
c0-0c0s3n1 [<00007f259e4fad28>] 0x7f259e4fad27
c0-0c0s3n1 Code: 58 ff ff ff 8b 9a bc 03 00 00 4c 8b b2 a8 03 00 00 4c 8b 68 78 e8 7c e2 94 ff 39 c3 0f 85 a8 fa ff ff 4d 85 ed 0f 84 9f fa ff ff
c0-0c0s3n1 8b 46 0c 41 89 45 60 0f 1f 00 e9 8f fa ff ff 0f 1f 00 49 8b
c0-0c0s3n1 RIP [<ffffffffa0994ef5>] ll_lookup_it+0x605/0xb00 [lustre]
c0-0c0s3n1 RSP <ffff88082b68dd08>
c0-0c0s3n1 CR2: 000000000000000c
c0-0c0s3n1 ---[ end trace 4e73cb9a15a458fd ]---
c0-0c0s3n1 Kernel panic - not syncing: Fatal exception
Reproducer:
The bug has been seen on both 2.4 and 2.5 clients, but goes away with the patch from |
| Comments |
| Comment by Ann Koehler (Inactive) [ 12/Feb/14 ] |
|
Our customer reports that they cannot reproduce this bug with a 2.4.2 client built purely from upstream source. So it's possible the bug is only in Cray source, in which case we'll take care of it. My primary purpose in filing the ticket was to flag the bug in case it affects other users. |
| Comment by Oleg Drokin [ 13/Feb/14 ] |
|
Well, I imagine if you have some patches that are specific to your tree from e.g. Xyratex, you might want to ask them to look at it? |
| Comment by Ann Koehler (Inactive) [ 13/Feb/14 ] |
|
My mistake. Just tried the reproducer out on released 2.4, 2.4.1 and 2.5 versions on a VM system. The bug does NOT occur. Guess I should have done the test before filing this bug. Sorry. It's just hard to imagine what we might have changed in our code that would break statahead this way. Anyway, not a community problem so feel free to close this ticket. |
| Comment by Ann Koehler (Inactive) [ 19/Feb/14 ] |
|
In case anyone is interested, I found our problem. Got the patch for |