Details
- Bug
- Resolution: Fixed
- Minor
Description
There have been several occurrences in the field, at sites running various 2.1- and 2.5-based Lustre versions, of crashes of this kind. Example signatures and stacks follow:
<1>BUG: unable to handle kernel paging request at ffffffff00000018
<1>IP: [<ffffffff811ad11c>] __d_lookup+0x8c/0x150
<4>PGD 1a8f067 PUD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 0
<4>Modules linked in: lmv(U) fld(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) ko2iblnd(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) nfs fscache iptable_filter ip_tables nfsd nfs_acl auth_rpcgss exportfs autofs4 sha512_generic crc32c_intel lockd sunrpc bonding ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm sg ipmi_devintf joydev microcode power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support bnx2x libcrc32c mdio dcdbas sb_edac edac_core lpc_ich mfd_core shpchp ext4 jbd2 mbcache mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en ptp pps_core mlx4_core sd_mod crc_t10dif ahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
<4>
<4>Pid: 25256, comm: rsync Not tainted 2.6.32-573.1.1.el6.x86_64 #1 Dell Inc. PowerEdge R630/0CNCJW
<4>RIP: 0010:[<ffffffff811ad11c>] [<ffffffff811ad11c>] __d_lookup+0x8c/0x150
<4>RSP: 0018:ffff881b8cd4fb98 EFLAGS: 00010286
<4>RAX: 0000000000000010 RBX: ffffffff00000000 RCX: 0000000000000018
<4>RDX: 018721e0b8549035 RSI: ffff881b8cd4fcd8 RDI: ffff880b4e78c300
<4>RBP: ffff881b8cd4fbe8 R08: 0000000000000001 R09: 0000000000000000
<4>R10: 0000000000000001 R11: 0000000000000001 R12: fffffffeffffffe8
<4>R13: ffff880b4e78c300 R14: 00000000e58e7d29 R15: ffff881ed58a1520
<4>FS: 00007f09288c4700(0000) GS:ffff880062400000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: ffffffff00000018 CR3: 0000001d76c3a000 CR4: 00000000001407f0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process rsync (pid: 25256, threadinfo ffff881b8cd4c000, task ffff881ed58a1520)
<4>Stack:
<4> ffff880c2a16d046 000000108119f01c 0000000000000010 ffff881b8cd4fcd8
<4><d> 0000000000000001 ffff881b8cd4fdb8 ffff881b8cd4fce8 ffff881b8cd4fcd8
<4><d> ffff8820662e5a80 ffff881ed58a1520 ffff881b8cd4fc48 ffffffff811a16f6
<4>Call Trace:
<4> [<ffffffff811a16f6>] do_lookup+0x36/0x230
<4> [<ffffffffa0d31cd2>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
<4> [<ffffffff811a24f4>] __link_path_walk+0x7a4/0x1000
<4> [<ffffffffa11b7cf2>] ? osc_find_cbdata+0xa2/0x150 [osc]
<4> [<ffffffff811a300a>] path_walk+0x6a/0xe0
<4> [<ffffffff811a321b>] filename_lookup+0x6b/0xc0
<4> [<ffffffff811a4347>] user_path_at+0x57/0xa0
<4> [<ffffffff810f326e>] ? call_rcu+0xe/0x10
<4> [<ffffffff811ab90f>] ? d_free+0x3f/0x60
<4> [<ffffffff811b43d0>] ? mntput_no_expire+0x30/0x110
<4> [<ffffffff811a0331>] ? path_put+0x31/0x40
<4> [<ffffffff81197750>] vfs_fstatat+0x50/0xa0
<4> [<ffffffff8119780e>] vfs_lstat+0x1e/0x20
<4> [<ffffffff81197834>] sys_newlstat+0x24/0x50
<4> [<ffffffff810e8ab7>] ? audit_syscall_entry+0x1d7/0x200
<4> [<ffffffff8100c6f5>] ? math_state_restore+0x45/0x60
<4> [<ffffffff8153be5e>] ? do_device_not_available+0xe/0x10
<4> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
<4>Code: 48 03 05 d8 ea a6 00 48 8b 18 8b 45 bc 48 85 db 48 89 45 c0 75 11 eb 74 0f 1f 80 00 00 00 00 48 8b 1b 48 85 db 74 65 4c 8d 63 e8 <45> 39 74 24 30 75 ed 4d 39 6c 24 28 75 e6 4d 8d 7c 24 08 4c 89
<1>RIP [<ffffffff811ad11c>] __d_lookup+0x8c/0x150
<4> RSP <ffff881b8cd4fb98>
<4>CR2: ffffffff00000018
or
<1>BUG: unable to handle kernel paging request at ffffffff00000008
<1>IP: [<ffffffffa13821d5>] ll_md_blocking_ast+0x615/0x7d0 [lustre]
<4>PGD 1a87067 PUD 0
<4>Oops: 0002 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 21
<4>Modules linked in: nfs fscache lmv(U) fld(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) ptlrpc(U) obdclass(U) ko2iblnd(U) lnet(U) sha512_generic sha256_generic crc32c_intel libcfs(U) nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 cpufreq_ondemand acpi_cpufreq freq_table mperf rdma_ucm(U) ib_ucm(U) rdma_cm(U) iw_cm(U) ib_ipoib(U) ib_cm(U) ib_uverbs(U) ib_umad(U) mlx5_ib(U) mlx5_core(U) mlx4_en(U) mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) ipv6 mlx4_core(U) compat(U) microcode iTCO_wdt iTCO_vendor_support power_meter sg nvidia(P)(U) i2c_i801 lpc_ich mfd_core shpchp igb dca i2c_algo_bit i2c_core ptp pps_core be2net ext4 jbd2 mbcache sd_mod crc_t10dif megaraid_sas xhci_hcd ahci wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 2309, comm: ll_imp_inval Tainted: P --------------- 2.6.32-431.el6.x86_64 #1 Supermicro X10DRi/X10DRi-T
<4>RIP: 0010:[<ffffffffa13821d5>] [<ffffffffa13821d5>] ll_md_blocking_ast+0x615/0x7d0 [lustre]
<4>RSP: 0018:ffff887e10b8db40 EFLAGS: 00010286
<4>RAX: ffff88401d60a8c8 RBX: ffff883e6e1a8d40 RCX: ffffc90002ad37f8
<4>RDX: ffffffff00000000 RSI: 0000000000000000 RDI: ffff88401d60a8c8
<4>RBP: ffff887e10b8dbe0 R08: 0000000000000003 R09: 000000000000001b
<4>R10: 0000000000015dfb R11: 0000000000000000 R12: ffff88401d60a8c0
<4>R13: ffff887ce475f638 R14: ffff887d1d695ea0 R15: ffff887d1d695e40
<4>FS: 0000000000000000(0000) GS:ffff880190ec0000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: ffffffff00000008 CR3: 0000004023d54000 CR4: 00000000001407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process ll_imp_inval (pid: 2309, threadinfo ffff887e10b8c000, task ffff887eaf803500)
<4>Stack:
<4> 1700fade1700fade ffff883cdfae4c4b ffffffffa10cdf50 ffff88401d60a8c8
<4><d> ffff887ce475f668 ffff887d1d695e48 ffffc9004abf71f0 ffff887b9c92d9c0
<4><d> ffff887e10b8dba0 ffffffffa0e5a23c ffff887b9c92d9c0 0000000000000013
<4>Call Trace:
<4> [<ffffffffa0e5a23c>] ? class_handle_unhash+0x3c/0x50 [obdclass]
<4> [<ffffffffa103903c>] ldlm_cancel_callback+0x6c/0x1a0 [ptlrpc]
<4> [<ffffffffa104859a>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
<4> [<ffffffffa104d030>] ldlm_cli_cancel+0x60/0x360 [ptlrpc]
<4> [<ffffffffa1041e3d>] cleanup_resource+0x18d/0x310 [ptlrpc]
<4> [<ffffffffa0d07ade>] ? cfs_hash_spin_lock+0xe/0x10 [libcfs]
<4> [<ffffffffa1041fef>] ldlm_resource_clean+0x2f/0x60 [ptlrpc]
<4> [<ffffffffa0d07d5c>] cfs_hash_for_each_relax+0x17c/0x350 [libcfs]
<4> [<ffffffffa1041fc0>] ? ldlm_resource_clean+0x0/0x60 [ptlrpc]
<4> [<ffffffffa1041fc0>] ? ldlm_resource_clean+0x0/0x60 [ptlrpc]
<4> [<ffffffffa0d096ef>] cfs_hash_for_each_nolock+0x7f/0x1c0 [libcfs]
<4> [<ffffffffa103ed7e>] ldlm_namespace_cleanup+0x2e/0xc0 [ptlrpc]
<4> [<ffffffffa11e7cc9>] mdc_import_event+0x1e9/0xa30 [mdc]
<4> [<ffffffffa108b27c>] ptlrpc_invalidate_import+0x2bc/0x8f0 [ptlrpc]
<4> [<ffffffffa0d019f1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
<4> [<ffffffffa108e550>] ? ptlrpc_invalidate_import_thread+0x0/0x2e0 [ptlrpc]
<4> [<ffffffffa108e598>] ptlrpc_invalidate_import_thread+0x48/0x2e0 [ptlrpc]
<4> [<ffffffff8109aef6>] kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109ae60>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>Code: 8b 14 24 85 d2 75 37 41 8b 54 24 04 f6 c2 10 75 2d 83 ca 10 49 8b 4c 24 20 41 89 54 24 04 49 8b 54 24 18 48 85 d2 48 89 11 74 04 <48> 89 4a 08 48 ba 00 02 20 00 00 00 ad de 49 89 54 24 20 66 ff
<1>RIP [<ffffffffa13821d5>] ll_md_blocking_ast+0x615/0x7d0 [lustre]
<4> RSP <ffff887e10b8db40>
<4>CR2: ffffffff00000008
or
<1>BUG: unable to handle kernel paging request at ffffffff00000018
<1>IP: [<ffffffff811a375c>] __d_lookup+0x8c/0x150
<4>PGD 1a87067 PUD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 10
<4>Modules linked in: lmv(U) fld(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) ptlrpc(U) obdclass(U) ko2iblnd(U) lnet(U) libcfs(U) bridge stp llc nfs fscache sha512_generic sha256_generic crc32c_intel nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 cpufreq_ondemand acpi_cpufreq freq_table mperf iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables rdma_ucm(U) ib_ucm(U) rdma_cm(U) iw_cm(U) ib_ipoib(U) ib_cm(U) ib_uverbs(U) ib_umad(U) mlx5_ib(U) mlx5_core(U) mlx4_en(U) mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) ipv6 mlx4_core(U) compat(U) microcode iTCO_wdt iTCO_vendor_support power_meter nvidia(P)(U) i2c_i801 sg lpc_ich mfd_core shpchp igb dca i2c_algo_bit i2c_core ptp pps_core be2net ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom megaraid_sas xhci_hcd ahci wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
<4>
<4>Pid: 33054, comm: tar Tainted: P --------------- 2.6.32-431.el6.x86_64 #1 Supermicro SYS-7048R-TR/X10DRi
<4>RIP: 0010:[<ffffffff811a375c>] [<ffffffff811a375c>] __d_lookup+0x8c/0x150
<4>RSP: 0018:ffff880ba631fbd8 EFLAGS: 00010286
<4>RAX: 0000000000000003 RBX: ffffffff00000000 RCX: 000000000000001a
<4>RDX: 018721de7d980c23 RSI: ffff880ba631fd18 RDI: ffff8860a3ef4e40
<4>RBP: ffff880ba631fc28 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000001 R11: 0000000000000001 R12: fffffffeffffffe8
<4>R13: ffff8860a3ef4e40 R14: 000000000027beea R15: ffff8838a4086080
<4>FS: 00007fbfb40467a0(0000) GS:ffff884161400000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: ffffffff00000018 CR3: 00000015850f1000 CR4: 00000000001407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process tar (pid: 33054, threadinfo ffff880ba631e000, task ffff8838a4086080)
<4>Stack:
<4> ffff8878808c004d 0000000381196213 0000000000000003 ffff880ba631fd18
<4><d> 0000000000000001 ffff880ba631fe08 ffff880ba631fd28 ffff880ba631fd18
<4><d> ffff888024d055c0 ffff8838a4086080 ffff880ba631fc88 ffffffff811988c6
<4>Call Trace:
<4> [<ffffffff811988c6>] do_lookup+0x36/0x230
<4> [<ffffffffa0f08462>] ? ldlm_res_hop_get_locked+0x12/0x20 [ptlrpc]
<4> [<ffffffff81198dc0>] __link_path_walk+0x200/0xff0
<4> [<ffffffffa0f09766>] ? ldlm_resource_putref+0x66/0x280 [ptlrpc]
<4> [<ffffffff81199e6a>] path_walk+0x6a/0xe0
<4> [<ffffffff8119b64a>] do_filp_open+0x1fa/0xd20
<4> [<ffffffff810ec785>] ? call_rcu_sched+0x15/0x20
<4> [<ffffffff810ec79e>] ? call_rcu+0xe/0x10
<4> [<ffffffff81282705>] ? _atomic_dec_and_lock+0x55/0x80
<4> [<ffffffff811aaa20>] ? mntput_no_expire+0x30/0x110
<4> [<ffffffff811a8212>] ? alloc_fd+0x92/0x160
<4> [<ffffffff81185d29>] do_sys_open+0x69/0x140
<4> [<ffffffff8100c715>] ? math_state_restore+0x45/0x60
<4> [<ffffffff81185e40>] sys_open+0x20/0x30
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 48 03 05 f8 6c a6 00 48 8b 18 8b 45 bc 48 85 db 48 89 45 c0 75 11 eb 74 0f 1f 80 00 00 00 00 48 8b 1b 48 85 db 74 65 4c 8d 63 e8 <45> 39 74 24 30 75 ed 4d 39 6c 24 28 75 e6 4d 8d 7c 24 08 4c 89
<1>RIP [<ffffffff811a375c>] __d_lookup+0x8c/0x150
<4> RSP <ffff880ba631fbd8>
<4>CR2: ffffffff00000018
or
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu11/cache/index2/shared_cpu_map
CPU 0
Modules linked in: cpufreq_ondemand nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables mgc(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ko2iblnd(U) lnet(U) libcfs(U) acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm uinput ahci ib_qib(U) ib_mad ib_core dcdbas microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core shpchp xt_owner ipt_LOG xt_multiport ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc igb dca [last unloaded: cpufreq_ondemand]
Pid: 3895, comm: ll_sa_25915 Tainted: G W ---------------- 2.6.32-220.23.1.1chaos.ch5.x86_64 #1 Dell XS23-TY35 /0GW08P
RIP: 0010:[<ffffffff8118fc8c>] [<ffffffff8118fc8c>] __d_lookup+0x8c/0x150
RSP: 0018:ffff88049ad3dcc0 EFLAGS: 00010202
RAX: 000000000000000f RBX: 2e342036343a3732 RCX: 0000000000000016
RDX: 018721e08df08940 RSI: ffff88049ad3ddc0 RDI: ffff880421557300
RBP: ffff88049ad3dd10 R08: ffff880589fdad30 R09: 00000000ffffffff
R10: 0000000000000000 R11: 0000000000000000 R12: 2e342036343a371a
R13: ffff880421557300 R14: 000000009e75e374 R15: ffff8803a22822b8
FS: 00002aaaab05db20(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fffffffc010 CR3: 0000000297c13000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process ll_sa_25915 (pid: 3895, threadinfo ffff88049ad3c000, task ffff8804f49a0aa0)
Stack:
 ffff8802fc86b3b8 0000000f00000246 000000000000000f ffff88049ad3ddc0
<0> ffff880028215fc0 0000000002170c3c ffff88049ad3ddc0 ffff880421557300
<0> ffff880421557300 ffff8803a22822b8 ffff88049ad3dd40 ffffffff811908fc
Call Trace:
 [<ffffffff811908fc>] d_lookup+0x3c/0x60
 [<ffffffffa09b672c>] ll_statahead_one+0x1ec/0x14a0 [lustre]
 [<ffffffff81051ba3>] ? __wake_up+0x53/0x70
 [<ffffffff8109144c>] ? remove_wait_queue+0x3c/0x50
 [<ffffffffa09b7c98>] ll_statahead_thread+0x2b8/0x890 [lustre]
 [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
 [<ffffffffa09b79e0>] ? ll_statahead_thread+0x0/0x890 [lustre]
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffffa09b79e0>] ? ll_statahead_thread+0x0/0x890 [lustre]
 [<ffffffffa09b79e0>] ? ll_statahead_thread+0x0/0x890 [lustre]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 48 03 05 88 4b a7 00 48 8b 18 8b 45 bc 48 85 db 48 89 45 c0 75 11 eb 74 0f 1f 80 00 00 00 00 48 8b 1b 48 85 db 74 65 4c 8d 63 e8 <45> 39 74 24 30 75 ed 4d 39 6c 24 28 75 e6 4d 8d 7c 24 08 4c 89
RIP [<ffffffff8118fc8c>] __d_lookup+0x8c/0x150
RSP <ffff88049ad3dcc0>
Analysis of all of these crash dumps shows the same problem: a corrupted dentry->d_hash->next pointer.
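The register values in the reports above can be cross-checked against this conclusion with a little arithmetic. The following is a minimal sketch, not part of the original analysis. It assumes the offsets 0x18 and 0x30 read off the `Code:` bytes of the `__d_lookup` reports (`4c 8d 63 e8` decodes as `lea -0x18(%rbx),%r12`, and the faulting `45 39 74 24 30` as `cmp %r14d,0x30(%r12)`), and the offset 0x8 from the faulting `48 89 4a 08` (`mov %rcx,0x8(%rdx)`) in the `ll_md_blocking_ast` report. Reading these as the hash-node offset, the name-hash offset, and a list-unlink store is an interpretation of the disassembly; the function and constant names are illustrative only.

```python
# Hypothetical helper names; offsets are inferred from the "Code:" bytes in
# the oops reports, not taken from kernel headers.
MASK64 = (1 << 64) - 1

HASH_NODE_OFFSET = 0x18   # lea -0x18(%rbx),%r12 : node pointer -> dentry
NAME_HASH_OFFSET = 0x30   # cmp %r14d,0x30(%r12) : field read when the fault hits
PPREV_OFFSET = 0x08       # mov %rcx,0x8(%rdx)   : store through the next pointer


def dentry_from_hash_node(node):
    """Address of the enclosing dentry computed from a hash-chain node pointer."""
    return (node - HASH_NODE_OFFSET) & MASK64


def faulting_address(node):
    """Address __d_lookup reads when it follows the given node pointer."""
    return (dentry_from_hash_node(node) + NAME_HASH_OFFSET) & MASK64


def unlink_store_address(next_node):
    """Address written when a node is unlinked and next_node is non-NULL."""
    return (next_node + PPREV_OFFSET) & MASK64


def is_canonical(addr):
    """True if addr is canonical for 48-bit x86_64 virtual addressing."""
    top = addr >> 47
    return top == 0 or top == (1 << 17) - 1


if __name__ == "__main__":
    for rbx in (0xFFFFFFFF00000000, 0x2E342036343A3732):
        addr = faulting_address(rbx)
        kind = "page fault" if is_canonical(addr) else "general protection fault"
        print(f"RBX={rbx:016x} R12={dentry_from_hash_node(rbx):016x} "
              f"access={addr:016x} -> {kind}")
```

With RBX = ffffffff00000000 this gives R12 = fffffffeffffffe8 and an access at ffffffff00000018, matching R12 and CR2 in the two paging-request reports from `__d_lookup`. With RBX = 2e342036343a3732 it gives R12 = 2e342036343a371a, as reported, and a non-canonical access address, which is consistent with that report being a general protection fault rather than a paging request. For the `ll_md_blocking_ast` report, RDX = ffffffff00000000 plus 0x8 matches CR2 = ffffffff00000008.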
Issue Links
- is related to: LU-2704 GPF in __d_lookup called from ll_statahead_one (Resolved)