Details
-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
None
-
3
-
9223372036854775807
Description
This issue was created by maloo for wangdi <di.wang@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/2db6ece8-024e-11e5-9144-5254006e85c2.
The sub-test test_2e failed with the following error:
test failed to respond and timed out
Although I only saw this in DNE2 test, but I can not see this is related with the patch. Anyway create the ticket first to see if this is repeatable.
12:47:07:Lustre: DEBUG MARKER: == sanity-lfsck test 2e: namespace LFSCK can verify remote object linkEA == 12:43:12 (1432471392) 12:47:07:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x1603 12:47:07:Lustre: *** cfs_fail_loc=1603, val=0*** 12:47:07:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0 12:47:07:Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r -A 12:47:07:Lustre: 25171:0:(client.c:1940:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1432471394/real 1432471394] req@ffff88007b587cc0 x1502045982093880/t0(0) o1101->lustre-MDT0001-osp-MDT0000@10.1.5.252@tcp:24/4 lens 320/224 e 4 to 1 dl 1432471501 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 12:47:07:Lustre: 25171:0:(client.c:1940:ptlrpc_expire_one_request()) Skipped 24 previous similar messages 12:47:07:Lustre: lustre-MDT0001-osp-MDT0000: Connection to lustre-MDT0001 (at 10.1.5.252@tcp) was lost; in progress operations using this service will wait for recovery to complete 12:47:07:Lustre: Skipped 2 previous similar messages 12:47:07:Lustre: MGS: haven't heard from client a8120cf8-cd69-035f-f0a0-2847eb8fd18a (at 10.1.5.251@tcp) in 53 seconds. I think it's dead, and I am evicting it. exp ffff880059ba1400, cur 1432471543 expire 1432471513 last 1432471490 12:47:07:BUG: soft lockup - CPU#0 stuck for 67s! [ll_evictor:21676] 12:47:07:Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic libcfs(U) ldiskfs(U) jbd2 nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] 12:47:07:CPU 0 12:47:07:Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic libcfs(U) ldiskfs(U) jbd2 nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] 12:47:07: 12:47:07:Pid: 21676, comm: ll_evictor Not tainted 2.6.32-504.16.2.el6_lustre.g2f99b7f.x86_64 #1 Red Hat KVM 12:47:07:RIP: 0010:[<ffffffff8152dc8b>] [<ffffffff8152dc8b>] _spin_lock_bh+0x2b/0x40 12:47:07:RSP: 0018:ffff88006a551b60 EFLAGS: 00000206 12:47:07:RAX: 0000000000000001 RBX: ffff88006a551b70 RCX: 0000000000000000 12:47:07:RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa091bc30 12:47:07:RBP: ffffffff8100bb8e R08: 00000000ffffff0a R09: 00000000fffffffe 12:47:07:R10: 0000000000000000 R11: 000000000000005a R12: ffffffffa0567da0 12:47:07:R13: 0060000000000020 R14: ffffffffa08aae77 R15: 0000000a00000004 12:47:07:FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000 12:47:07:CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b 12:47:07:CR2: 00007fb01ff4e000 CR3: 0000000059bd8000 CR4: 00000000000006f0 12:47:07:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 12:47:07:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 12:47:07:Process ll_evictor (pid: 21676, threadinfo ffff88006a550000, task ffff880058faf520) 12:47:07:Stack: 12:47:07: ffffffffa08c1aa8 ffff8800662c3900 ffff88006a551b90 ffffffffa07e9b39 12:47:07:<d> ffff8800662c3900 ffff88005aafec00 ffff88006a551bb0 ffffffffa07c3f68 12:47:07:<d> ffff8800662c3900 ffff88006a551ce0 ffff88006a551c00 ffffffffa07c57a0 12:47:07:Call Trace: 12:47:07: [<ffffffffa07e9b39>] ? ldlm_del_waiting_lock+0x29/0x220 [ptlrpc] 12:47:07: [<ffffffffa07c3f68>] ? ldlm_lock_cancel+0x178/0x200 [ptlrpc] 12:47:07: [<ffffffffa07c57a0>] ? ldlm_cancel_locks_for_export_cb+0xb0/0x450 [ptlrpc] 12:47:07: [<ffffffffa04a3dcc>] ? cfs_hash_for_each_relax+0x17c/0x350 [libcfs] 12:47:07: [<ffffffffa05d2e15>] ? uuid_export_put_locked+0x15/0x20 [obdclass] 12:47:07: [<ffffffffa07c56f0>] ? ldlm_cancel_locks_for_export_cb+0x0/0x450 [ptlrpc] 12:47:07: [<ffffffffa07c56f0>] ? ldlm_cancel_locks_for_export_cb+0x0/0x450 [ptlrpc] 12:47:07: [<ffffffffa04a55be>] ? cfs_hash_for_each_empty+0xfe/0x1e0 [libcfs] 12:47:07: [<ffffffffa07bf8af>] ? ldlm_cancel_locks_for_export+0x2f/0x40 [ptlrpc] 12:47:07: [<ffffffffa07cfec4>] ? server_disconnect_export+0x64/0x1a0 [ptlrpc] 12:47:07: [<ffffffffa0eb8fa0>] ? mdt_obd_disconnect+0x50/0x500 [mdt] 12:47:07: [<ffffffffa049dc51>] ? libcfs_debug_msg+0x41/0x50 [libcfs] 12:47:07: [<ffffffffa049dc51>] ? libcfs_debug_msg+0x41/0x50 [libcfs] 12:47:07: [<ffffffffa05b835d>] ? class_fail_export+0x23d/0x540 [obdclass] 12:47:07: [<ffffffffa08234b5>] ? ping_evictor_main+0x245/0x650 [ptlrpc] 12:47:07: [<ffffffff81064bc0>] ? default_wake_function+0x0/0x20 12:47:07: [<ffffffffa0823270>] ? ping_evictor_main+0x0/0x650 [ptlrpc] 12:47:07: [<ffffffff8109e71e>] ? kthread+0x9e/0xc0 12:47:07: [<ffffffff8100c20a>] ? child_rip+0xa/0x20 12:47:07: [<ffffffff8109e680>] ? kthread+0x0/0xc0 12:47:07: [<ffffffff8100c200>] ? child_rip+0x0/0x20 12:47:07:Code: 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 89 fb e8 2a f0 b4 ff b8 00 00 01 00 f0 0f c1 03 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 <0f> 1f 44 00 00 83 3b 00 75 f4 eb df 48 83 c4 08 5b c9 c3 66 90 12:47:07:Call Trace: 12:47:07: [<ffffffffa07e9b39>] ? ldlm_del_waiting_lock+0x29/0x220 [ptlrpc] 12:47:07: [<ffffffffa07c3f68>] ? ldlm_lock_cancel+0x178/0x200 [ptlrpc] 12:47:07: [<ffffffffa07c57a0>] ? ldlm_cancel_locks_for_export_cb+0xb0/0x450 [ptlrpc] 12:47:07: [<ffffffffa04a3dcc>] ? cfs_hash_for_each_relax+0x17c/0x350 [libcfs] 12:47:07: [<ffffffffa05d2e15>] ? uuid_export_put_locked+0x15/0x20 [obdclass] 12:47:07: [<ffffffffa07c56f0>] ? ldlm_cancel_locks_for_export_cb+0x0/0x450 [ptlrpc] 12:47:07: [<ffffffffa07c56f0>] ? ldlm_cancel_locks_for_export_cb+0x0/0x450 [ptlrpc] 12:47:07: [<ffffffffa04a55be>] ? cfs_hash_for_each_empty+0xfe/0x1e0 [libcfs] 12:47:07: [<ffffffffa07bf8af>] ? ldlm_cancel_locks_for_export+0x2f/0x40 [ptlrpc] 12:47:07: [<ffffffffa07cfec4>] ? server_disconnect_export+0x64/0x1a0 [ptlrpc] 12:47:07: [<ffffffffa0eb8fa0>] ? mdt_obd_disconnect+0x50/0x500 [mdt] 12:47:07: [<ffffffffa049dc51>] ? libcfs_debug_msg+0x41/0x50 [libcfs] 12:47:07: [<ffffffffa049dc51>] ? libcfs_debug_msg+0x41/0x50 [libcfs] 12:47:07: [<ffffffffa05b835d>] ? class_fail_export+0x23d/0x540 [obdclass] 12:47:07: [<ffffffffa08234b5>] ? ping_evictor_main+0x245/0x650 [ptlrpc] 12:47:07: [<ffffffff81064bc0>] ? default_wake_function+0x0/0x20 12:47:07: [<ffffffffa0823270>] ? ping_evictor_main+0x0/0x650 [ptlrpc] 12:47:07: [<ffffffff8109e71e>] ? kthread+0x9e/0xc0 12:47:07: [<ffffffff8100c20a>] ? child_rip+0xa/0x20 12:47:07: [<ffffffff8109e680>] ? kthread+0x0/0xc0 12:47:07: [<ffffffff8100c200>] ? child_rip+0x0/0x20 12:47:07:Kernel panic - not syncing: softlockup: hung tasks 12:47:07:Pid: 21676, comm: ll_evictor Not tainted 2.6.32-504.16.2.el6_lustre.g2f99b7f.x86_64 #1 12:47:07:Call Trace: 12:47:07: <IRQ> [<ffffffff81529fbc>] ? panic+0xa7/0x16f 12:47:07: [<ffffffff810ea5e0>] ? watchdog_timer_fn+0x0/0x1e0 12:47:07: [<ffffffff810ea7aa>] ? watchdog_timer_fn+0x1ca/0x1e0 12:47:07: [<ffffffff810a343e>] ? __run_hrtimer+0x8e/0x1a0 12:47:07: [<ffffffff810aa83f>] ? ktime_get_update_offsets+0x4f/0xd0 12:47:07: [<ffffffff810a37a6>] ? hrtimer_interrupt+0xe6/0x260 12:47:07: [<ffffffff8103422d>] ? local_apic_timer_interrupt+0x3d/0x70 12:47:07: [<ffffffff81534905>] ? smp_apic_timer_interrupt+0x45/0x60 12:47:07: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20 12:47:07: <EOI> [<ffffffff8152dc8b>] ? _spin_lock_bh+0x2b/0x40 12:47:07: [<ffffffffa07e9b39>] ? ldlm_del_waiting_lock+0x29/0x220 [ptlrpc] 12:47:07: [<ffffffffa07c3f68>] ? ldlm_lock_cancel+0x178/0x200 [ptlrpc] 12:47:07: [<ffffffffa07c57a0>] ? ldlm_cancel_locks_for_export_cb+0xb0/0x450 [ptlrpc] 12:47:07: [<ffffffffa04a3dcc>] ? cfs_hash_for_each_relax+0x17c/0x350 [libcfs] 12:47:07: [<ffffffffa05d2e15>] ? uuid_export_put_locked+0x15/0x20 [obdclass] 12:47:07: [<ffffffffa07c56f0>] ? ldlm_cancel_locks_for_export_cb+0x0/0x450 [ptlrpc] 12:47:07: [<ffffffffa07c56f0>] ? ldlm_cancel_locks_for_export_cb+0x0/0x450 [ptlrpc] 12:47:07: [<ffffffffa04a55be>] ? cfs_hash_for_each_empty+0xfe/0x1e0 [libcfs] 12:47:07: [<ffffffffa07bf8af>] ? ldlm_cancel_locks_for_export+0x2f/0x40 [ptlrpc] 12:47:07: [<ffffffffa07cfec4>] ? server_disconnect_export+0x64/0x1a0 [ptlrpc] 12:47:07: [<ffffffffa0eb8fa0>] ? mdt_obd_disconnect+0x50/0x500 [mdt] 12:47:07: [<ffffffffa049dc51>] ? libcfs_debug_msg+0x41/0x50 [libcfs] 12:47:07: [<ffffffffa049dc51>] ? libcfs_debug_msg+0x41/0x50 [libcfs] 12:47:07: [<ffffffffa05b835d>] ? class_fail_export+0x23d/0x540 [obdclass] 12:47:07: [<ffffffffa08234b5>] ? ping_evictor_main+0x245/0x650 [ptlrpc] 12:47:07: [<ffffffff81064bc0>] ? default_wake_function+0x0/0x20 12:47:07: [<ffffffffa0823270>] ? ping_evictor_main+0x0/0x650 [ptlrpc] 12:47:07: [<ffffffff8109e71e>] ? kthread+0x9e/0xc0 12:47:07: [<ffffffff8100c20a>] ? child_rip+0xa/0x20 12:47:07: [<ffffffff8109e680>] ? kthread+0x0/0xc0 12:47:07: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Attachments
Issue Links
- is related to
-
LU-6831 The ticket for tracking all DNE2 bugs
- Reopened