Lustre / LU-6639

sanity-lfsck test_2e: CPU soft lockup


Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for wangdi <di.wang@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/2db6ece8-024e-11e5-9144-5254006e85c2.

      The sub-test test_2e failed with the following error:

      test failed to respond and timed out
      

      Although I have only seen this in the DNE2 test, I cannot see how it is related to the patch. I am creating the ticket anyway to find out whether it is reproducible.

      12:47:07:Lustre: DEBUG MARKER: == sanity-lfsck test 2e: namespace LFSCK can verify remote object linkEA == 12:43:12 (1432471392)
      12:47:07:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x1603
      12:47:07:Lustre: *** cfs_fail_loc=1603, val=0***
      12:47:07:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0
      12:47:07:Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r -A
      12:47:07:Lustre: 25171:0:(client.c:1940:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1432471394/real 1432471394]  req@ffff88007b587cc0 x1502045982093880/t0(0) o1101->lustre-MDT0001-osp-MDT0000@10.1.5.252@tcp:24/4 lens 320/224 e 4 to 1 dl 1432471501 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
      12:47:07:Lustre: 25171:0:(client.c:1940:ptlrpc_expire_one_request()) Skipped 24 previous similar messages
      12:47:07:Lustre: lustre-MDT0001-osp-MDT0000: Connection to lustre-MDT0001 (at 10.1.5.252@tcp) was lost; in progress operations using this service will wait for recovery to complete
      12:47:07:Lustre: Skipped 2 previous similar messages
      12:47:07:Lustre: MGS: haven't heard from client a8120cf8-cd69-035f-f0a0-2847eb8fd18a (at 10.1.5.251@tcp) in 53 seconds. I think it's dead, and I am evicting it. exp ffff880059ba1400, cur 1432471543 expire 1432471513 last 1432471490
      12:47:07:BUG: soft lockup - CPU#0 stuck for 67s! [ll_evictor:21676]
      12:47:07:Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic libcfs(U) ldiskfs(U) jbd2 nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      12:47:07:CPU 0 
      12:47:07:Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic libcfs(U) ldiskfs(U) jbd2 nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      12:47:07:
      12:47:07:Pid: 21676, comm: ll_evictor Not tainted 2.6.32-504.16.2.el6_lustre.g2f99b7f.x86_64 #1 Red Hat KVM
      12:47:07:RIP: 0010:[<ffffffff8152dc8b>]  [<ffffffff8152dc8b>] _spin_lock_bh+0x2b/0x40
      12:47:07:RSP: 0018:ffff88006a551b60  EFLAGS: 00000206
      12:47:07:RAX: 0000000000000001 RBX: ffff88006a551b70 RCX: 0000000000000000
      12:47:07:RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa091bc30
      12:47:07:RBP: ffffffff8100bb8e R08: 00000000ffffff0a R09: 00000000fffffffe
      12:47:07:R10: 0000000000000000 R11: 000000000000005a R12: ffffffffa0567da0
      12:47:07:R13: 0060000000000020 R14: ffffffffa08aae77 R15: 0000000a00000004
      12:47:07:FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      12:47:07:CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      12:47:07:CR2: 00007fb01ff4e000 CR3: 0000000059bd8000 CR4: 00000000000006f0
      12:47:07:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      12:47:07:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      12:47:07:Process ll_evictor (pid: 21676, threadinfo ffff88006a550000, task ffff880058faf520)
      12:47:07:Stack:
      12:47:07: ffffffffa08c1aa8 ffff8800662c3900 ffff88006a551b90 ffffffffa07e9b39
      12:47:07:<d> ffff8800662c3900 ffff88005aafec00 ffff88006a551bb0 ffffffffa07c3f68
      12:47:07:<d> ffff8800662c3900 ffff88006a551ce0 ffff88006a551c00 ffffffffa07c57a0
      12:47:07:Call Trace:
      12:47:07: [<ffffffffa07e9b39>] ? ldlm_del_waiting_lock+0x29/0x220 [ptlrpc]
      12:47:07: [<ffffffffa07c3f68>] ? ldlm_lock_cancel+0x178/0x200 [ptlrpc]
      12:47:07: [<ffffffffa07c57a0>] ? ldlm_cancel_locks_for_export_cb+0xb0/0x450 [ptlrpc]
      12:47:07: [<ffffffffa04a3dcc>] ? cfs_hash_for_each_relax+0x17c/0x350 [libcfs]
      12:47:07: [<ffffffffa05d2e15>] ? uuid_export_put_locked+0x15/0x20 [obdclass]
      12:47:07: [<ffffffffa07c56f0>] ? ldlm_cancel_locks_for_export_cb+0x0/0x450 [ptlrpc]
      12:47:07: [<ffffffffa07c56f0>] ? ldlm_cancel_locks_for_export_cb+0x0/0x450 [ptlrpc]
      12:47:07: [<ffffffffa04a55be>] ? cfs_hash_for_each_empty+0xfe/0x1e0 [libcfs]
      12:47:07: [<ffffffffa07bf8af>] ? ldlm_cancel_locks_for_export+0x2f/0x40 [ptlrpc]
      12:47:07: [<ffffffffa07cfec4>] ? server_disconnect_export+0x64/0x1a0 [ptlrpc]
      12:47:07: [<ffffffffa0eb8fa0>] ? mdt_obd_disconnect+0x50/0x500 [mdt]
      12:47:07: [<ffffffffa049dc51>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      12:47:07: [<ffffffffa049dc51>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      12:47:07: [<ffffffffa05b835d>] ? class_fail_export+0x23d/0x540 [obdclass]
      12:47:07: [<ffffffffa08234b5>] ? ping_evictor_main+0x245/0x650 [ptlrpc]
      12:47:07: [<ffffffff81064bc0>] ? default_wake_function+0x0/0x20
      12:47:07: [<ffffffffa0823270>] ? ping_evictor_main+0x0/0x650 [ptlrpc]
      12:47:07: [<ffffffff8109e71e>] ? kthread+0x9e/0xc0
      12:47:07: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
      12:47:07: [<ffffffff8109e680>] ? kthread+0x0/0xc0
      12:47:07: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      12:47:07:Code: 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 89 fb e8 2a f0 b4 ff b8 00 00 01 00 f0 0f c1 03 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 <0f> 1f 44 00 00 83 3b 00 75 f4 eb df 48 83 c4 08 5b c9 c3 66 90 
      12:47:07:Call Trace:
      12:47:07: [<ffffffffa07e9b39>] ? ldlm_del_waiting_lock+0x29/0x220 [ptlrpc]
      12:47:07: [<ffffffffa07c3f68>] ? ldlm_lock_cancel+0x178/0x200 [ptlrpc]
      12:47:07: [<ffffffffa07c57a0>] ? ldlm_cancel_locks_for_export_cb+0xb0/0x450 [ptlrpc]
      12:47:07: [<ffffffffa04a3dcc>] ? cfs_hash_for_each_relax+0x17c/0x350 [libcfs]
      12:47:07: [<ffffffffa05d2e15>] ? uuid_export_put_locked+0x15/0x20 [obdclass]
      12:47:07: [<ffffffffa07c56f0>] ? ldlm_cancel_locks_for_export_cb+0x0/0x450 [ptlrpc]
      12:47:07: [<ffffffffa07c56f0>] ? ldlm_cancel_locks_for_export_cb+0x0/0x450 [ptlrpc]
      12:47:07: [<ffffffffa04a55be>] ? cfs_hash_for_each_empty+0xfe/0x1e0 [libcfs]
      12:47:07: [<ffffffffa07bf8af>] ? ldlm_cancel_locks_for_export+0x2f/0x40 [ptlrpc]
      12:47:07: [<ffffffffa07cfec4>] ? server_disconnect_export+0x64/0x1a0 [ptlrpc]
      12:47:07: [<ffffffffa0eb8fa0>] ? mdt_obd_disconnect+0x50/0x500 [mdt]
      12:47:07: [<ffffffffa049dc51>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      12:47:07: [<ffffffffa049dc51>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      12:47:07: [<ffffffffa05b835d>] ? class_fail_export+0x23d/0x540 [obdclass]
      12:47:07: [<ffffffffa08234b5>] ? ping_evictor_main+0x245/0x650 [ptlrpc]
      12:47:07: [<ffffffff81064bc0>] ? default_wake_function+0x0/0x20
      12:47:07: [<ffffffffa0823270>] ? ping_evictor_main+0x0/0x650 [ptlrpc]
      12:47:07: [<ffffffff8109e71e>] ? kthread+0x9e/0xc0
      12:47:07: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
      12:47:07: [<ffffffff8109e680>] ? kthread+0x0/0xc0
      12:47:07: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      12:47:07:Kernel panic - not syncing: softlockup: hung tasks
      12:47:07:Pid: 21676, comm: ll_evictor Not tainted 2.6.32-504.16.2.el6_lustre.g2f99b7f.x86_64 #1
      12:47:07:Call Trace:
      12:47:07: <IRQ>  [<ffffffff81529fbc>] ? panic+0xa7/0x16f
      12:47:07: [<ffffffff810ea5e0>] ? watchdog_timer_fn+0x0/0x1e0
      12:47:07: [<ffffffff810ea7aa>] ? watchdog_timer_fn+0x1ca/0x1e0
      12:47:07: [<ffffffff810a343e>] ? __run_hrtimer+0x8e/0x1a0
      12:47:07: [<ffffffff810aa83f>] ? ktime_get_update_offsets+0x4f/0xd0
      12:47:07: [<ffffffff810a37a6>] ? hrtimer_interrupt+0xe6/0x260
      12:47:07: [<ffffffff8103422d>] ? local_apic_timer_interrupt+0x3d/0x70
      12:47:07: [<ffffffff81534905>] ? smp_apic_timer_interrupt+0x45/0x60
      12:47:07: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
      12:47:07: <EOI>  [<ffffffff8152dc8b>] ? _spin_lock_bh+0x2b/0x40
      12:47:07: [<ffffffffa07e9b39>] ? ldlm_del_waiting_lock+0x29/0x220 [ptlrpc]
      12:47:07: [<ffffffffa07c3f68>] ? ldlm_lock_cancel+0x178/0x200 [ptlrpc]
      12:47:07: [<ffffffffa07c57a0>] ? ldlm_cancel_locks_for_export_cb+0xb0/0x450 [ptlrpc]
      12:47:07: [<ffffffffa04a3dcc>] ? cfs_hash_for_each_relax+0x17c/0x350 [libcfs]
      12:47:07: [<ffffffffa05d2e15>] ? uuid_export_put_locked+0x15/0x20 [obdclass]
      12:47:07: [<ffffffffa07c56f0>] ? ldlm_cancel_locks_for_export_cb+0x0/0x450 [ptlrpc]
      12:47:07: [<ffffffffa07c56f0>] ? ldlm_cancel_locks_for_export_cb+0x0/0x450 [ptlrpc]
      12:47:07: [<ffffffffa04a55be>] ? cfs_hash_for_each_empty+0xfe/0x1e0 [libcfs]
      12:47:07: [<ffffffffa07bf8af>] ? ldlm_cancel_locks_for_export+0x2f/0x40 [ptlrpc]
      12:47:07: [<ffffffffa07cfec4>] ? server_disconnect_export+0x64/0x1a0 [ptlrpc]
      12:47:07: [<ffffffffa0eb8fa0>] ? mdt_obd_disconnect+0x50/0x500 [mdt]
      12:47:07: [<ffffffffa049dc51>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      12:47:07: [<ffffffffa049dc51>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      12:47:07: [<ffffffffa05b835d>] ? class_fail_export+0x23d/0x540 [obdclass]
      12:47:07: [<ffffffffa08234b5>] ? ping_evictor_main+0x245/0x650 [ptlrpc]
      12:47:07: [<ffffffff81064bc0>] ? default_wake_function+0x0/0x20
      12:47:07: [<ffffffffa0823270>] ? ping_evictor_main+0x0/0x650 [ptlrpc]
      12:47:07: [<ffffffff8109e71e>] ? kthread+0x9e/0xc0
      12:47:07: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
      12:47:07: [<ffffffff8109e680>] ? kthread+0x0/0xc0
      12:47:07: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      


    People

      Assignee: WC Triage
      Reporter: Maloo
