
[LU-2703] racer: BUG: soft lockup - CPU#0 stuck for 67s! [dd:1404]


    Description

      While running the racer test, the following soft lockup occurred on one of the two clients:

      00:11:31:Lustre: DEBUG MARKER: == racer test 1: racer on clients: client-28vm1,client-28vm2.lab.whamcloud.com DURATION=900 == 00:11:30 (1359447090)
      00:11:32:Lustre: DEBUG MARKER: DURATION=900 /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 
      00:24:12:BUG: soft lockup - CPU#0 stuck for 67s! [dd:1404]
      00:24:12:Modules linked in: mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      00:24:12:CPU 0 
      00:24:12:Modules linked in: mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      00:24:12:
      00:24:12:Pid: 1404, comm: dd Not tainted 2.6.32-279.19.1.el6.x86_64 #1 Red Hat KVM
      00:24:12:RIP: 0010:[<ffffffff814ec53e>]  [<ffffffff814ec53e>] _spin_lock+0x1e/0x30
      00:24:12:RSP: 0018:ffff880030ce16a8  EFLAGS: 00000206
      00:24:13:RAX: 0000000000000001 RBX: ffff880030ce16a8 RCX: ffff8800783c6ba0
      00:24:13:RDX: 0000000000000000 RSI: ffff88004ecec7c0 RDI: ffff88007b510a1c
      00:24:13:RBP: ffffffff8100bb8e R08: 0000000000000102 R09: 0000000000000000
      00:24:14:R10: 0000000003b5e000 R11: 000000000000000e R12: ffffffffa058315f
      00:24:14:R13: ffff880030ce1638 R14: 0000000003b5e000 R15: 0000000003b5efff
      00:24:14:FS:  00007f288786d700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      00:24:14:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      00:24:14:CR2: 00000036526cd710 CR3: 000000003d0ee000 CR4: 00000000000006f0
      00:24:14:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      00:24:14:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      00:24:14:Process dd (pid: 1404, threadinfo ffff880030ce0000, task ffff88007d082aa0)
      00:24:14:Stack:
      00:24:14: ffff880030ce17f8 ffffffffa0275889 00000000002537dc ffffea000115a9f0
      00:24:14:<d> ffff88003e877950 0000000000000000 0000000003b5e000 0000000003b5efff
      00:24:14:<d> ffff880030ce1998 ffffffffa079bddd f869fda2cf0897a1 0000000000000000
      00:24:14:Call Trace:
      00:24:15: [<ffffffffa0275889>] ? osc_queue_async_io+0x399/0x1140 [osc]
      00:24:15: [<ffffffffa079bddd>] ? ll_prepare_write+0x50d/0x1230 [lustre]
      00:24:15: [<ffffffffa072adce>] ? lov_stripe_offset+0x28e/0x340 [lov]
      00:24:15: [<ffffffffa072a8db>] ? lov_tgt_seq_show+0x26b/0x300 [lov]
      00:24:16: [<ffffffffa070d0a9>] ? lov_queue_async_io+0x149/0x4a0 [lov]
      00:24:16: [<ffffffffa0795780>] ? queue_or_sync_write+0x160/0xda0 [lustre]
      00:24:16: [<ffffffffa07a2c2b>] ? ll_stats_ops_tally+0x6b/0xd0 [lustre]
      00:24:16: [<ffffffffa079cde5>] ? ll_commit_write+0x2e5/0x750 [lustre]
      00:24:16: [<ffffffffa07b4333>] ? ll_write_begin+0x83/0x210 [lustre]
      00:24:16: [<ffffffffa07b4280>] ? ll_write_end+0x30/0x60 [lustre]
      00:24:16: [<ffffffff811107fa>] ? generic_file_buffered_write+0x18a/0x2e0
      00:24:16: [<ffffffff81070f97>] ? current_fs_time+0x27/0x30
      00:24:16: [<ffffffff81112130>] ? __generic_file_aio_write+0x250/0x480
      00:24:16: [<ffffffffa0765dba>] ? ll_file_get_tree_lock_iov+0x14a/0x810 [lustre]
      00:24:16: [<ffffffff811123cf>] ? generic_file_aio_write+0x6f/0xe0
      00:24:16: [<ffffffffa0772449>] ? ll_file_aio_write+0xa19/0x1c60 [lustre]
      00:24:16: [<ffffffffa0773760>] ? ll_file_write+0xd0/0xf0 [lustre]
      00:24:16: [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80
      00:24:16: [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
      00:24:16: [<ffffffff8120ca26>] ? security_file_permission+0x16/0x20
      00:24:16: [<ffffffff8117646d>] ? rw_verify_area+0x5d/0xc0
      00:24:16: [<ffffffff81176588>] ? vfs_write+0xb8/0x1a0
      00:24:16: [<ffffffff81176e81>] ? sys_write+0x51/0x90
      00:24:16: [<ffffffff810d3a75>] ? __audit_syscall_exit+0x265/0x290
      00:24:16: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
      00:24:16:Code: 00 00 00 01 74 05 e8 72 8c d8 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 3e 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 <0f> 1f 44 00 00 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89
      

      Maloo report: https://maloo.whamcloud.com/test_sets/83fa5d12-6a23-11e2-85d4-52540035b04c

          Activity

            bfaccini Bruno Faccini (Inactive) added a comment -

            I am not sure, but it seems to me that some of the test failures may come from the memory leak detected at obdclass module unload. In fact, the memory accounting seems to be wrong and reports more memory freed than allocated ("leaked: 18446744073709551612", i.e. 0xFFFFFFFFFFFFFFFC, which is -4). This problem may have been introduced by patch set #5, where the new macro PPGA_SIZE() uses the pointer size to determine what unit size to alloc/free in osc_build_ppga()/osc_release_ppga(). But this can fail (on archs such as i686, where the pointer size is smaller than sizeof(struct lustre_handle) = 64 bits) if not all allocations use osc_build_ppga(), as in osc_build_req().
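            To make the size arithmetic concrete, here is a minimal userspace sketch. The PPGA_SIZE() definition below is a guess at the problematic macro (it is not quoted in this ticket), and struct lustre_handle is reduced to its 64-bit cookie:

            #include <stdio.h>
            #include <stdint.h>

            /* struct lustre_handle carries a 64-bit cookie, so it is 8 bytes
             * on every architecture. */
            struct lustre_handle { uint64_t cookie; };

            /* Hypothetical stand-in for the PPGA_SIZE() macro described
             * above: it sizes the per-page unit off the pointer width. */
            #define PPGA_SIZE(count) ((count) * sizeof(void *))

            int main(void)
            {
                    /* On x86_64 both sizes are 8, so alloc and free agree.
                     * On i686 sizeof(void *) is 4, so a path that allocates
                     * with sizeof(struct lustre_handle) but frees with
                     * PPGA_SIZE() (or vice versa) skews the accounting by
                     * -4 bytes per unit -- matching the reported
                     * "leaked: 0xFFFFFFFFFFFFFFFC". */
                    printf("PPGA_SIZE(1) = %zu, sizeof(struct lustre_handle) = %zu\n",
                           PPGA_SIZE(1), sizeof(struct lustre_handle));
                    return 0;
            }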
            hongchao.zhang Hongchao Zhang added a comment -

            The patch is tracked at http://review.whamcloud.com/#change,5285

            hongchao.zhang Hongchao Zhang added a comment -

            The obd_cancel call in ll_ap_completion is too heavyweight to run inside a spin_lock, so it can easily cause this kind of issue. We could either revert bugzilla 21252 & 16774 or move obd_cancel out of *_ap_completion; I will create a patch for the second option soon.
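            A minimal sketch of that second option, with hypothetical names and abbreviated arguments (the real patch is the one tracked above): the completion callback only records the DLM lock handle while cl_loi_list_lock is held, and the caller issues obd_cancel() after dropping it.

            /* Hypothetical sketch, not the actual patch: defer the
             * heavyweight obd_cancel() until cl_loi_list_lock is released. */
            static void osc_complete_then_cancel(struct client_obd *cli,
                                                 struct osc_async_page *oap,
                                                 struct obd_export *exp,
                                                 struct lov_stripe_md *lsm,
                                                 int rc)
            {
                    struct lustre_handle cancel_lockh = { 0 };

                    client_obd_list_lock(&cli->cl_loi_list_lock);
                    /* bookkeeping that needs the lock; the callback stores
                     * the handle to cancel (hypothetical extra argument)
                     * instead of calling obd_cancel() here */
                    osc_ap_completion(cli, oap, rc, &cancel_lockh);
                    client_obd_list_unlock(&cli->cl_loi_list_lock);

                    if (cancel_lockh.cookie != 0)
                            /* may allocate (ldlm_bl_to_thread) and
                             * reschedule -- safe now, no spinlock held */
                            obd_cancel(exp, lsm, LCK_PW, &cancel_lockh);
            }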

            johann Johann Lombardi (Inactive) added a comment -

            This issue was actually introduced by commit 29309746c4049aa7da6cde4cb9a44ec0df2b1af3 (bugzilla 21252), which was a fix for bugzilla 16774. I think nobody uses this feature on 1.8 (it is disabled by default anyway). Maybe we could just revert bugzilla 21252, as well as part of bugzilla 16774, to restore the default readahead locking behavior?
            bobijam Zhenyu Xu added a comment - edited

            Patch tracked at http://review.whamcloud.com/5272

            Commit message:
            LU-2703 osc: ptlrpcd scheduled with a spinlock
            
            When memory is scarce, there is a chance that the ptlrpcd calling path
            reschedules waiting for memory to become available while holding a
            spinlock; the backtrace is as follows:
            
            PID: 1808 TASK: ffff880037f0f540 CPU: 0 COMMAND: "ptlrpcd"
            0 [ffff88007c881620] schedule at ffffffff814e9c02
            1 [ffff88007c8816e8] __cond_resched at ffffffff8106118a
            2 [ffff88007c881708] _cond_resched at ffffffff814ea610
            3 [ffff88007c881718] __kmalloc at ffffffff8115e8b0
            4 [ffff88007c881768] cfs_alloc at ffffffffa039e701 [libcfs]
            5 [ffff88007c881798] ldlm_bl_to_thread at ffffffffa051b2e9 [ptlrpc]
            6 [ffff88007c8818c8] ldlm_bl_to_thread_lock at ffffffffa051b8eb [ptlrpc]
            7 [ffff88007c881918] ldlm_lock_decref_internal at ffffffffa04f7f9d [ptlrpc]
            8 [ffff88007c881998] ldlm_lock_decref at ffffffffa04f8fd9 [ptlrpc]
            9 [ffff88007c8819e8] osc_cancel at ffffffffa026962e [osc]
            10 [ffff88007c881a38] lov_cancel at ffffffffa0684e0b [lov]
            11 [ffff88007c881b58] ll_ap_completion at ffffffffa0701669 [lustre]
            12 [ffff88007c881c18] lov_ap_completion at ffffffffa0677b68 [lov]
            13 [ffff88007c881c48] osc_ap_completion at ffffffffa02683b3 [osc]
            ====> here ptlrpcd is holding the cl_loi_list_lock
            14 [ffff88007c881cb8] brw_interpret at ffffffffa0279429 [osc]
            15 [ffff88007c881d58] ptlrpc_check_set at ffffffffa0530a9a [ptlrpc]
            16 [ffff88007c881e38] ptlrpcd_check at ffffffffa05677ad [ptlrpc]
            17 [ffff88007c881e98] ptlrpcd at ffffffffa0567a50 [ptlrpc]
            18 [ffff88007c881f48] kernel_thread at ffffffff8100c0ca
            
            This patch unlocks the spinlock before calling osc_ap_completion().
            
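            A sketch of the shape of that change, assuming the 1.8 brw_interpret() structure visible in the backtrace (names are taken from the trace, the function arguments are abbreviated, and this is illustrative rather than the literal patch at the review link above):

            /* Illustrative only: release cl_loi_list_lock around
             * osc_ap_completion(), whose call chain (obd_cancel ->
             * ldlm_bl_to_thread) may kmalloc(GFP_KERNEL) and reschedule,
             * so it must not run under a spinlock. */
            static int brw_interpret(struct ptlrpc_request *req, void *data, int rc)
            {
                    struct osc_brw_async_args *aa = data;
                    struct client_obd *cli = aa->aa_cli;
                    struct osc_async_page *oap =
                            list_entry(aa->aa_oaps.next, struct osc_async_page,
                                       oap_rpc_item);   /* loop over the list elided */

                    client_obd_list_lock(&cli->cl_loi_list_lock);
                    /* unlink the page from the per-object I/O lists while locked */
                    list_del_init(&oap->oap_rpc_item);
                    client_obd_list_unlock(&cli->cl_loi_list_lock);

                    /* no spinlock held: the completion callback may now sleep */
                    osc_ap_completion(cli, oap, rc);   /* arguments abbreviated */

                    client_obd_list_lock(&cli->cl_loi_list_lock);
                    osc_wake_cache_waiters(cli);   /* bookkeeping that needs the lock */
                    client_obd_list_unlock(&cli->cl_loi_list_lock);
                    return rc;
            }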
            bobijam Zhenyu Xu added a comment -

            This issue is somewhat related to LU-2468; both involve trying to allocate memory during the memory-freeing phase.

            johann Johann Lombardi (Inactive) added a comment -

            I think master has a patch for this already (it prevents kmalloc on this path).
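            The master change itself is not quoted in this ticket; as a general kernel pattern, one way to keep such a path free of kmalloc is to allocate the work item before taking the lock. A hedged, generic sketch with hypothetical names:

            /* Generic pattern, not the actual master patch: do the
             * GFP_KERNEL allocation while sleeping is still legal, then
             * only link the preallocated item under the spinlock. */
            struct bl_work_item {
                    struct list_head     bwi_list;
                    struct lustre_handle bwi_lockh;
            };

            static int queue_bl_work(spinlock_t *lock, struct list_head *queue,
                                     const struct lustre_handle *lockh)
            {
                    struct bl_work_item *w;

                    w = kmalloc(sizeof(*w), GFP_KERNEL); /* may sleep: no lock held */
                    if (w == NULL)
                            return -ENOMEM;
                    w->bwi_lockh = *lockh;

                    spin_lock(lock);        /* no allocation from here on */
                    list_add_tail(&w->bwi_list, queue);
                    spin_unlock(lock);
                    return 0;
            }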
            bobijam Zhenyu Xu added a comment -

            I think I've found who holds the cl_loi_list_lock and then sleeps.

            crash> ps | grep ptlrpcd
            1808 2 0 ffff880037f0f540 RU 0.0 0 0 [ptlrpcd]
            1809 2 0 ffff88007d3af500 RU 0.0 0 0 [ptlrpcd-recov]
            crash> bt 1808
            PID: 1808 TASK: ffff880037f0f540 CPU: 0 COMMAND: "ptlrpcd"
            #0 [ffff88007c881620] schedule at ffffffff814e9c02
            #1 [ffff88007c8816e8] __cond_resched at ffffffff8106118a
            #2 [ffff88007c881708] _cond_resched at ffffffff814ea610
            #3 [ffff88007c881718] __kmalloc at ffffffff8115e8b0
            #4 [ffff88007c881768] cfs_alloc at ffffffffa039e701 [libcfs]
            #5 [ffff88007c881798] ldlm_bl_to_thread at ffffffffa051b2e9 [ptlrpc]
            #6 [ffff88007c8818c8] ldlm_bl_to_thread_lock at ffffffffa051b8eb [ptlrpc]
            #7 [ffff88007c881918] ldlm_lock_decref_internal at ffffffffa04f7f9d [ptlrpc]
            #8 [ffff88007c881998] ldlm_lock_decref at ffffffffa04f8fd9 [ptlrpc]
            #9 [ffff88007c8819e8] osc_cancel at ffffffffa026962e [osc]
            #10 [ffff88007c881a38] lov_cancel at ffffffffa0684e0b [lov]
            #11 [ffff88007c881b58] ll_ap_completion at ffffffffa0701669 [lustre]
            #12 [ffff88007c881c18] lov_ap_completion at ffffffffa0677b68 [lov]
            #13 [ffff88007c881c48] osc_ap_completion at ffffffffa02683b3 [osc] ====> here ptlrpcd is holding the cl_loi_list_lock
            #14 [ffff88007c881cb8] brw_interpret at ffffffffa0279429 [osc]
            #15 [ffff88007c881d58] ptlrpc_check_set at ffffffffa0530a9a [ptlrpc]
            #16 [ffff88007c881e38] ptlrpcd_check at ffffffffa05677ad [ptlrpc]
            #17 [ffff88007c881e98] ptlrpcd at ffffffffa0567a50 [ptlrpc]
            #18 [ffff88007c881f48] kernel_thread at ffffffff8100c0ca

            Will push a patch for review.


            keith Keith Mannthey (Inactive) added a comment -

            Sorry, no hard data from my local debug setup yet; I wasted some time with a 1.8 build environment when I should have just grabbed the 1.8.9 RPMs from the build. In general, the plan for the master client is to ensure that when a spinlock is acquired, the place and the task are recorded, so that when everyone is waiting on the lock it can be examined to see who touched it last and where. With any luck the race window is not so small that a debug spinlock makes the issue stop reproducing.

            Given the current state of master with respect to this change, is there a version of master that is known to work?
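            For reference, a minimal sketch of the kind of instrumentation described above (a hypothetical wrapper; the kernel's CONFIG_DEBUG_SPINLOCK provides related owner tracking):

            /* Hypothetical debug wrapper: record the task and the call site
             * on every acquire, so a stuck lock can be inspected from a
             * crash dump to see who touched it last and from where. */
            struct dbg_spinlock {
                    spinlock_t          dsl_lock;
                    struct task_struct *dsl_owner;  /* task holding the lock */
                    void               *dsl_ip;     /* acquisition call site */
            };

            static inline void dbg_spin_lock(struct dbg_spinlock *l)
            {
                    spin_lock(&l->dsl_lock);
                    l->dsl_owner = current;
                    l->dsl_ip = __builtin_return_address(0);
            }

            static inline void dbg_spin_unlock(struct dbg_spinlock *l)
            {
                    l->dsl_owner = NULL;
                    l->dsl_ip = NULL;
                    spin_unlock(&l->dsl_lock);
            }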

            keith Keith Mannthey (Inactive) added a comment -

            I had started setting up a local environment to help capture the crash, but I see we already have that first step. I am now working on debugging the spinlock; it will be good to see who touched the lock last.

            People

              bobijam Zhenyu Xu
              yujian Jian Yu
              Votes: 0
              Watchers: 9
