Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.3.0
    • None
    • None
    • lustre 2.1.0-21chaos (github.com/chaos/lustre)
    • 2
    • 4614

    Description

      While investigating LU-1087, the ll_mgs_* threads suddenly went nuts and shot the load through the roof, to the point where the node became almost completely unresponsive; a "top" that I had running was only able to redraw about once every minute.

      The console is mostly unresponsive, but it did respond to a SysRq-l, so I can see that the threads are all stuck in a backtrace similar to this:

      Call Trace:
       [<ffffffffa06da060>] lock_res_and_lock+0x30/0x40 [ptlrpc]
       [<ffffffffa06deca3>] ldlm_lock_enqueue+0x453/0x7e0 [ptlrpc]
       [<ffffffffa06fd206>] ldlm_handle_enqueue0+0x406/0xd70 [ptlrpc]
       [<ffffffffa06fdbd6>] ldlm_handle_enqueue+0x66/0x70 [ptlrpc]
       [<ffffffffa06fdbe0>] ? ldlm_server_completion_ast+0x0/0x590 [ptlrpc]
       [<ffffffffa06fe170>] ? ldlm_server_blocking_ast+0x0/0x740 [ptlrpc]
       [<ffffffffa0b55245>] mgs_handle+0x545/0x1350 [mgs]
       [<ffffffffa04933f1>] ? libcfs_debug_vmsg1+0x41/0x50 [libcfs]
       [<ffffffffa04933f1>] ? libcfs_debug_vmsg1+0x41/0x50 [libcfs]
       [<ffffffffa0723181>] ptlrpc_main+0xcd1/0x1690 [ptlrpc]
       [<ffffffffa07224b0>] ? ptlrpc_main+0x0/0x1690 [ptlrpc]
       [<ffffffff8100c14a>] child_rip+0xa/0x20
       [<ffffffffa07224b0>] ? ptlrpc_main+0x0/0x1690 [ptlrpc]
       [<ffffffffa07224b0>] ? ptlrpc_main+0x0/0x1690 [ptlrpc]
       [<ffffffff8100c140>] ? child_rip+0x0/0x20
      

          People

            laisiyao Lai Siyao
            morrone Christopher Morrone (Inactive)