Details
Bug
Resolution: Fixed
Critical
None
None
lustre 2.1.0-21chaos (github.com/chaos/lustre)
2
4614
Description
While investigating LU-1087, the ll_mgs_* threads suddenly went nuts and shot the load through the roof, to the point where the node is almost completely unresponsive; a "top" that I had running is only able to redraw every minute or so.
The console is mostly unresponsive, but it did respond to a sysrq-l, so I can see that the ll_mgs_* threads are all in a backtrace similar to this:
Call Trace:
[<ffffffffa06da060>] lock_res_and_lock+0x30/0x40 [ptlrpc]
[<ffffffffa06deca3>] ldlm_lock_enqueue+0x453/0x7e0 [ptlrpc]
[<ffffffffa06fd206>] ldlm_handle_enqueue0+0x406/0xd70 [ptlrpc]
[<ffffffffa06fdbd6>] ldlm_handle_enqueue+0x66/0x70 [ptlrpc]
[<ffffffffa06fdbe0>] ? ldlm_server_completion_ast+0x0/0x590 [ptlrpc]
[<ffffffffa06fe170>] ? ldlm_server_blocking_ast+0x0/0x740 [ptlrpc]
[<ffffffffa0b55245>] mgs_handle+0x545/0x1350 [mgs]
[<ffffffffa04933f1>] ? libcfs_debug_vmsg1+0x41/0x50 [libcfs]
[<ffffffffa04933f1>] ? libcfs_debug_vmsg1+0x41/0x50 [libcfs]
[<ffffffffa0723181>] ptlrpc_main+0xcd1/0x1690 [ptlrpc]
[<ffffffffa07224b0>] ? ptlrpc_main+0x0/0x1690 [ptlrpc]
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffffa07224b0>] ? ptlrpc_main+0x0/0x1690 [ptlrpc]
[<ffffffffa07224b0>] ? ptlrpc_main+0x0/0x1690 [ptlrpc]
[<ffffffff8100c140>] ? child_rip+0x0/0x20
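When many threads dump similar backtraces, it can help to tally which function each one is sitting in. The following is only a rough sketch of that idea in Python, not part of Lustre or any existing tool: the helper names (parse_frames, top_frame, tally_top_frames) are made up for illustration, and the regular expression assumes frames look like the ones in the trace above, where a "?" marks an entry the kernel unwinder considers unreliable.

```python
import re
from collections import Counter

# Matches one frame, e.g. "[<ffffffffa06da060>] ? lock_res_and_lock+0x30/0x40 [ptlrpc]".
# The "?" marker and the trailing "[module]" are both optional.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(trace):
    """Return (symbol, module, reliable) tuples in the order they appear."""
    return [
        (m.group("symbol"), m.group("module"), m.group("unreliable") is None)
        for m in FRAME_RE.finditer(trace)
    ]


def top_frame(trace):
    """Return the first frame not marked with "?", or None if there is none."""
    for symbol, _module, reliable in parse_frames(trace):
        if reliable:
            return symbol
    return None


def tally_top_frames(traces):
    """Count how many traces share each innermost reliable function."""
    return Counter(top_frame(t) for t in traces)
```

Fed one string per thread from the sysrq-l output, tally_top_frames would show how many threads have lock_res_and_lock as their innermost reliable frame, which is the pattern described above.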