[LU-1088] mgs threads go nuts Created: 09/Feb/12  Updated: 04/Jun/12  Resolved: 04/Jun/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.3.0

Type: Bug Priority: Critical
Reporter: Christopher Morrone Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None
Environment:

lustre 2.1.0-21chaos (github.com/chaos/lustre)


Attachments: Text File momus-mds1-backtraces.txt    
Severity: 2
Rank (Obsolete): 4614

 Description   

While investigating LU-1087, the ll_mgs_* threads suddenly went nuts and shot the load through the roof, to the point where the node is almost completely unresponsive, and a "top" that I had running is only able to redraw every minute or so.

The console is mostly unresponsive, but it did respond to a sysrq-l, so I can see that they are all in a backtrace similar to this:

Call Trace:
 [<ffffffffa06da060>] lock_res_and_lock+0x30/0x40 [ptlrpc]
 [<ffffffffa06deca3>] ldlm_lock_enqueue+0x453/0x7e0 [ptlrpc]
 [<ffffffffa06fd206>] ldlm_handle_enqueue0+0x406/0xd70 [ptlrpc]
 [<ffffffffa06fdbd6>] ldlm_handle_enqueue+0x66/0x70 [ptlrpc]
 [<ffffffffa06fdbe0>] ? ldlm_server_completion_ast+0x0/0x590 [ptlrpc]
 [<ffffffffa06fe170>] ? ldlm_server_blocking_ast+0x0/0x740 [ptlrpc]
 [<ffffffffa0b55245>] mgs_handle+0x545/0x1350 [mgs]
 [<ffffffffa04933f1>] ? libcfs_debug_vmsg1+0x41/0x50 [libcfs]
 [<ffffffffa04933f1>] ? libcfs_debug_vmsg1+0x41/0x50 [libcfs]
 [<ffffffffa0723181>] ptlrpc_main+0xcd1/0x1690 [ptlrpc]
 [<ffffffffa07224b0>] ? ptlrpc_main+0x0/0x1690 [ptlrpc]
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffffa07224b0>] ? ptlrpc_main+0x0/0x1690 [ptlrpc]
 [<ffffffffa07224b0>] ? ptlrpc_main+0x0/0x1690 [ptlrpc]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20


 Comments   
Comment by Christopher Morrone [ 09/Feb/12 ]

We gave up on waiting for the node to recover on its own. We forced a crash dump and are rebooting now.

Comment by Christopher Morrone [ 09/Feb/12 ]

MDS went through recovery and seems happy for the moment.

Comment by Alex Zhuravlev [ 10/Feb/12 ]

If you hit this problem again, please grab all the traces and attach them here.

Comment by Christopher Morrone [ 10/Feb/12 ]

Alex, we have a crash dump. If you want backtraces from all tasks, we'll get you that. No need to wait for another instance.

Comment by Christopher Morrone [ 10/Feb/12 ]

Attach "foreach bt" from momus mds.

Comment by Peter Jones [ 10/Feb/12 ]

Added Alex as a watcher so he is aware of Chris's answer

Comment by Oleg Drokin [ 10/Feb/12 ]

From the stack trace:
PID: 21012 TASK: ffff880812dd7500 CPU: 8 COMMAND: "ll_mgs_22"
#0 [ffff88043f407e90] crash_nmi_callback at ffffffff81029746
#1 [ffff88043f407ea0] notifier_call_chain at ffffffff814f3eb5
#2 [ffff88043f407ee0] atomic_notifier_call_chain at ffffffff814f3f1a
#3 [ffff88043f407ef0] notify_die at ffffffff81096a6e
#4 [ffff88043f407f20] do_nmi at ffffffff814f1b33
#5 [ffff88043f407f50] nmi at ffffffff814f1440
[exception RIP: vsnprintf+779]
RIP: ffffffff81276c2b RSP: ffff880812da5810 RFLAGS: 00000206
RAX: ffff880812da5a40 RBX: ffff8805aa88a529 RCX: 0000000000000002
RDX: ffffffffa076a2fc RSI: ffff8805aa88b000 RDI: ffff8805aa88a529
RBP: ffff880812da58a0 R8: 0000000000000073 R9: 0000000000000000
R10: 0000000000000001 R11: 00000000000000dc R12: ffffffffa075b626
R13: ffffffffa075b624 R14: ffff880812da5980 R15: ffff8805aa88b000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
— <NMI exception stack> —
#6 [ffff880812da5810] vsnprintf at ffffffff81276c2b
#7 [ffff880812da5868] cfs_set_ptldebug_header at ffffffffa048919b [libcfs]
#8 [ffff880812da58a8] libcfs_debug_vmsg2 at ffffffffa0492b13 [libcfs]
#9 [ffff880812da5a08] libcfs_debug_vmsg1 at ffffffffa04933f1 [libcfs]
#10 [ffff880812da5a68] ldlm_lock_dump at ffffffffa06da97e [ptlrpc]
#11 [ffff880812da5aa8] ldlm_resource_dump at ffffffffa06e0ae3 [ptlrpc]
#12 [ffff880812da5b08] ldlm_granted_list_add_lock at ffffffffa06dd280 [ptlrpc]
#13 [ffff880812da5b48] ldlm_grant_lock at ffffffffa06dd83c [ptlrpc]
#14 [ffff880812da5ba8] ldlm_process_plain_lock at ffffffffa06edc1a [ptlrpc]
#15 [ffff880812da5c48] ldlm_lock_enqueue at ffffffffa06deb6d [ptlrpc]
#16 [ffff880812da5cb8] ldlm_handle_enqueue0 at ffffffffa06fd206 [ptlrpc]
#17 [ffff880812da5d28] ldlm_handle_enqueue at ffffffffa06fdbd6 [ptlrpc]
#18 [ffff880812da5d68] mgs_handle at ffffffffa0b55245 [mgs]
#19 [ffff880812da5df8] ptlrpc_main at ffffffffa0723181 [ptlrpc]
#20 [ffff880812da5f48] kernel_thread at ffffffff8100c14a

So I imagine you have a lot of clients (tens of thousands?), and once all of them somehow got disconnected, they all come rushing back to reconnect and get their config lock (all on the same resource).
Now in the middle of that locking we have this ldlm_resource_dump() call that prints every lock on the resource, which I imagine takes quite a while, and everybody else spins on the resource spinlock in the meantime.

That ldlm_resource_dump is at D_INFO, which is probably a bad idea and should be D_DLMTRACE too. What Lustre debug level are you running at?
I wonder if we should just change ldlm_resource_dump() to only print the individual locks if the refcount is below some threshold, as otherwise it is just too much info I suspect.
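
For illustration, a minimal sketch of that threshold idea; the tunable name, the list walking, and the CDEBUG details below are assumptions made for the sketch, not the patch that eventually landed:

/* Sketch only -- not the actual Lustre change.  Assumes the usual ldlm
 * naming (res->lr_granted, lock->l_res_link) and a hypothetical tunable
 * ldlm_dump_granted_max. */
static int ldlm_dump_granted_max = 256;

static void ldlm_resource_dump_capped(int level, struct ldlm_resource *res)
{
        struct ldlm_lock *lock;
        int dumped = 0;

        CDEBUG(level, "--- Resource: %p (rc: %d)\n",
               res, atomic_read(&res->lr_refcount));

        list_for_each_entry(lock, &res->lr_granted, l_res_link) {
                if (++dumped > ldlm_dump_granted_max) {
                        /* Thousands of clients can hold a granted lock on the
                         * single MGS config resource; don't dump them all
                         * while every other thread spins on the resource
                         * spinlock. */
                        CDEBUG(level, "only dumping the first %d granted locks\n",
                               ldlm_dump_granted_max);
                        break;
                }
                ldlm_lock_dump(level, lock, dumped);
        }
}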

Comment by Christopher Morrone [ 10/Feb/12 ]

Yes, I had recently added D_INFO to get a look at what the spinning mdt thread was doing for LU-1087.

We have a few thousand clients. Most of them should not be disconnected. There are on the order of a couple hundred that might reboot and reconnect at any time (BGP nodes).

I think that we definitely need ldlm_resource_dump changed. I certainly accept that performance is reduced when I enable higher logging levels, but I don't expect a denial-of-service attack.

Comment by Lai Siyao [ 02/Mar/12 ]

It looks okay to use RCU for the resource lock dump, and ldlm_lock_dump() is inefficient compared to ldlm_lock_debug(), so I'll replace the former with the latter.
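
A rough sketch of that direction, assuming the granted list is maintained with RCU-safe list primitives (list_add_rcu()/list_del_rcu()); the names mirror the ldlm ones, but this is not the change under review:

#include <linux/rcupdate.h>
#include <linux/list.h>

/* Sketch only.  Walks the granted list under rcu_read_lock() instead of the
 * resource spinlock, and emits one compact LDLM_DEBUG() line per lock instead
 * of the heavier ldlm_lock_dump() output. */
static void ldlm_resource_dump_rcu(struct ldlm_resource *res)
{
        struct ldlm_lock *lock;

        rcu_read_lock();        /* enqueues no longer spin while we print */
        list_for_each_entry_rcu(lock, &res->lr_granted, l_res_link)
                LDLM_DEBUG(lock, "granted");
        rcu_read_unlock();
}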

Comment by Lai Siyao [ 02/Mar/12 ]

Review is on http://review.whamcloud.com/#change,2250.

Comment by Peter Jones [ 04/Jun/12 ]

Landed for 2.3
