[LU-16625] improved Lustre thread debugging | Created: 08/Mar/23 | Updated: 23/Jan/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.16.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Description |
|
I was thinking about how we might improve the debugging of Lustre threads that are busy (e.g. threads stuck in ldlm_cli_enqueue_local() or ldlm_completion_ast(), or possibly blocked on a mutex/spinlock). One thing that would help, especially for post-facto debugging where we only have watchdog stack traces dumped to dmesg/messages, would be to print the FID/resource of the locks that the thread is holding and/or blocked on as part of the watchdog stack trace.

Since doing this in a generic way would be difficult, one option would be to create a thread-local data structure (similar to, or part of, "env") that contains "well-known" slots for e.g. parent/child FIDs, locked LDLM resources, and the next LDLM resource to lock. Possibly these would be kept in ASCII format so that the whole chunk could be printed as-is without much interpretation (maybe walking through NUL-terminated 32-char slots and only printing those that are used). Potentially this could be looked up by PID during the watchdog stack dump, but it would typically only be accessed by the local thread.

The main benefit is that instead of just seeing the stack traces being dumped, we could also see which resources the thread is (or was) holding, which would greatly simplify analyzing stuck-thread issues after the fact. |
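For illustration, a minimal sketch of what those per-thread slots might look like (a sketch only; all of the lu_debug_* names are hypothetical, not existing Lustre APIs):

```c
#include <linux/kernel.h>

/* Hypothetical per-thread debug slots; each slot is a NUL-terminated
 * ASCII string so a watchdog dump can print it without interpretation.
 */
#define LU_DEBUG_SLOT_SIZE      32
#define LU_DEBUG_SLOT_COUNT     8

struct lu_debug_slots {
        char lds_slot[LU_DEBUG_SLOT_COUNT][LU_DEBUG_SLOT_SIZE];
};

/* Filled in by the owning thread before a blocking operation, e.g.
 * lu_debug_slot_set(lds, 0, "enqueue "DFID, PFID(fid));
 */
static void lu_debug_slot_set(struct lu_debug_slots *lds, int idx,
                              const char *fmt, ...)
{
        va_list args;

        va_start(args, fmt);
        vsnprintf(lds->lds_slot[idx], LU_DEBUG_SLOT_SIZE, fmt, args);
        va_end(args);
}

/* Called from the watchdog while dumping a stuck thread's stack;
 * walks the slots and prints only the ones that are in use.
 */
static void lu_debug_slots_print(struct lu_debug_slots *lds)
{
        int i;

        for (i = 0; i < LU_DEBUG_SLOT_COUNT; i++)
                if (lds->lds_slot[i][0] != '\0')
                        pr_info("  slot[%d]: %s\n", i, lds->lds_slot[i]);
}
```

The structure could hang off the thread's env (or the service thread itself), so the watchdog could find it by PID when it prints the stack trace.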
| Comments |
| Comment by Andreas Dilger [ 08/Mar/23 ] |
|
I'd welcome input on this idea - whether you think it is practical to implement, or if there is something better we could do. Having a full crash dump available can be very helpful if it is captured in a timely manner, and a mechanism in sysfs (like LU-14858) to dump the lock state at the time of the problem is also useful, but often the issue is only caught afterward, or the customer doesn't want to crash-dump the machine, and/or the size of the crash dump makes it impractical. |
| Comment by Patrick Farrell [ 08/Mar/23 ] |
|
I'm not sure how it would be implemented - it's not quite obvious to me what you're getting at - but if we do this, we should definitely dump all the locks on the resource we're trying to lock. |
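A sketch of what that could look like, assuming the watchdog path can recover the struct ldlm_resource the stuck thread is enqueueing on (ldlm_resource_dump() and lock_res()/unlock_res() exist today, though whether the resource lock must be held around the dump should be double-checked):

```c
/* Dump every granted/waiting lock on the resource a stuck thread is
 * trying to lock.  D_ERROR sends the output to the console, and
 * lr_lock is held so the lock lists stay stable while dumping.
 */
static void stuck_thread_dump_resource(struct ldlm_resource *res)
{
        lock_res(res);
        ldlm_resource_dump(D_ERROR, res);
        unlock_res(res);
}
```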
| Comment by Andreas Dilger [ 22/Jan/24 ] |
|
timday, stancheff I recall a discussion that mentioned it is possible to extend the dump_stack() functionality to include more information, and that this was already being done in some device driver. Unfortunately, I can't find that discussion here or in LU-16375, which is also discussing a similar issue. |
| Comment by Tim Day [ 22/Jan/24 ] |
|
The discussion was on LU-17242.
|
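For reference, one existing hook in this direction is dump_stack_set_arch_desc() in lib/dump_stack.c, which makes every later dump_stack() print an extra "Hardware name:" line. It is global and boot-time only (__init), not per-thread, so it may not be the mechanism recalled above; a minimal sketch:

```c
#include <linux/init.h>
#include <linux/printk.h>

/* Boot-time only: the string set here shows up on the
 * "Hardware name:" line of every subsequent dump_stack() or
 * warning backtrace.  Global state, so it can only carry
 * static context, not per-thread lock information.
 */
static int __init example_tag_stack_dumps(void)
{
        dump_stack_set_arch_desc("example node, lustre test rig");
        return 0;
}
early_initcall(example_tag_stack_dumps);
```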
| Comment by Andreas Dilger [ 23/Jan/24 ] |
|
It looks like there is some infrastructure to handle this already:

```c
static int ipa_smp2p_panic_notifier_register(struct ipa_smp2p *smp2p)
{
        /* IPA panic handler needs to run before modem shuts down */
        smp2p->panic_notifier.notifier_call = ipa_smp2p_panic_notifier;
        smp2p->panic_notifier.priority = INT_MAX;       /* Do it early */

        return atomic_notifier_chain_register(&panic_notifier_list,
                                              &smp2p->panic_notifier);
}
```

but this looks like it is only for a panic, not necessarily a stack trace... |
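If the notifier route were useful at all, a Lustre-side equivalent might look like this (a sketch; lu_debug_slots_print_all() is an assumed helper that would walk the service threads and print their used slots):

```c
#include <linux/notifier.h>
#include <linux/panic_notifier.h> /* panic_notifier_list; linux/kernel.h on older kernels */

/* Hypothetical: on panic, dump the per-thread debug slots proposed
 * in the description, so the console log shows which resources each
 * thread was holding or waiting on.
 */
static int lustre_panic_notify(struct notifier_block *nb,
                               unsigned long event, void *unused)
{
        lu_debug_slots_print_all();     /* assumed helper, see above */
        return NOTIFY_DONE;
}

static struct notifier_block lustre_panic_nb = {
        .notifier_call = lustre_panic_notify,
};

static int lustre_panic_notifier_register(void)
{
        return atomic_notifier_chain_register(&panic_notifier_list,
                                              &lustre_panic_nb);
}
```

As noted above, though, this chain only fires on a panic, so it would complement, not replace, a hook in the watchdog stack-dump path.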