[LU-16625] improved Lustre thread debugging Created: 08/Mar/23  Updated: 23/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.16.0
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-16375 dump more information for threads blo... Open
Related
is related to LU-17242 Clean up and Improve Lustre Debugging Open

 Description   

I was thinking about how we might improve the debugging of Lustre threads that are stuck or busy (e.g. blocked in ldlm_cli_enqueue_local() or ldlm_completion_ast(), or possibly waiting on a mutex/spinlock).

One thing that would help, especially for post-facto debugging where we only have watchdog stack traces dumped to dmesg/messages, would be to print the FID/resource of locks that the thread is holding and/or blocked on as part of the watchdog stack trace. Since doing this in a generic way would be difficult, one option would be to create a thread-local data structure (similar to, or part of, "env") that contains "well-known" slots for e.g. parent/child FIDs, locked LDLM resources, and the next LDLM resource to lock. These slots could be kept in ASCII format so that the whole chunk could be printed as-is without much interpretation (e.g. walking through NUL-terminated 32-char slots and printing only those that are in use).
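
As a concrete illustration, here is a minimal sketch of what such a per-thread structure and its watchdog-time printer could look like. All names and sizes are illustrative, not an existing Lustre API:

#include <linux/kernel.h>
#include <linux/string.h>

/* Sketch only: fixed-size, NUL-terminated ASCII slots so the watchdog
 * can print the block verbatim without interpreting Lustre structures.
 */
#define LDS_SLOT_SIZE   32
#define LDS_SLOT_COUNT  8       /* parent/child FIDs, held/blocked resources, ... */

struct lu_debug_slots {
        char    lds_slot[LDS_SLOT_COUNT][LDS_SLOT_SIZE];
};

/* Record a resource name in a well-known slot, e.g. a FID formatted
 * with DFID/PFID from the ldlm enqueue path.  An empty string clears it.
 */
static inline void lu_debug_slot_set(struct lu_debug_slots *lds, int idx,
                                     const char *str)
{
        strscpy(lds->lds_slot[idx], str, LDS_SLOT_SIZE);
}

/* Called when the watchdog dumps a stuck thread's stack: print only
 * the slots that are currently in use.
 */
static inline void lu_debug_slots_print(const struct lu_debug_slots *lds)
{
        int i;

        for (i = 0; i < LDS_SLOT_COUNT; i++)
                if (lds->lds_slot[i][0] != '\0')
                        pr_warn("  slot[%d]: %s\n", i, lds->lds_slot[i]);
}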

Potentially this could be looked up by PID during the watchdog stack dump, but it would typically only be accessed by the local thread.

The main benefit is that instead of only seeing the stack traces being dumped, we could also see which resources the thread is (or was) holding, which would make it much easier to analyze stuck-thread issues after the fact.



 Comments   
Comment by Andreas Dilger [ 08/Mar/23 ]

I'd welcome input on this idea: do you think it is practical to implement, or is there something better we could do?

Having a full crash dump available can be very helpful if it is captured in a timely manner, and a mechanism in sysfs (like LU-14858) to dump the lock state at the time of the problem is also useful. However, the issue is often only noticed afterward, the customer doesn't want to crash-dump the machine, and/or the size of the crash dump makes it impractical.

Comment by Patrick Farrell [ 08/Mar/23 ]

I'm not sure how it would be implemented (it's not quite obvious to me what you're getting at), but if we do this, we should definitely dump all the locks on the resource we're trying to lock.

Comment by Andreas Dilger [ 22/Jan/24 ]

timday, stancheff I recall a discussion that mentioned it is possible to extend the dump_stack() functionality to include more information, and that this was already being done in some device driver. Unfortunately, I can't find that discussion here or in LU-16375, which is discussing a similar issue.

Comment by Tim Day [ 22/Jan/24 ]

The discussion was on LU-17242.

Seems useful. I think we could register a custom panic handler; I see upstream drivers (like drivers/net/ipa/ipa_smp2p.c) doing something like that. We could avoid extending custom Lustre debugging, and it should work on every panic. Adding current->journal_info to the handler would be easy. Getting the Lustre-specific info might be tougher, but I saw some ideas upstream we could probably copy. The IPA driver just embeds the notifier_block in a larger struct and uses container_of() to get everything else.
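
For illustration, a rough sketch of how that pattern could look on the Lustre side; the struct and function names here are hypothetical, and only the notifier-chain API and container_of() are existing kernel interfaces:

#include <linux/kernel.h>
#include <linux/notifier.h>
#include <linux/panic_notifier.h>       /* declares panic_notifier_list on recent kernels */
#include <linux/sched.h>

/* Hypothetical container: embed the notifier_block in a larger struct so
 * the callback can recover the surrounding state with container_of(),
 * the same way ipa_smp2p does.
 */
struct lustre_panic_state {
        struct notifier_block   lps_nb;
        /* ... per-node debug state we want dumped on panic ... */
};

static int lustre_panic_notifier(struct notifier_block *nb,
                                 unsigned long action, void *data)
{
        struct lustre_panic_state *lps =
                container_of(nb, struct lustre_panic_state, lps_nb);

        /* dump current->journal_info and any Lustre-specific state
         * reachable from *lps here
         */
        (void)lps;

        return NOTIFY_DONE;
}

static int lustre_panic_notifier_register(struct lustre_panic_state *lps)
{
        lps->lps_nb.notifier_call = lustre_panic_notifier;
        lps->lps_nb.priority = INT_MAX; /* run before other panic handlers */

        return atomic_notifier_chain_register(&panic_notifier_list,
                                              &lps->lps_nb);
}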

Comment by Andreas Dilger [ 23/Jan/24 ]

It looks like there is some infrastructure to handle this already:

static int ipa_smp2p_panic_notifier_register(struct ipa_smp2p *smp2p)
{
        /* IPA panic handler needs to run before modem shuts down */
        smp2p->panic_notifier.notifier_call = ipa_smp2p_panic_notifier;
        smp2p->panic_notifier.priority = INT_MAX;       /* Do it early */

        return atomic_notifier_chain_register(&panic_notifier_list,
                                              &smp2p->panic_notifier);
}

but this looks like it is only invoked on a panic, not for an ordinary watchdog stack trace...
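
For the watchdog case, the same information could be printed directly where the stuck-thread stack trace is emitted, without going through the panic notifier at all. A hypothetical sketch, reusing the lu_debug_slots_print() helper from the description:

/* Hypothetical hook: after the watchdog has dumped the stuck thread's
 * stack, also print that thread's debug slots.
 */
static void lustre_watchdog_dump_extra(struct task_struct *task,
                                       struct lu_debug_slots *lds)
{
        pr_warn("Lustre: thread pid %d appears stuck, known resources:\n",
                task_pid_nr(task));
        lu_debug_slots_print(lds);      /* sketch from the description above */
}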
