Description
I was thinking about how we might improve the debugging of busy Lustre threads (e.g. ones stuck in ldlm_cli_enqueue_local() or ldlm_completion_ast(), or possibly blocked on a mutex/spinlock).
One thing that would help, especially for post-facto debugging where we only have watchdog stack traces dumped to dmesg/messages, would be to print the FID/resource of the locks that the thread is holding and/or blocked on as part of the watchdog stack trace.

Since doing this in a fully generic way would be difficult, one option would be to create a thread-local data structure (similar to, or part of, "env") that contains "well-known" slots for e.g. the parent/child FIDs, the LDLM resources currently locked, and the next LDLM resource to be locked. These slots could be kept in ASCII format so that the whole chunk could be printed as-is without much interpretation (e.g. walking through NUL-terminated 32-char slots and printing only those that are in use); this is sketched below.
It would also be useful to print the JobID and XID of the RPC that is being processed.
Potentially this info could be looked up by PID during the watchdog stack dump, but it would typically only be accessed by the local thread.
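A minimal sketch of what such a per-thread structure and its dump loop could look like (all of the names, slot layout, and sizes here are illustrative assumptions, not existing Lustre symbols; in practice this would likely hang off env rather than be a standalone per-task allocation):

#include <linux/printk.h>
#include <linux/types.h>

#define LDI_SLOT_SIZE	32	/* NUL-terminated ASCII string per slot */

enum ldi_slot {
	LDI_PARENT_FID,		/* FID of the parent being operated on */
	LDI_CHILD_FID,		/* FID of the child being operated on */
	LDI_LDLM_HELD,		/* LDLM resource(s) currently held */
	LDI_LDLM_NEXT,		/* next LDLM resource to be locked */
	LDI_NR_SLOTS
};

struct lustre_debug_info {
	char	ldi_slots[LDI_NR_SLOTS][LDI_SLOT_SIZE];
	char	ldi_jobid[32];	/* JobID of the RPC being processed */
	__u64	ldi_xid;	/* XID of the RPC being processed */
};

/* Walk the slots and print only the ones that are in use, so the watchdog
 * output stays compact.  How the per-thread structure is located (via env,
 * task-local storage, or a PID lookup) is deliberately left open here. */
static void lustre_dump_debug_info(const struct lustre_debug_info *ldi)
{
	int i;

	if (ldi == NULL)
		return;

	if (ldi->ldi_jobid[0] != '\0' || ldi->ldi_xid != 0)
		pr_info("jobid=%s xid=%llu\n", ldi->ldi_jobid,
			(unsigned long long)ldi->ldi_xid);

	for (i = 0; i < LDI_NR_SLOTS; i++)
		if (ldi->ldi_slots[i][0] != '\0')
			pr_info("slot[%d]=%s\n", i, ldi->ldi_slots[i]);
}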
The main benefit here would be that instead of just seeing the stack traces being dumped, we could also see which resources the thread is (or was) holding, which would greatly simplify analyzing stuck-thread issues after the fact.
I think we should define something like lustre_dump_info() and lustre_dump_task_info(struct task_struct *p) that would dump all of the info not currently output by dump_stack(). Then we could define wrapper macros on top of these.
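A rough sketch of what such wrappers might look like (the macro names and exact expansion are my assumptions, not the original proposal):

/* Hypothetical wrapper macros: every place that currently dumps a stack
 * would also dump the extra per-thread info. */
#define LUSTRE_DUMP_STACK()						\
do {									\
	lustre_dump_info();		/* proposed helper above */	\
	dump_stack();							\
} while (0)

#define LUSTRE_DUMP_TASK(task)						\
do {									\
	lustre_dump_task_info(task);	/* proposed helper above */	\
	libcfs_debug_dumpstack(task);					\
} while (0)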
This is kind of a partial revert of https://review.whamcloud.com/c/fs/lustre-release/+/53625 (i.e. keep the bug fix and reintroduce the wrapper function/macro). These dump info functions could also be used in the panic handler (I think the libcfs panic is essentially a no-op right now). And libcfs_debug_dumpstack(task) can eventually be superseded by sched_show_task(task), which is EXPORT_SYMBOL_GPL() in newer (5.xx?) kernels. The upstream client could use sched_show_task(task) directly and drop a bunch of custom code.
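As a rough illustration of the panic-handler idea (purely a sketch: the notifier plumbing varies between kernel versions, and lustre_dump_task_info() is the proposed helper above, not an existing function):

#include <linux/notifier.h>
#include <linux/sched.h>

/* Sketch: hook the proposed dump-info helper into a panic notifier so the
 * crashing task's per-thread debug info is printed along with the panic
 * output (the stack itself is already printed by panic()). */
static int lustre_panic_notify(struct notifier_block *nb,
			       unsigned long event, void *unused)
{
	lustre_dump_task_info(current);		/* proposed helper */
	return NOTIFY_DONE;
}

static struct notifier_block lustre_panic_nb = {
	.notifier_call	= lustre_panic_notify,
};

/* At module init (the header declaring panic_notifier_list is
 * kernel-version dependent):
 *	atomic_notifier_chain_register(&panic_notifier_list, &lustre_panic_nb);
 *
 * Elsewhere, libcfs_debug_dumpstack(task) could eventually become a thin
 * wrapper around sched_show_task(task), as noted above. */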
For mds.MDS.mdt.threads.THREAD_NAME, one file would be simplest, IMHO.