Details
Type: Bug
Resolution: Fixed
Priority: Critical
Description
I noticed in a recent test run (https://testing.whamcloud.com/test_sets/92361cb8-6cb4-4045-a0df-c9efc31520ea) that, instead of a stack trace being printed, all that is shown in the MDS console log is:
    [ 508.376802] Call Trace TBD:
    [ 508.378011] Pid: 31468, comm: mdt00_033 4.18.0-240.1.1.el8_lustre.x86_64 #1 SMP Fri Feb 19 20:34:57 UTC 2021
    [ 508.379472] Call Trace TBD:
    [ 508.379899] Pid: 31454, comm: mdt00_019 4.18.0-240.1.1.el8_lustre.x86_64 #1 SMP Fri Feb 19 20:34:57 UTC 2021
    [ 508.381521] Call Trace TBD:
which is not very useful. That message comes from patch https://review.whamcloud.com/35239 "LU-12400 libcfs: save_stack_trace_tsk if ARCH_STACKWALK". That patch was ostensibly to fix an issue with 5.4 kernels, but the MDS was running RHEL8.3 (4.18.0-240.1.1.el8_lustre.x86_64).
At a minimum, in libcfs_call_trace() if tsk == current this should fall back to doing something useful:
    spin_lock(&st_lock);
    pr_info("Pid: %d, comm: %.20s %s %s\n", tsk->pid, tsk->comm,
            init_utsname()->release, init_utsname()->version);
    if (task_dump_stack) {
            pr_info("Call Trace:\n");
            nr_entries = task_dump_stack(tsk, entries, MAX_ST_ENTRIES, 0);
            for (i = 0; i < nr_entries; i++)
                    pr_info("[<0>] %pB\n", (void *)entries[i]);
    } else if (tsk == current) {
            dump_stack();
    } else {
            pr_info("can't show stack: kernel doesn't export save_stack_trace_tsk\n");
    }
    spin_unlock(&st_lock);
so that the stack is printed in the common case.
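To make the proposed three-way fallback easier to reason about outside the kernel, here is a minimal user-space sketch of just the branch selection. Everything in it is a hypothetical stand-in: struct task, dump_fn_t, enum trace_method, choose_trace_method() and fake_walker() are illustrative names, not libcfs or kernel definitions, and the real code would print the trace rather than return an enum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct; not a kernel definition. */
struct task {
	int pid;
};

/* Hypothetical type for the optionally resolved stack-walking symbol. */
typedef unsigned int (*dump_fn_t)(struct task *tsk, unsigned long *entries,
				  unsigned int max_entries, unsigned int skip);

/* Which branch of the proposed libcfs_call_trace() logic would run. */
enum trace_method {
	TRACE_SAVED_ENTRIES,	/* walk the stack via the resolved symbol */
	TRACE_DUMP_STACK,	/* fall back to dump_stack() for the current task */
	TRACE_UNAVAILABLE,	/* nothing usable: print an explanatory message */
};

/*
 * Mirror the if / else if / else chain from the description:
 * prefer the stack walker when it was resolved, otherwise use the
 * dump_stack() path when the task is the caller itself, otherwise
 * report that no trace can be shown.
 */
static enum trace_method choose_trace_method(dump_fn_t task_dump_stack,
					     const struct task *tsk,
					     const struct task *current_task)
{
	assert(tsk != NULL);

	if (task_dump_stack != NULL)
		return TRACE_SAVED_ENTRIES;
	if (tsk == current_task)
		return TRACE_DUMP_STACK;
	return TRACE_UNAVAILABLE;
}

/* Dummy walker, used only to exercise the first branch. */
static unsigned int fake_walker(struct task *tsk, unsigned long *entries,
				unsigned int max_entries, unsigned int skip)
{
	(void)tsk;
	(void)entries;
	(void)max_entries;
	(void)skip;
	return 0;
}
```

With this ordering, the "can't show stack" message is only reached when the symbol is missing and the trace was requested for a task other than the caller, which should be the uncommon case.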