Details
- Type: Bug
- Resolution: Duplicate
- Priority: Minor
- Affects Version: Lustre 2.15.5
- Environment: Running Lustre server with Lustre 2.15.5.
- Severity: 3
Description
If a Lustre server MDT/OST fails over or is brought down while client jobs are running, the following LustreError appears in the console logs:
==================================================================
2024-10-14T17:01:44.574579-07:00 tuolumne265 kernel: [344923.097446] LustreError: 1140374:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
2024-10-14T17:01:44.574585-07:00 tuolumne265 kernel: [344923.104622] LustreError: 1140374:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 51 previous similar messages
==================================================================
I went through the code: the function is called during cleanup as one of the exit functions, and it does nothing other than emit that error. When servers are rarely brought down or failed over, this does not matter.
The issue is that some of our nodes create a Lustre file system on the fly for a job and bring it down, not always at the end of that job. This cycle repeats several times per day on these nodes.
These errors are distracting and not very useful; perhaps remove the message or downgrade it to a warning.
Attachments
Issue Links
- duplicates: LU-16639 job_stat_exit() should not have any items (Resolved)