  1. Lustre
  2. LU-18376

Lustre server console logs contain LustreError messages from lprocfs_jobstats.c:137:job_stat_exit() when a server is brought down or fails over while jobs are running.


Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.15.5
    • None
    • Lustre servers running Lustre 2.15.5.
    • 3

    Description

      If a Lustre server (MDT/OST) fails over or is brought down while client jobs are running, the following LustreError appears in the console logs:

      ==================================================================
      2024-10-14T17:01:44.574579-07:00 tuolumne265 kernel: [344923.097446] LustreError: 1140374:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
      2024-10-14T17:01:44.574585-07:00 tuolumne265 kernel: [344923.104622] LustreError: 1140374:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 51 previous similar messages
      ==================================================================

      I went through the code: that function is called during cleanup as one of the exit functions.
      It does nothing other than emit that error. When servers are rarely brought down or failed over, this doesn't matter.

      The issue is that some of our nodes create a Lustre file system on the fly for a job, and it is not always brought down at the end of that job. The cycle is repeated several times per day on these nodes.
      These errors are distracting and not very useful; perhaps remove the message or downgrade it to a warning.
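
As a stopgap while the message is still emitted, the noise could be tallied and filtered on the log-collection side. The following is a minimal sketch, not part of Lustre: it assumes console lines always follow the format of the two samples above, and the function name `summarize_jobstat_errors` and the unrelated third sample line are hypothetical.

```python
import re

# Pattern derived only from the two sample lines in this report
# (assumption: other releases may format the line differently).
JOBSTAT_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+kernel:\s+\[\s*\d+\.\d+\]\s+"
    r"LustreError:\s+\d+:\d+:"
    r"\(lprocfs_jobstats\.c:\d+:job_stat_exit\(\)\)\s+(?P<msg>.*)$"
)
SKIPPED_RE = re.compile(r"Skipped (\d+) previous similar messages?")


def summarize_jobstat_errors(lines):
    """Return (lines that are not job_stat_exit errors, per-host message counts)."""
    kept = []
    counts = {}
    for line in lines:
        match = JOBSTAT_RE.match(line.strip())
        if match is None:
            kept.append(line)
            continue
        skipped = SKIPPED_RE.search(match.group("msg"))
        # A "Skipped N ..." line stands for N rate-limited messages;
        # any other matching line counts as one message.
        n = int(skipped.group(1)) if skipped else 1
        host = match.group("host")
        counts[host] = counts.get(host, 0) + n
    return kept, counts


SAMPLE = [
    "2024-10-14T17:01:44.574579-07:00 tuolumne265 kernel: [344923.097446] "
    "LustreError: 1140374:0:(lprocfs_jobstats.c:137:job_stat_exit()) "
    "should not have any items",
    "2024-10-14T17:01:44.574585-07:00 tuolumne265 kernel: [344923.104622] "
    "LustreError: 1140374:0:(lprocfs_jobstats.c:137:job_stat_exit()) "
    "Skipped 51 previous similar messages",
    # Hypothetical unrelated line, included only to show that it passes through.
    "2024-10-14T17:01:45.000000-07:00 tuolumne265 kernel: [344924.000000] "
    "unrelated message",
]

kept, counts = summarize_jobstat_errors(SAMPLE)
print(counts)     # {'tuolumne265': 52}
print(len(kept))  # 1
```

Counting the "Skipped N" line as N messages keeps the tally close to what the kernel actually suppressed, which is useful for judging how often the teardown path is hit on each node.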


            People

              adilger Andreas Dilger
              carbonneau Eric Carbonneau