
CERROR/CWARN messages are not throttled

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: Lustre 2.12.0
    • Fix Version/s: Lustre 2.12.0

    Description

      It appears that CDEBUG_LIMIT is not working properly, since test logs in Maloo are full of repeated error messages. This would be a nightmare for a large cluster if there are many clients and servers spewing repeated messages.

      For example, sanity test_60a on the MDS console:

      [ 2786.154265] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
      [ 2786.155427] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
      [ 2786.156482] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
      [ 2786.157628] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
      [ 2786.158671] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
      [ 2786.159789] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
      [ 2786.160824] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
      [ 2786.161934] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
      [ 2786.162977] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
      [ 2786.164074] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
      [ 2786.165111] Lustre: 30107:0:(llog_cat.c:98:llog_cat_new_log()) MGS: there are no more free slots in catalog e973af35
      [repeats hundreds of times]
      

      It might relate to ktime_t patches that James landed previously, but that is just speculation as I haven't investigated it yet.

      The sanity test_60b should be catching the failure of CDEBUG_LIMIT() but it is checking the logs on the client, while the test is being run on the MGS.
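
A check for unthrottled output has to look at the console log of the node that emits the messages. As a rough illustration of what such a check could count, the sketch below finds the longest run of consecutive identical messages in a captured log after stripping the dmesg-style timestamp. It is not the logic of sanity test_60b; `skip_timestamp()` and `longest_repeat_run()` are hypothetical helpers, and the log is assumed to be already split into lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Skip a leading "[ 2786.154265] " style timestamp, if one is present. */
static const char *skip_timestamp(const char *line)
{
    const char *end;

    assert(line != NULL);
    if (line[0] != '[')
        return line;
    end = strchr(line, ']');
    if (end == NULL)
        return line;
    end++;
    while (*end == ' ')
        end++;
    return end;
}

/* Return the length of the longest run of consecutive identical messages. */
static size_t longest_repeat_run(const char *const *lines, size_t nlines)
{
    size_t best = 0, run = 0;
    const char *prev = NULL;

    for (size_t i = 0; i < nlines; i++) {
        const char *msg = skip_timestamp(lines[i]);

        if (prev != NULL && strcmp(prev, msg) == 0)
            run++;
        else
            run = 1;
        if (run > best)
            best = run;
        prev = msg;
    }
    return best;
}
```

A test could then fail if the longest run exceeds a small threshold, since a working limiter should collapse a burst of identical messages into a few lines.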

          Activity

            [LU-11373] CERROR/CWARN messages are not throttled
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.12.0 [ 13495 ]
            Resolution New: Fixed [ 1 ]
            Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-11566 [ LU-11566 ]
            pjones Peter Jones made changes -
            Fix Version/s Original: Lustre 2.12.0 [ 13495 ]
            jamesanunez James Nunez (Inactive) made changes -
            Resolution Original: Fixed [ 1 ]
            Status Original: Resolved [ 5 ] New: Reopened [ 4 ]
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            adilger Andreas Dilger made changes -
            Issue Type Original: Improvement [ 4 ] New: Bug [ 1 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is duplicated by LU-11384 [ LU-11384 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is duplicated by LU-11383 [ LU-11383 ]
            pjones Peter Jones made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Andreas Dilger [ adilger ]
            adilger Andreas Dilger created issue -

            People

              Assignee: adilger Andreas Dilger
              Reporter: adilger Andreas Dilger