
[LU-17432] add "slow start" to some CWARN/CERROR messages

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.17.0
    • Affects Version/s: Lustre 2.14.0, Lustre 2.16.0
    • Severity: 3

    Description

      In some cases an occasional error (e.g. an RPC timeout) is acceptable because it is handled transparently by an RPC retry, but repeated errors on the local node or with the same peer indicate a more significant problem.

      Some CWARN/CERROR messages were quieted because they were too noisy, but as a result we are losing insight into problems on nodes that have continuous errors. It would be useful to re-enable these messages through a new variant of CERROR/CWARN that takes a "skip the first N messages" parameter and then starts printing to the console as normal.
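      The requested behaviour can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the libcfs code that was merged for this ticket: the names CWARN_SLOW_START, struct slow_start_state and slow_start_should_print() are hypothetical, and fprintf() to stderr stands in for the Lustre console. Each expansion of the macro owns a static counter, so the first skip occurrences at that call site are suppressed and every later one is printed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical sketch, not the merged libcfs implementation:
 * state for one call site of a "slow start" message.
 */
struct slow_start_state {
	atomic_uint count;	/* occurrences counted so far at this call site */
};

/*
 * Return false for the first @skip calls (message suppressed) and
 * true for every call after that (message printed).
 */
static inline bool slow_start_should_print(struct slow_start_state *st,
					   unsigned int skip)
{
	assert(st != NULL);

	/* Threshold already reached: stop counting so the counter never wraps. */
	if (atomic_load(&st->count) >= skip)
		return true;

	/* fetch_add returns the old value: 0 on the first hit, 1 on the second, ... */
	return atomic_fetch_add(&st->count, 1) >= skip;
}

/*
 * Hypothetical CWARN variant: each expansion has its own zero-initialized
 * static counter, so suppression is tracked per call site on this node.
 */
#define CWARN_SLOW_START(skip, fmt, ...)				\
	do {								\
		static struct slow_start_state _ss;			\
		if (slow_start_should_print(&_ss, (skip)))		\
			fprintf(stderr, "Lustre: " fmt, ##__VA_ARGS__);	\
	} while (0)
```

      For example, CWARN_SLOW_START(5, "request to %s timed out\n", peer_name) (peer_name being a hypothetical string) would stay silent for the first five timeouts at that call site and print every one after that. Once the threshold is reached the counter is no longer incremented, so it cannot wrap around and silence the message again. A per-call-site counter only captures repeated errors on the local node; tracking repeats with the same peer would need the counter to live in per-peer state instead.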


          Activity

            Peter Jones added a comment:

            Merged for 2.17

            Gerrit Updater added a comment:
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55439/
            Subject: LU-17432 libcfs: new CDEBUG_SLOW message type
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 18bf112e96ceb3f45505be86f3357369c9122992

            Gerrit Updater added a comment:
            "Frederick Dilger <fdilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55439
            Subject: LU-17432 libcfs: new CDEBUG_SLOW_START message type
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ea9694235262a767bd9dbbed4faf4d83bff9cde9

            Andreas Dilger added a comment:
            For example:

            Lustre: 313946:0:(client.c:2321:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply

            People

              Assignee: Fred Dilger
              Reporter: Andreas Dilger
              Votes: 0
              Watchers: 3
