Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-17853

Negative value reported in req_waittime mdt

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.14.0
    • 3
    • 9223372036854775807

    Description

      ldlm.services.ldlm_canceld.stats=
      snapshot_time             1714662722.986857642 secs.nsecs
      req_waittime              101358239600 samples [usecs] -36 1855805 5530720965329 21563670935443407
      req_qdepth                101358239600 samples [reqs] 0 1164 1893537183 7828947261
      req_active                101358239600 samples [reqs] 1 23 152657095033 281200017837
      req_timeout               101358239600 samples [secs] 1 218 6892805470378 468801155450006
      reqbuf_avail              210996467581 samples [bufs] 0 155 13398749465845 850967459690601
      ldlm_cancel               101358239600 samples [usecs] 1 211241571 1436530054018 108103493490971286
      

      This doesn't appear to be a math error like LU-9683, but possibly caused by NTP.

      ptlrpc uses ktime_get_real() for timing events where other modules use ktime_get() for timing.

      ktime_get_real() "can jump backwards due to a leap second update, NTP adjustment settimeofday() operation from user space."

      Attachments

        Issue Links

          Activity

            [LU-17853] Negative value reported in req_waittime mdt
            pjones Peter Jones added a comment -

            Merged for 2.16

            pjones Peter Jones added a comment - Merged for 2.16

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55605/
            Subject: LU-17853 ptlrpc: Negative value for req_waittime
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f5a74b5a2fee9ead75138ac3073f07d85fadcf17

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55605/ Subject: LU-17853 ptlrpc: Negative value for req_waittime Project: fs/lustre-release Branch: master Current Patch Set: Commit: f5a74b5a2fee9ead75138ac3073f07d85fadcf17

            "Frederick Dilger <fdilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55605
            Subject: LU-17853 ptlrpc: Negative value for req_waittime
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c2c1537e36f091faed36f1269fa1de35658a45ec

            gerrit Gerrit Updater added a comment - "Frederick Dilger <fdilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55605 Subject: LU-17853 ptlrpc: Negative value for req_waittime Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: c2c1537e36f091faed36f1269fa1de35658a45ec

            Probably ktime_get_real() is used because we print these times in console error messages, so they need to be real times and not internal clock values, so I don't think ktime_get_ts64() would be correct here.

            Two possible solutions I can think of:

            • use zero or one if the time difference is negative. This should happen rarely enough that it wouldn't affect the statistics much
            • track the time internally with ktime_get() and then adjust it to ktime_get_real() in the places where it is printed. This might need a considerable amount of code change, and if these timestamps are printed in many debug message (eg. DEBUG_REQ()) then it will mean calling ktime_get_real() all the time anyway.
            adilger Andreas Dilger added a comment - Probably ktime_get_real() is used because we print these times in console error messages, so they need to be real times and not internal clock values, so I don't think ktime_get_ts64() would be correct here. Two possible solutions I can think of: use zero or one if the time difference is negative. This should happen rarely enough that it wouldn't affect the statistics much track the time internally with ktime_get() and then adjust it to ktime_get_real() in the places where it is printed. This might need a considerable amount of code change, and if these timestamps are printed in many debug message (eg. DEBUG_REQ() ) then it will mean calling ktime_get_real() all the time anyway.
            cfaber Colin Faber added a comment -

            It seems this should be ktime_get_ts64() or similar to provide for a monotonic interface to ktime.

            cfaber Colin Faber added a comment - It seems this should be ktime_get_ts64() or similar to provide for a monotonic interface to ktime.

            People

              fdilger Fred Dilger
              utopiabound Nathaniel Clark
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: