LU-12838: ptlrpc watchdog ratelimiting is broken

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.13.0
    • Affects Version/s: Lustre 2.13.0
    • Labels: None
    • Severity: 3

    Description

      The ptlrpc thread ratelimiting added in patch https://review.whamcloud.com/33018 "LU-9859 libcfs: add watchdog for ptlrpc service threads" is broken. The kernel always prints:

      [29352.393371] Lustre: mdt00_009: service thread pid 18935 was inactive for 72.167 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      

      even though no stack trace has been printed before. This is visible, for example, in sanityn test_104 timeouts on the MDS when testing LU-11549, and also in the sanity test_422 results in the same log.

      It looks like the __ratelimit() return value is backward from what one would expect from normal English grammar: when "if (__ratelimit())" is true, the action should NOT be ratelimited, and vice versa.
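      To illustrate the return convention, here is a minimal userspace sketch, not the actual Lustre patch. The struct mock_ratelimit type and the mock_ratelimit(), dumps_stack_broken() and dumps_stack_fixed() functions are hypothetical names invented for this example. Only the convention they mimic comes from the kernel: __ratelimit() returns nonzero when the caller may proceed and zero when the action should be suppressed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's ratelimit state. */
struct mock_ratelimit {
	int interval;	/* window length in seconds */
	int burst;	/* events allowed per window */
	int begin;	/* start time of the current window */
	int printed;	/* events allowed so far in this window */
};

/*
 * Mimics the return convention of the kernel's __ratelimit():
 * nonzero means "go ahead", zero means "suppress this one".
 */
static int mock_ratelimit(struct mock_ratelimit *rs, int now)
{
	assert(rs->interval > 0 && rs->burst > 0);

	/* Start a new window once the interval has elapsed. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}
	return 0;
}

/* Inverted check: reads a nonzero return as "we are ratelimited". */
static bool dumps_stack_broken(struct mock_ratelimit *rs, int now)
{
	if (mock_ratelimit(rs, now))
		return false;	/* skips the trace while still under the limit */
	return true;
}

/* Corrected check: a nonzero return means the trace is allowed. */
static bool dumps_stack_fixed(struct mock_ratelimit *rs, int now)
{
	return mock_ratelimit(rs, now) != 0;
}
```

      With a limit of 3 per 300 seconds, the inverted check in this sketch skips the very first event, while the corrected check allows three events and suppresses the rest until the window resets.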

      Trivial patch to follow. This should be included in 2.13.0 as it was broken in commit v2_12_50-83-gfc9de67 and would make debugging problems reported from the field significantly more complex than necessary.

            People

              Assignee: adilger Andreas Dilger
              Reporter: adilger Andreas Dilger