Details
Bug
Resolution: Fixed
Critical
Lustre 2.13.0
Description
The ptlrpc thread ratelimiting added in patch https://review.whamcloud.com/33018 "LU-9859 libcfs: add watchdog for ptlrpc service threads" is broken. The kernel always prints:
[29352.393371] Lustre: mdt00_009: service thread pid 18935 was inactive for 72.167 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
even though no stack trace has been printed before. This is visible in, e.g., sanityn test_104 timeouts on the MDS when testing LU-11549, and also in the sanity test 422 results in the same log.
It looks like the __ratelimit() return value is backward from what one would expect from normal English grammar: when "if (__ratelimit())" is true, the action should NOT be ratelimited, and vice versa.
Trivial patch to follow. This should be included in 2.13.0 as it was broken in commit v2_12_50-83-gfc9de67 and would make debugging problems reported from the field significantly more complex than necessary.
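To illustrate the inverted return value described above, here is a minimal userspace sketch. It is not the kernel's __ratelimit() implementation and not the actual Lustre patch: struct ratelimit_model, ratelimit_model_check(), watchdog_dump_buggy() and watchdog_dump_fixed() are hypothetical names invented for this example. The only thing taken from the kernel is the return convention (nonzero means "go ahead", zero means "suppress").

```c
#include <stdbool.h>

/* Simplified userspace model of a rate limiter (hypothetical, for
 * illustration only; NOT the kernel implementation). */
struct ratelimit_model {
    long interval;  /* window length in seconds */
    int  burst;     /* maximum actions allowed per window */
    long begin;     /* start time of the current window */
    int  printed;   /* actions allowed so far in this window */
};

/* Same return convention as the kernel's __ratelimit():
 * nonzero -> the action is allowed, 0 -> the action must be suppressed. */
static int ratelimit_model_check(struct ratelimit_model *rs, long now)
{
    if (now - rs->begin >= rs->interval) {
        /* A new window starts: reset the counter. */
        rs->begin = now;
        rs->printed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return 1;
    }
    return 0;
}

/* Hypothetical watchdog with the inverted test.
 * Returns true if a stack trace would be dumped. */
static bool watchdog_dump_buggy(struct ratelimit_model *rs, long now)
{
    if (ratelimit_model_check(rs, now))
        return false;   /* BUG: treats "allowed" as "rate limited" */
    return true;
}

/* Hypothetical watchdog with the corrected test. */
static bool watchdog_dump_fixed(struct ratelimit_model *rs, long now)
{
    if (!ratelimit_model_check(rs, now))
        return false;   /* genuinely rate limited: skip this trace */
    return true;
}
```

With a limit of 3 per 300 seconds, the buggy variant skips the very first stack trace, which matches the "skipping this one" message appearing before any trace has been printed. The fixed variant dumps the first three traces in a window and only then starts skipping.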