[LU-12838] ptlrpc watchdog ratelimiting is broken Created: 08/Oct/19  Updated: 02/Apr/21  Resolved: 18/Oct/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: Lustre 2.13.0

Type: Bug
Priority: Critical
Reporter: Andreas Dilger
Assignee: Andreas Dilger
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12749 sanity-quota test_6: FAIL: [22292.915... Resolved
is related to LU-9859 libcfs simplification Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The stack-trace ratelimiting for ptlrpc service thread watchdogs, added in patch https://review.whamcloud.com/33018 "LU-9859 libcfs: add watchdog for ptlrpc service threads", is broken. The kernel always prints:

[29352.393371] Lustre: mdt00_009: service thread pid 18935 was inactive for 72.167 seconds. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.

even though no stack trace has been printed before. This is visible, for example, in sanityn test_104 timeouts on the MDS when testing LU-11549, and also in the sanity test 422 results in the same log.

It looks like the __ratelimit() return value is backward from what one would expect from normal English grammar: when "if (__ratelimit())" is true, the action should NOT be ratelimited (it is allowed to proceed), and when it is false, the action should be suppressed.
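
To illustrate the return convention described above, here is a minimal userspace sketch in C. It is not the Lustre or kernel code: struct demo_ratelimit, demo_ratelimit() and demo_watchdog_fire() are hypothetical stand-ins, and the window handling is simplified. The only property it is meant to mirror is that a nonzero return means "go ahead" and zero means "suppress".

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a ratelimit state (assumption,
 * not the kernel's struct ratelimit_state). */
struct demo_ratelimit {
	long interval;	/* window length, in seconds */
	int  burst;	/* actions allowed per window */
	long begin;	/* start of the current window */
	int  printed;	/* actions allowed so far in this window */
	bool started;	/* false until the first call */
};

/* Same return convention as described for __ratelimit():
 * nonzero = the action is allowed, zero = the action is suppressed. */
static int demo_ratelimit(struct demo_ratelimit *rs, long now)
{
	/* Open a new window on first use or once the interval has elapsed. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}
	return 0;
}

/* Hypothetical watchdog callback: dump a trace only when the limiter
 * says the action is allowed. Returns true if a trace was dumped. */
static bool demo_watchdog_fire(struct demo_ratelimit *rs, long now)
{
	if (demo_ratelimit(rs, now)) {
		printf("t=%ld: dumping stack trace\n", now);
		return true;
	}
	printf("t=%ld: limit reached, skipping this one\n", now);
	return false;
}
```

With interval = 300 and burst = 3, calling demo_watchdog_fire() at t = 0, 10 and 20 dumps a trace each time, a call at t = 30 is skipped, and a call at t = 300 dumps again because a new window has started. Negating the test in demo_watchdog_fire() would reproduce the symptom reported here: the very first event would already be reported as skipped.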

Trivial patch to follow. This should be included in 2.13.0, as the regression was introduced in commit v2_12_50-83-gfc9de67 and would make debugging problems reported from the field significantly more complex than necessary.



 Comments   
Comment by Gerrit Updater [ 08/Oct/19 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36409
Subject: LU-12838 ptlrpc: fix watchdog ratelimit logic
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0127ab2033b6c5a94eae84202a8a80bb62715c8c

Comment by Gerrit Updater [ 18/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36409/
Subject: LU-12838 ptlrpc: fix watchdog ratelimit logic
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 594c79f2f855737fa415562a9bbb3fb13aee9ec9

Comment by Peter Jones [ 18/Oct/19 ]

Landed for 2.13
