[LU-13667] ptlrpc_pinger_main is stuck in endless loop Created: 12/Jun/20  Updated: 17/Oct/20  Resolved: 11/Jul/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0, Lustre 2.12.6

Type: Bug Priority: Minor
Reporter: Hongchao Zhang Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: llnl

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In ptlrpc_pinger_main, processing the pingable imports or calling obd_update_maxusage
can take a long time, and the pinger can then get stuck in an endless loop because of
the negative timeout returned by pinger_check_timeout.
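
The failure mode described above can be sketched in isolation. This is a minimal illustration, not the Lustre code or the landed patch: the helper names `time_to_next_wake_sketch` and `clamped_time_to_next_wake_sketch`, the second-granularity timestamps, and the one-second floor are all assumptions made for the example. It shows how a deadline computed from the start of a slow iteration goes negative, and one possible way a caller could guard against sleeping for a non-positive time.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a "time until next ping" calculation.
 * loop_start: time (seconds) at which the current iteration began
 * interval:   desired ping interval in seconds
 * now:        current time in seconds
 * The result is negative once the iteration has run past its deadline.
 */
static int64_t time_to_next_wake_sketch(int64_t loop_start, int64_t interval,
                                        int64_t now)
{
    assert(interval > 0);
    return loop_start + interval - now;
}

/*
 * One possible guard (an assumption, not the actual fix): never hand a
 * non-positive timeout to the wait primitive, so that an overdue
 * iteration still yields the CPU instead of spinning.
 */
static int64_t clamped_time_to_next_wake_sketch(int64_t loop_start,
                                                int64_t interval, int64_t now)
{
    int64_t t = time_to_next_wake_sketch(loop_start, interval, now);

    return t > 0 ? t : 1;
}
```

If the caller keeps reusing a stale `loop_start`, the unclamped value falls by one for every second that passes, so a loop that skips its wait whenever the timeout is not positive never sleeps again.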



 Comments   
Comment by Gerrit Updater [ 12/Jun/20 ]

Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38915
Subject: LU-13667 ptlrpc: fix endless loop issue
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9ccad707e88b3b2bc118eac588817737cb9da1c9

Comment by Olaf Faaland [ 02/Jul/20 ]

I believe we hit this today on a Lustre 2.12.4 client. The pinger was taking 100% of a core. Over 498 seconds, the "next wakeup in" message appeared in the debug log 76,716 times, and the time_to_next_wake started at -41,852 and ended at -42,350 (getting more and more negative with time).

Comment by Olaf Faaland [ 02/Jul/20 ]

Please let me know whether you agree my described symptoms match this issue, thanks.

Comment by Hongchao Zhang [ 02/Jul/20 ]

Yes, it should be the same issue as this ticket.

Comment by Olaf Faaland [ 02/Jul/20 ]

Thank you. This should go into b2_12 after it's merged to master.

Comment by Gerrit Updater [ 10/Jul/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38915/
Subject: LU-13667 ptlrpc: fix endless loop issue
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6be2dbb2595121fabceda86c5f7bdcb45e10b320

Comment by Peter Jones [ 11/Jul/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 13/Jul/20 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39344
Subject: LU-13667 ptlrpc: fix endless loop issue
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 5acd6853b4a64057ce55174a15a93b11d2922eab

Comment by Gerrit Updater [ 07/Aug/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39344/
Subject: LU-13667 ptlrpc: fix endless loop issue
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 95cd26446e16c63b531ed94a844b5f69c8b3730f
