[LU-14654] Need to check if lnet_recovery_limit is non-zero in lnet_peer_ni_add_to_recoveryq_locked()
Created: 29/Apr/21  Updated: 08/Jul/21  Resolved: 08/Jul/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug
Priority: Minor
Reporter: Chris Horn
Assignee: Chris Horn
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This is a bug in commit cc27201a76 ('LU-13569 lnet: Age peer NI out of recovery'): https://review.whamcloud.com/39718

If lnet_recovery_limit is 0, we are supposed to allow indefinite recovery of peer NIs. However, lnet_peer_ni_add_to_recoveryq_locked() does not check for this case:

        if (now > lpni->lpni_last_alive + lnet_recovery_limit) {
                CDEBUG(D_NET, "lpni %s aged out last alive %lld\n",
                       libcfs_nid2str(lpni->lpni_nid),
                       lpni->lpni_last_alive);
                return;
        }

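With lnet_recovery_limit set to 0, this condition reduces to now > lpni->lpni_last_alive, so the function returns early as soon as the current time passes the last-alive timestamp instead of allowing indefinite recovery. The age-out check should be applied only when lnet_recovery_limit is non-zero.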
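
A minimal standalone sketch of the intended check, for illustration only. The helper name peer_ni_aged_out() and the time64_t stand-in typedef are assumptions made so the snippet compiles outside the kernel; this is not the merged patch, only the logic described above (a limit of 0 disables aging out).

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's time64_t (assumption for this sketch). */
typedef int64_t time64_t;

/*
 * Hypothetical helper, not part of LNet: returns true when a peer NI
 * should be considered aged out of recovery.
 *
 * recovery_limit == 0 means "recover indefinitely", so the age-out
 * comparison is skipped entirely in that case.
 */
static bool peer_ni_aged_out(time64_t now, time64_t last_alive,
			     unsigned int recovery_limit)
{
	if (recovery_limit == 0)
		return false;

	return now > last_alive + (time64_t)recovery_limit;
}
```

In lnet_peer_ni_add_to_recoveryq_locked() the equivalent change would be to guard the existing comparison with a non-zero test on lnet_recovery_limit, so that the early return is only taken when a limit is actually configured.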



 Comments   
Comment by Gerrit Updater [ 29/Apr/21 ]

Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/43501
Subject: LU-14654 lnet: Correct peer NI recovery age out calculation
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4e59c31083674ab082476699133a80d7a6e23e65

Comment by Gerrit Updater [ 29/Apr/21 ]

Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/43502
Subject: LU-14654 tests: Ensure recovery_limit zero works as expected
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3c24633b629810746fff324271e801a71523ea36

Comment by Gerrit Updater [ 08/Jul/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43501/
Subject: LU-14654 lnet: Correct peer NI recovery age out calculation
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8f3f0e1219724d6e0ed727e46b28ab28203aef9f

Comment by Gerrit Updater [ 08/Jul/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43502/
Subject: LU-14654 tests: Ensure recovery_limit zero works as expected
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8d1895f2f69bd2eec3ff6af5eb356740fa2c8766
