[LU-16866] sanity-lnet test_211: Remote NI recovery checks failed Created: 03/Jun/23  Updated: 31/Aug/23  Resolved: 31/Aug/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Chris Horn
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Frank Sehr <fsehr@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/f1bad0ba-a4cb-466c-a8b6-08ab90ec3f5f

test_211 failed with the following error:

$'Expect 0 NIDs found: peer NI recovery:\n    nid-0: 10.240.44.214@tcp'

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/95293 - 4.18.0-425.10.1.el8_7.aarch64
servers: https://build.whamcloud.com/job/lustre-reviews/95293 - 4.18.0-425.10.1.el8_lustre.x86_64

              ping_count: 2
              next_ping: 24922
        - nid: 10.240.44.214@tcp1
              health value: 1000
              ping_count: 0
              next_ping: 0
Expect ping count "2" found "2"
Check "-p" recovery queue

Check ping counts:
    - primary nid: 10.240.44.214@tcp
        - nid: 10.240.44.214@tcp
              health value: 0
              ping_count: 0
              next_ping: 24930
        - nid: 10.240.44.214@tcp1
              health value: 1000
              ping_count: 0
              next_ping: 0
Removed 2 drop rules
/usr/sbin/lnetctl set recovery_limit 0
Check "-p" recovery queue
peer NI recovery:
    nid-0: 10.240.44.214@tcp
Check ping counts:
    - primary nid: 10.240.44.214@tcp
        - nid: 10.240.44.214@tcp
              health value: 500
              ping_count: 0
              next_ping: 24934
        - nid: 10.240.44.214@tcp1
              health value: 1000
              ping_count: 0
              next_ping: 0
Check "-p" recovery queue
peer NI recovery:
    nid-0: 10.240.44.214@tcp
 sanity-lnet test_211: @@@@@@ FAIL: Expect 0 NIDs found: "peer NI recovery:
    nid-0: 10.240.44.214@tcp" 

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-lnet test_211 - $'Expect 0 NIDs found: peer NI recovery:\n nid-0: 10.240.44.214@tcp'



 Comments   
Comment by Gerrit Updater [ 01/Aug/23 ]

"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51845
Subject: LU-16866 tests: Use wait_update to check LNet recovery state
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 407862b513afd3050479bdc123f70d850692963b

Comment by Gerrit Updater [ 04/Aug/23 ]

"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51873
Subject: LU-16866 tests: Test patch
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8fcdc85a33f40989d72ac13fdb39d4b23d33ea77

Comment by Gerrit Updater [ 31/Aug/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51845/
Subject: LU-16866 tests: Use wait_update to check LNet recovery state
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8e53a0ea594a7d7eb9cd7541233bc8771d4023b5
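
For readers of this ticket, here is a minimal sketch of the idea named in the patch subject: poll the recovery state until it reaches the expected value instead of checking it once. It is not the merged patch and does not use the test framework's wait_update. The helper names count_recovery_nids and poll_until are hypothetical, and the lnetctl invocation in the usage comment is an assumption based on the 'Check "-p" recovery queue' lines in the log above.

```shell
#!/bin/bash

# Hypothetical helper: count "nid-N:" entries in recovery queue output
# read from stdin. grep -c prints 0 and exits non-zero when nothing
# matches, so "|| true" keeps the function's status at 0 in that case.
count_recovery_nids() {
	grep -c -E '^[[:space:]]*nid-[0-9]+:' || true
}

# Hypothetical helper: re-run a command once per second until it prints
# the expected value, or give up after the given number of seconds.
# usage: poll_until EXPECTED TIMEOUT_SECONDS COMMAND [ARGS...]
poll_until() {
	local expect=$1 timeout=$2
	shift 2
	local elapsed=0 result

	while :; do
		result=$("$@")
		if [[ "$result" == "$expect" ]]; then
			return 0
		fi
		if (( elapsed >= timeout )); then
			break
		fi
		sleep 1
		elapsed=$((elapsed + 1))
	done

	echo "expected '$expect', last saw '$result'" >&2
	return 1
}

# Usage sketch (assumed command; adjust to however the queue is dumped):
#   queue_len() { lnetctl debug recovery -p | count_recovery_nids; }
#   poll_until 0 30 queue_len || echo "NI still in recovery queue"
```

In the failing run above, the peer NI was still listed in the recovery queue when it was checked, while its health value was 500. A polling check of this kind allows time for the entry to leave the queue before the test reports a failure.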

Comment by Peter Jones [ 31/Aug/23 ]

Landed for 2.16

Generated at Sat Feb 10 03:30:39 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.