[LU-17083] sanity-lnet test_205: Expected 2 resends found x
Created: 04/Sep/23
Updated: 04/Sep/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for eaujames <eaujames@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/fd95d551-a9e8-4c26-8f57-98940f6e6532

test_205 failed with the following error:

Expected 2 resends found 0

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/97552 - 5.14.0-284.25.1.el9_2.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/97552 - 4.18.0-477.15.1.el8_lustre.x86_64

Pre resends: 6
Post resends: 8
Resends delta: 2
Pre local health: 3000
Post local health: 2700
Pre remote health: 2000
Post remote health: 2000
/usr/sbin/lnetctl peer set --health 1000 --all
/usr/sbin/lnetctl net set --health 1000 --all
Removed 8 drop rules
Check that 2 resends took place
Check that local NI health has been changed
Simulate local_timeout
Added drop rule 10.240.29.136@tcp->10.240.29.56@tcp (1/1)
Added drop rule 10.240.29.136@tcp->10.240.29.56@tcp1 (1/1)
Added drop rule 10.240.29.136@tcp->10.240.29.136@tcp (1/1)
Added drop rule 10.240.29.136@tcp->10.240.29.136@tcp1 (1/1)
Added drop rule 10.240.29.136@tcp1->10.240.29.56@tcp (1/1)
Added drop rule 10.240.29.136@tcp1->10.240.29.56@tcp1 (1/1)
Added drop rule 10.240.29.136@tcp1->10.240.29.136@tcp (1/1)
Added drop rule 10.240.29.136@tcp1->10.240.29.136@tcp1 (1/1)
/usr/sbin/lnetctl ping 10.240.29.56@tcp
manage:
    - ping:
          errno: -1
          descr: failed to ping 10.240.29.56@tcp: Operation canceled
                 
Pre resends: 8
Post resends: 8
Resends delta: 0
Pre local health: 3000
Post local health: 2800
Pre remote health: 2000
Post remote health: 2000
/usr/sbin/lnetctl peer set --health 1000 --all
/usr/sbin/lnetctl net set --health 1000 --all
Removed 8 drop rules
Check that 2 resends took place
 sanity-lnet test_205: @@@@@@ FAIL: Expected 2 resends found 0 
[13097.969039] Lustre: DEBUG MARKER: /usr/sbin/lnetctl lnet configure --all
[13097.973832] LNet: Added LNI 10.240.29.136@tcp [8/256/0/180]
[13097.974649] LNet: Accept all, port 7988
[13098.315173] Lustre: DEBUG MARKER: /usr/sbin/lnetctl discover 10.240.29.56@tcp
[13103.133331] Lustre: DEBUG MARKER: /usr/sbin/lnetctl lnet configure
[13103.139318] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net add --net tcp1 --if eth0
[13103.142951] LNet: Added LNI 10.240.29.136@tcp1 [8/256/0/180]
[13103.343360] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
[13103.346783] LNet: There was an unexpected network error while writing to 10.240.29.56: rc = -22
[13103.347984] LNet: 1 local NIs in recovery (showing 1): 10.240.29.136@tcp
[13103.409422] LNet: 1003942:0:(api-ni.c:357:recovery_interval_set()) 'lnet_recovery_interval' has been deprecated
[13103.423442] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
[13103.429448] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
[13103.506732] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
[13103.596613] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
[13103.602640] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
[13103.681700] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
[13103.757408] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
[13103.763403] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
[13103.844551] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
[13103.850275] LNet: There was an unexpected network error while writing to 10.240.29.56: rc = -22
[13103.851360] LNet: Skipped 8 previous similar messages
[13103.920628] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
[13103.926682] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
[13104.007477] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
[13106.921400] LNetError: 1003626:0:(lib-move.c:3441:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.240.29.56@tcp1: -125
[13106.974460] LNet: 1004214:0:(api-ni.c:357:recovery_interval_set()) 'lnet_recovery_interval' has been deprecated
[13106.975673] LNet: 1004214:0:(api-ni.c:357:recovery_interval_set()) Skipped 3 previous similar messages
[13106.988985] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
[13106.994939] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
[13107.267145] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-lnet test_205: @@@@@@ FAIL: Expected 2 resends found 0 
[13107.478131] Lustre: DEBUG MARKER: sanity-lnet test_205: @@@@@@ FAIL: Expected 2 resends found 0
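
The resend check in the test output above comes down to simple arithmetic on the "Pre resends" and "Post resends" counters. The first pass reports a delta of 2, while the pass after "Simulate local_timeout" reports a delta of 0, which is what triggers the failure. The following is a minimal Python sketch for pulling those counters out of a saved test log when triaging similar runs. It is not the logic of sanity-lnet itself; the function name resend_deltas, the SAMPLE text, and the EXPECTED constant are hypothetical names introduced for illustration, and the sample lines are copied from the output above.

```python
import re

def resend_deltas(log_text):
    """Return one (pre, post, delta) tuple per Pre/Post resends pair.

    Hypothetical helper: it assumes the log prints the counters in the
    'Pre resends: N' / 'Post resends: N' form seen in this report.
    """
    pre = [int(n) for n in re.findall(r"^Pre resends: (\d+)$", log_text, re.M)]
    post = [int(n) for n in re.findall(r"^Post resends: (\d+)$", log_text, re.M)]
    # Pair the counters in order of appearance and compute each delta.
    return [(a, b, b - a) for a, b in zip(pre, post)]

# Counter lines copied from the two passes in the test output above.
SAMPLE = """\
Pre resends: 6
Post resends: 8
Resends delta: 2
Pre resends: 8
Post resends: 8
Resends delta: 0
"""

EXPECTED = 2  # the test message says "Expected 2 resends"

for pre, post, delta in resend_deltas(SAMPLE):
    status = "ok" if delta == EXPECTED else "FAIL"
    print(f"pre={pre} post={post} delta={delta} expected={EXPECTED} {status}")
```

Run against a full test log, this makes it easy to see which pass left the resend counter unchanged.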

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-lnet test_205 - Expected 2 resends found 0

