[LU-17083] sanity-lnet test_205: Expected 2 resends found x Created: 04/Sep/23 Updated: 04/Sep/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for eaujames <eaujames@ddn.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/fd95d551-a9e8-4c26-8f57-98940f6e6532
test_205 failed with the following error:
Expected 2 resends found 0
Test session details:
Pre resends: 6
Post resends: 8
Resends delta: 2
Pre local health: 3000
Post local health: 2700
Pre remote health: 2000
Post remote health: 2000
/usr/sbin/lnetctl peer set --health 1000 --all
/usr/sbin/lnetctl net set --health 1000 --all
Removed 8 drop rules
Check that 2 resends took place
Check that local NI health has been changed
Simulate local_timeout
Added drop rule 10.240.29.136@tcp->10.240.29.56@tcp (1/1)
Added drop rule 10.240.29.136@tcp->10.240.29.56@tcp1 (1/1)
Added drop rule 10.240.29.136@tcp->10.240.29.136@tcp (1/1)
Added drop rule 10.240.29.136@tcp->10.240.29.136@tcp1 (1/1)
Added drop rule 10.240.29.136@tcp1->10.240.29.56@tcp (1/1)
Added drop rule 10.240.29.136@tcp1->10.240.29.56@tcp1 (1/1)
Added drop rule 10.240.29.136@tcp1->10.240.29.136@tcp (1/1)
Added drop rule 10.240.29.136@tcp1->10.240.29.136@tcp1 (1/1)
/usr/sbin/lnetctl ping 10.240.29.56@tcp
manage:
- ping:
errno: -1
descr: failed to ping 10.240.29.56@tcp: Operation canceled
Pre resends: 8
Post resends: 8
Resends delta: 0
Pre local health: 3000
Post local health: 2800
Pre remote health: 2000
Post remote health: 2000
/usr/sbin/lnetctl peer set --health 1000 --all
/usr/sbin/lnetctl net set --health 1000 --all
Removed 8 drop rules
Check that 2 resends took place
sanity-lnet test_205: @@@@@@ FAIL: Expected 2 resends found 0
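The pass/fail decision in the output above comes down to comparing the resend counter before and after each simulated failure. The following is only a minimal sketch of that arithmetic, not the sanity-lnet implementation: the function name `resend_deltas`, the `SAMPLE` text, and the `EXPECTED` constant are hypothetical, and the sample values are copied from the two iterations shown in this report.

```python
import re

def resend_deltas(log_text):
    """Pair each 'Pre resends' value with the following 'Post resends' value.

    Returns a list of (pre, post, delta) tuples in log order.
    Hypothetical helper for reading this report; not part of sanity-lnet.
    """
    pre = [int(n) for n in re.findall(r"Pre resends:\s*(\d+)", log_text)]
    post = [int(n) for n in re.findall(r"Post resends:\s*(\d+)", log_text)]
    return [(a, b, b - a) for a, b in zip(pre, post)]

# Counter values copied from the two iterations in the test output above.
SAMPLE = """\
Pre resends: 6
Post resends: 8
Resends delta: 2
Pre resends: 8
Post resends: 8
Resends delta: 0
"""

EXPECTED = 2  # the test expects 2 resends per simulated failure

for pre, post, delta in resend_deltas(SAMPLE):
    status = "ok" if delta == EXPECTED else "FAIL"
    print(f"pre={pre} post={post} delta={delta} {status}")
```

With these values the first iteration meets the expectation and the second (the local_timeout simulation) does not, which matches the reported failure.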
[13097.969039] Lustre: DEBUG MARKER: /usr/sbin/lnetctl lnet configure --all
[13097.973832] LNet: Added LNI 10.240.29.136@tcp [8/256/0/180]
[13097.974649] LNet: Accept all, port 7988
[13098.315173] Lustre: DEBUG MARKER: /usr/sbin/lnetctl discover 10.240.29.56@tcp
[13103.133331] Lustre: DEBUG MARKER: /usr/sbin/lnetctl lnet configure
[13103.139318] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net add --net tcp1 --if eth0
[13103.142951] LNet: Added LNI 10.240.29.136@tcp1 [8/256/0/180]
[13103.343360] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
[13103.346783] LNet: There was an unexpected network error while writing to 10.240.29.56: rc = -22
[13103.347984] LNet: 1 local NIs in recovery (showing 1): 10.240.29.136@tcp
[13103.409422] LNet: 1003942:0:(api-ni.c:357:recovery_interval_set()) 'lnet_recovery_interval' has been deprecated
[13103.423442] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
[13103.429448] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
[13103.506732] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
[13103.596613] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
[13103.602640] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
[13103.681700] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
[13103.757408] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
[13103.763403] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
[13103.844551] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
[13103.850275] LNet: There was an unexpected network error while writing to 10.240.29.56: rc = -22
[13103.851360] LNet: Skipped 8 previous similar messages
[13103.920628] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
[13103.926682] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
[13104.007477] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
[13106.921400] LNetError: 1003626:0:(lib-move.c:3441:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.240.29.56@tcp1: -125
[13106.974460] LNet: 1004214:0:(api-ni.c:357:recovery_interval_set()) 'lnet_recovery_interval' has been deprecated
[13106.975673] LNet: 1004214:0:(api-ni.c:357:recovery_interval_set()) Skipped 3 previous similar messages
[13106.988985] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
[13106.994939] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
[13107.267145] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-lnet test_205: @@@@@@ FAIL: Expected 2 resends found 0
[13107.478131] Lustre: DEBUG MARKER: sanity-lnet test_205: @@@@@@ FAIL: Expected 2 resends found 0
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |