[LU-14941] Remote NI recovery pings do not work in routed environment Created: 16/Aug/21 Updated: 30/Nov/21 Resolved: 30/Nov/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Chris Horn | Assignee: | Chris Horn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue impacts all Lustre versions with LNet health. The routing algorithm does not ensure that recovery pings go to the correct NI. In fact, that NI is actively avoided since, by definition, its health is sub-optimal (a simplified sketch of this selection behaviour follows the console session below). Here's a simple test showing the problem (test requires …):

sles15c01:/home/hornc/lustre-filesystem # lnetctl peer show --nid 192.168.2.35@tcp1
peer:
- primary nid: 192.168.2.34@tcp1
Multi-Rail: True
peer ni:
- nid: 192.168.2.34@tcp1
state: NA
- nid: 192.168.2.35@tcp1
state: NA
sles15c01:/home/hornc/lustre-filesystem # lctl list_nids
192.168.2.38@tcp2
192.168.2.39@tcp2
sles15c01:/home/hornc/lustre-filesystem # lctl show_route
net tcp1 hops 4294967295 gw 192.168.2.33@tcp2 up pri 0
sles15c01:/home/hornc/lustre-filesystem # lnetctl peer show -v 2 --nid 192.168.2.35@tcp1 | egrep -e nid -e health
- primary nid: 192.168.2.34@tcp1
- nid: 192.168.2.34@tcp1
health stats:
health value: 1000
- nid: 192.168.2.35@tcp1
health stats:
health value: 1000
sles15c01:/home/hornc/lustre-filesystem # lnetctl peer set --health 0 --nid 192.168.2.35@tcp1
sles15c01:/home/hornc/lustre-filesystem # lnetctl peer show -v 2 --nid 192.168.2.35@tcp1 | egrep -e nid -e health
- primary nid: 192.168.2.34@tcp1
- nid: 192.168.2.34@tcp1
health stats:
health value: 1000
- nid: 192.168.2.35@tcp1
health stats:
health value: 0
sles15c01:/home/hornc/lustre-filesystem # sleep 30
sles15c01:/home/hornc/lustre-filesystem # lnetctl peer show -v 2 --nid 192.168.2.35@tcp1 | egrep -e nid -e health
- primary nid: 192.168.2.34@tcp1
- nid: 192.168.2.34@tcp1
health stats:
health value: 1000
- nid: 192.168.2.35@tcp1
health stats:
health value: 0
sles15c01:/home/hornc/lustre-filesystem # lctl dk > /tmp/dk.log
sles15c01:/home/hornc/lustre-filesystem # grep lnet_handle_send /tmp/dk.log | grep -c TRACE
513
sles15c01:/home/hornc/lustre-filesystem # grep lnet_handle_send /tmp/dk.log | grep TRACE | grep -- '-> 192.168.2.35@tcp1'
sles15c01:/home/hornc/lustre-filesystem #
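
To make the failure mode concrete, here is a minimal, self-contained C sketch (hypothetical types and names, not the actual LNet code or the patch) of the peer-NI selection decision. For normal traffic the healthiest peer NI wins, which is exactly why a recovery ping aimed at the degraded NI ends up redirected to its healthy sibling; forcing recovery pings to target the NI being recovered is the behaviour verified in the re-test further down.

#include <stdio.h>
#include <string.h>

/* Toy model of a peer NI: just a NID string and a health value. */
struct peer_ni {
	const char *nid;
	int health;
};

/* Sketch of destination selection. Without the is_recovery_ping guard
 * (the pre-fix behaviour), health-based selection redirects a recovery
 * ping for the degraded NI to the healthiest peer NI instead. */
static const struct peer_ni *
pick_peer_ni(const struct peer_ni *nis, int n, const char *dst_nid,
	     int is_recovery_ping)
{
	const struct peer_ni *best = &nis[0];
	int i;

	if (is_recovery_ping) {
		/* A recovery ping must probe the exact NI being recovered. */
		for (i = 0; i < n; i++)
			if (strcmp(nis[i].nid, dst_nid) == 0)
				return &nis[i];
	}

	/* Normal traffic: prefer the healthiest peer NI. */
	for (i = 1; i < n; i++)
		if (nis[i].health > best->health)
			best = &nis[i];
	return best;
}

int main(void)
{
	const struct peer_ni nis[] = {
		{ "192.168.2.35@tcp1", 0 },	/* NI under recovery */
		{ "192.168.2.34@tcp1", 1000 },	/* healthy sibling */
	};

	/* Prints 192.168.2.35@tcp1; drop the guard above and it prints
	 * 192.168.2.34@tcp1, i.e. the ping never reaches the NI being
	 * recovered, matching the empty lnet_handle_send grep above. */
	printf("recovery ping goes to %s\n",
	       pick_peer_ni(nis, 2, "192.168.2.35@tcp1", 1)->nid);
	return 0;
}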
|
| Comments |
| Comment by Gerrit Updater [ 23/Aug/21 ] |
|
"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/44730 |
| Comment by Chris Horn [ 18/Oct/21 ] |
|
Test config:

Router:
sles15s02:~ # lctl list_nids
192.168.2.32@tcp1
192.168.2.33@tcp2
sles15s02:~ #

Clients:
sles15s01:~ # lctl list_nids
192.168.2.30@tcp2
192.168.2.31@tcp2
sles15s01:~ # lctl show_route
net tcp1 hops 4294967295 gw 192.168.2.33@tcp2 up pri 0
sles15s01:~ #
sles15c01:~ # lctl list_nids
192.168.2.38@tcp1
192.168.2.39@tcp1
sles15c01:~ # lctl show_route
net tcp2 hops 4294967295 gw 192.168.2.32@tcp1 up pri 0
sles15c01:~ #

Send a couple messages from c01 to s01:
sles15c01:~ # lnetctl discover 192.168.2.30@tcp2
discover:
- primary nid: 192.168.2.30@tcp2
Multi-Rail: True
peer ni:
- nid: 192.168.2.30@tcp2
- nid: 192.168.2.31@tcp2
sles15c01:~ # lnetctl discover 192.168.2.30@tcp2
discover:
- primary nid: 192.168.2.30@tcp2
Multi-Rail: True
peer ni:
- nid: 192.168.2.30@tcp2
- nid: 192.168.2.31@tcp2
sles15c01:~ # lnetctl discover 192.168.2.30@tcp2
discover:
- primary nid: 192.168.2.30@tcp2
Multi-Rail: True
peer ni:
- nid: 192.168.2.30@tcp2
- nid: 192.168.2.31@tcp2
Check that we've received messages from both s01's NIs:
sles15c01:~ # lctl dk | grep lnet_parse | egrep -e 192.168.2.30@tcp2 -e 192.168.2.31@tcp2
00000400:00000200:0.0:1629482198.234301:0:16222:0:(lib-move.c:4429:lnet_parse()) TRACE: 192.168.2.39@tcp1(192.168.2.39@tcp1) <- 192.168.2.30@tcp2 : REPLY - for me
00000400:00000200:0.0:1629482198.234307:0:16222:0:(lib-move.c:4194:lnet_parse_reply()) 192.168.2.39@tcp1: Reply from 12345-192.168.2.30@tcp2 of length 64/64 into md 0x19
00000400:00000200:0.0:1629482198.235691:0:16222:0:(lib-move.c:4429:lnet_parse()) TRACE: 192.168.2.39@tcp1(192.168.2.39@tcp1) <- 192.168.2.31@tcp2 : ACK - for me
00000400:00000200:0.0:1629482198.235695:0:16222:0:(lib-move.c:4247:lnet_parse_ack()) 192.168.2.39@tcp1: ACK from 12345-192.168.2.31@tcp2 into md 0x1d
00000400:00000200:0.0:1629482198.971600:0:16222:0:(lib-move.c:4429:lnet_parse()) TRACE: 192.168.2.38@tcp1(192.168.2.38@tcp1) <- 192.168.2.30@tcp2 : REPLY - for me
00000400:00000200:0.0:1629482198.971606:0:16222:0:(lib-move.c:4194:lnet_parse_reply()) 192.168.2.38@tcp1: Reply from 12345-192.168.2.30@tcp2 of length 64/64 into md 0x21
00000400:00000200:0.0:1629482199.940773:0:16222:0:(lib-move.c:4429:lnet_parse()) TRACE: 192.168.2.39@tcp1(192.168.2.39@tcp1) <- 192.168.2.31@tcp2 : REPLY - for me
00000400:00000200:0.0:1629482199.940782:0:16222:0:(lib-move.c:4194:lnet_parse_reply()) 192.168.2.39@tcp1: Reply from 12345-192.168.2.31@tcp2 of length 64/64 into md 0x25

Set health to 0:
sles15c01:~ # lnetctl peer set --health 0 --nid 192.168.2.30@tcp2
sles15c01:~ # lctl dk | grep recovery
00000400:00000200:0.0:1629482216.841269:0:16310:0:(peer.c:4035:lnet_peer_ni_add_to_recoveryq_locked()) 192.168.2.30@tcp2 added to recovery queue. ping count: 0 next ping: 671368 last alive: 671349 health: 0
00000400:00000200:0.0:1629482217.032143:0:16202:0:(peer.c:4035:lnet_peer_ni_add_to_recoveryq_locked()) 192.168.2.30@tcp2 added to recovery queue. ping count: 1 next ping: 671370 last alive: 671349 health: 0
00000400:00000200:0.0:1629482217.032203:0:16222:0:(lib-move.c:3864:lnet_mt_event_handler()) 192.168.2.30@tcp2 recovery message sent successfully:0
00000400:00000200:0.0:1629482219.080116:0:16202:0:(peer.c:4035:lnet_peer_ni_add_to_recoveryq_locked()) 192.168.2.30@tcp2 added to recovery queue. ping count: 2 next ping: 671374 last alive: 671349 health: 0
00000400:00000200:0.0:1629482219.080213:0:16222:0:(lib-move.c:3864:lnet_mt_event_handler()) 192.168.2.30@tcp2 recovery message sent successfully:0
sles15c01:~ #

We can see the NI is now queued for recovery correctly. However, after some time the health still has not recovered:
sles15c01:~ # sleep 30
sles15c01:~ # lnetctl peer show -v 2 --nid 192.168.2.30@tcp2 | egrep -e nid -e health
- primary nid: 192.168.2.30@tcp2
- nid: 192.168.2.30@tcp2
health stats:
health value: 0
- nid: 192.168.2.31@tcp2
health stats:
health value: 1000
sles15c01:~ #
From the debug log we can see that no recovery pings are being sent to the NI in recovery. Instead they are being sent to the healthy NI:
sles15c01:~ # grep lnet_handle_send /tmp/dk.log | grep TRACE | grep -- '-> 192.168.2.30@tcp2'
sles15c01:~ # grep lnet_handle_send /tmp/dk.log | grep TRACE | grep -- '-> 192.168.2.31@tcp2'
00000400:00000200:0.0:1629482844.232128:0:17222:0:(lib-move.c:1907:lnet_handle_send()) TRACE: 192.168.2.38@tcp1(192.168.2.38@tcp1:<?>) -> 192.168.2.31@tcp2(192.168.2.30@tcp2:192.168.2.32@tcp1) <?> : GET try# 0
00000400:00000200:0.0:1629482846.280128:0:17222:0:(lib-move.c:1907:lnet_handle_send()) TRACE: 192.168.2.39@tcp1(192.168.2.39@tcp1:<?>) -> 192.168.2.31@tcp2(192.168.2.30@tcp2:192.168.2.32@tcp1) <?> : GET try# 0
00000400:00000200:0.0:1629482850.376103:0:17222:0:(lib-move.c:1907:lnet_handle_send()) TRACE: 192.168.2.38@tcp1(192.168.2.38@tcp1:<?>) -> 192.168.2.31@tcp2(192.168.2.30@tcp2:192.168.2.32@tcp1) <?> : GET try# 0
00000400:00000200:0.0:1629482858.568104:0:17222:0:(lib-move.c:1907:lnet_handle_send()) TRACE: 192.168.2.39@tcp1(192.168.2.39@tcp1:<?>) -> 192.168.2.31@tcp2(192.168.2.30@tcp2:192.168.2.32@tcp1) <?> : GET try# 0
00000400:00000200:0.0:1629482874.952126:0:17222:0:(lib-move.c:1907:lnet_handle_send()) TRACE: 192.168.2.38@tcp1(192.168.2.38@tcp1:<?>) -> 192.168.2.31@tcp2(192.168.2.30@tcp2:192.168.2.32@tcp1) <?> : GET try# 0
00000400:00000200:0.0:1629482906.696122:0:17222:0:(lib-move.c:1907:lnet_handle_send()) TRACE: 192.168.2.38@tcp1(192.168.2.38@tcp1:<?>) -> 192.168.2.31@tcp2(192.168.2.30@tcp2:192.168.2.32@tcp1) <?> : GET try# 0
sles15c01:~ #

Re-test with the fix applied:
sles15c01:~ # lnetctl discover 192.168.2.30@tcp2
discover:
- primary nid: 192.168.2.30@tcp2
Multi-Rail: True
peer ni:
- nid: 192.168.2.30@tcp2
- nid: 192.168.2.31@tcp2
sles15c01:~ # lnetctl discover 192.168.2.30@tcp2
discover:
- primary nid: 192.168.2.30@tcp2
Multi-Rail: True
peer ni:
- nid: 192.168.2.30@tcp2
- nid: 192.168.2.31@tcp2
sles15c01:~ # lnetctl discover 192.168.2.30@tcp2
discover:
- primary nid: 192.168.2.30@tcp2
Multi-Rail: True
peer ni:
- nid: 192.168.2.30@tcp2
- nid: 192.168.2.31@tcp2
sles15c01:~ # lctl dk | grep lnet_parse | egrep -e 192.168.2.30@tcp2 -e 192.168.2.31@tcp2
00000400:00000200:0.0:1629483126.305212:0:17591:0:(lib-move.c:4434:lnet_parse()) TRACE: 192.168.2.39@tcp1(192.168.2.39@tcp1) <- 192.168.2.30@tcp2 : REPLY - for me
00000400:00000200:0.0:1629483126.305218:0:17591:0:(lib-move.c:4199:lnet_parse_reply()) 192.168.2.39@tcp1: Reply from 12345-192.168.2.30@tcp2 of length 64/64 into md 0x19
00000400:00000200:0.0:1629483126.306293:0:17591:0:(lib-move.c:4434:lnet_parse()) TRACE: 192.168.2.39@tcp1(192.168.2.39@tcp1) <- 192.168.2.30@tcp2 : ACK - for me
00000400:00000200:0.0:1629483126.306299:0:17591:0:(lib-move.c:4252:lnet_parse_ack()) 192.168.2.39@tcp1: ACK from 12345-192.168.2.30@tcp2 into md 0x1d
00000400:00000200:0.0:1629483127.213996:0:17591:0:(lib-move.c:4434:lnet_parse()) TRACE: 192.168.2.38@tcp1(192.168.2.38@tcp1) <- 192.168.2.31@tcp2 : REPLY - for me
00000400:00000200:0.0:1629483127.214003:0:17591:0:(lib-move.c:4199:lnet_parse_reply()) 192.168.2.38@tcp1: Reply from 12345-192.168.2.31@tcp2 of length 64/64 into md 0x21
00000400:00000200:0.0:1629483128.216060:0:17591:0:(lib-move.c:4434:lnet_parse()) TRACE: 192.168.2.39@tcp1(192.168.2.39@tcp1) <- 192.168.2.30@tcp2 : REPLY - for me
00000400:00000200:0.0:1629483128.216067:0:17591:0:(lib-move.c:4199:lnet_parse_reply()) 192.168.2.39@tcp1: Reply from 12345-192.168.2.30@tcp2 of length 64/64 into md 0x25
sles15c01:~ # lnetctl peer set --health 0 --nid 192.168.2.30@tcp2
sles15c01:~ # lnetctl peer show -v 2 --nid 192.168.2.30@tcp2 | egrep -e nid -e health
- primary nid: 192.168.2.30@tcp2
- nid: 192.168.2.30@tcp2
health stats:
health value: 0
- nid: 192.168.2.31@tcp2
health stats:
health value: 1000
sles15c01:~ # sleep 30
sles15c01:~ # lnetctl peer show -v 2 --nid 192.168.2.30@tcp2 | egrep -e nid -e health
- primary nid: 192.168.2.30@tcp2
- nid: 192.168.2.30@tcp2
health stats:
health value: 0
- nid: 192.168.2.31@tcp2
health stats:
health value: 1000
sles15c01:~ # lctl dk > /tmp/dk.log
sles15c01:~ # grep lnet_handle_send /tmp/dk.log | grep TRACE | grep -- '-> 192.168.2.30@tcp2'
00000400:00000200:0.0:1629483137.480098:0:17571:0:(lib-move.c:1907:lnet_handle_send()) TRACE: 192.168.2.38@tcp1(192.168.2.38@tcp1:<?>) -> 192.168.2.30@tcp2(192.168.2.30@tcp2:192.168.2.32@tcp1) <?> : GET try# 0
00000400:00000200:0.0:1629483139.528115:0:17571:0:(lib-move.c:1907:lnet_handle_send()) TRACE: 192.168.2.39@tcp1(192.168.2.39@tcp1:<?>) -> 192.168.2.30@tcp2(192.168.2.30@tcp2:192.168.2.32@tcp1) <?> : GET try# 0
00000400:00000200:0.0:1629483143.624105:0:17571:0:(lib-move.c:1907:lnet_handle_send()) TRACE: 192.168.2.38@tcp1(192.168.2.38@tcp1:<?>) -> 192.168.2.30@tcp2(192.168.2.30@tcp2:192.168.2.32@tcp1) <?> : GET try# 0
00000400:00000200:0.0:1629483151.816108:0:17571:0:(lib-move.c:1907:lnet_handle_send()) TRACE: 192.168.2.39@tcp1(192.168.2.39@tcp1:<?>) -> 192.168.2.30@tcp2(192.168.2.30@tcp2:192.168.2.32@tcp1) <?> : GET try# 0
00000400:00000200:0.0:1629483167.176149:0:17571:0:(lib-move.c:1907:lnet_handle_send()) TRACE: 192.168.2.38@tcp1(192.168.2.38@tcp1:<?>) -> 192.168.2.30@tcp2(192.168.2.30@tcp2:192.168.2.32@tcp1) <?> : GET try# 0
sles15c01:~ # grep lnet_handle_send /tmp/dk.log | grep TRACE | grep -- '-> 192.168.2.31@tcp2'
sles15c01:~ #
We can see that the recovery pings are now going to the correct NI. |
| Comment by Gerrit Updater [ 30/Nov/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44730/ |
| Comment by Peter Jones [ 30/Nov/21 ] |
|
Landed for 2.15 |