[LU-13806] LNetError: 7229:0:(peer.c:529:lnet_peer_del_nid()) ASSERTION( lpni2 ) failed: Created: 20/Jul/20 Updated: 07/Feb/24 Resolved: 28/Apr/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Chris Horn | Assignee: | Chris Horn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
sles15s01 config:
sles15s01:~ # start.sh debug=+net
sles15s01:~ # lctl list_nids
192.168.2.30@tcp1
sles15s01:~ # lctl show_route
net tcp2 hops 4294967295 gw 192.168.2.20@tcp1 up pri 0
sles15s01:~ #

sles15c01 config:
sles15c01:~ # start.sh debug=+net
sles15c01:~ # lctl list_nids
192.168.2.38@tcp2
sles15c01:~ # lctl show_route
net tcp1 hops 4294967295 gw 192.168.2.50@tcp2 up pri 0
sles15c01:~ #

sles15build01 config:
sles15build01:~ # start.sh debug=+net
sles15build01:~ # lctl list_nids
192.168.2.20@tcp1
192.168.2.50@tcp2
sles15build01:~ #

sles15c01:~ # date; lnetctl peer show; lctl show_route; ssh sles15build01 'lnetctl net del --net tcp1 --if eth0'; sleep 1; lnetctl peer show; lctl show_route

Assert was tripped after the lnetctl net del command was executed on sles15build01.

I think the bug is probably in the initial router discovery. When we create the lpni for the primary nid, 192.168.2.20@tcp1, and attach it to peer it is at the end of the list, but I think primary nid is expected to be the first one. |
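The ordering hypothesis above can be illustrated with a small toy model. This is not LNet code and not the change made in the patch: `Peer`, `add_nid_append`, `add_nid_primary_first`, and `primary_is_first` are hypothetical names. The sketch also assumes that the gateway NID 192.168.2.50@tcp2 was attached to the peer before the primary NID was discovered, and that code elsewhere expects the primary NID at the head of the list, as conjectured in the description.

```python
from dataclasses import dataclass, field

# NIDs of the router sles15build01, taken from the description.
PRIMARY = "192.168.2.20@tcp1"
GATEWAY = "192.168.2.50@tcp2"  # assumed to be attached to the peer first


@dataclass
class Peer:
    """Toy stand-in for a peer: a primary NID plus an ordered NID list."""
    primary_nid: str
    nids: list = field(default_factory=list)

    def add_nid_append(self, nid: str) -> None:
        # Models attaching every new entry at the tail of the list.
        self.nids.append(nid)

    def add_nid_primary_first(self, nid: str) -> None:
        # Hypothetical alternative: keep the primary NID at the head.
        if nid == self.primary_nid:
            self.nids.insert(0, nid)
        else:
            self.nids.append(nid)

    def primary_is_first(self) -> bool:
        # The invariant the description suspects other code relies on.
        return bool(self.nids) and self.nids[0] == self.primary_nid


def build_by_append() -> Peer:
    # Gateway NID is known first; the primary NID arrives via discovery.
    peer = Peer(primary_nid=PRIMARY)
    peer.add_nid_append(GATEWAY)
    peer.add_nid_append(PRIMARY)
    return peer


def build_primary_first() -> Peer:
    # Same arrival order, but the primary NID is placed at the head.
    peer = Peer(primary_nid=PRIMARY)
    peer.add_nid_primary_first(GATEWAY)
    peer.add_nid_primary_first(PRIMARY)
    return peer


if __name__ == "__main__":
    print(build_by_append().nids, build_by_append().primary_is_first())
    print(build_primary_first().nids, build_primary_first().primary_is_first())
```

In this model, appending leaves the primary NID last, so any code that treats the first list entry as the primary NID would pick the wrong one. Whether that is the exact path that leads to the failed `lpni2` assertion in `lnet_peer_del_nid()` is not established here.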
| Comments |
| Comment by Gerrit Updater [ 15/Dec/20 ] |
|
Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/40985 |
| Comment by Gerrit Updater [ 28/Apr/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40985/ |
| Comment by Peter Jones [ 28/Apr/21 ] |
|
Landed for 2.15 |