[LU-13806] LNetError: 7229:0:(peer.c:529:lnet_peer_del_nid()) ASSERTION( lpni2 ) failed: Created: 20/Jul/20  Updated: 07/Feb/24  Resolved: 28/Apr/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Chris Horn Assignee: Chris Horn
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3

 Description   

sles15s01 config:

sles15s01:~ # start.sh
debug=+net
sles15s01:~ # lctl list_nids
192.168.2.30@tcp1
sles15s01:~ # lctl show_route
net               tcp2 hops 4294967295 gw                192.168.2.20@tcp1 up pri 0
sles15s01:~ #

sles15c01 config:

sles15c01:~ # start.sh
debug=+net
sles15c01:~ # lctl list_nids
192.168.2.38@tcp2
sles15c01:~ # lctl show_route
net               tcp1 hops 4294967295 gw                192.168.2.50@tcp2 up pri 0
sles15c01:~ #

sles15build01 config:

sles15build01:~ # start.sh
debug=+net
sles15build01:~ # lctl list_nids
192.168.2.20@tcp1
192.168.2.50@tcp2
sles15build01:~ #

Reproducer, run on sles15c01:

sles15c01:~ # date; lnetctl peer show; lctl show_route; ssh sles15build01 'lnetctl net del --net tcp1 --if eth0'; sleep 1; lnetctl peer show; lctl show_route

The assertion tripped after the lnetctl net del command was executed on sles15build01.

I think the bug is probably in the initial router discovery. When we create the lpni for the primary NID, 192.168.2.20@tcp1, and attach it to the peer, it ends up at the end of the list, but I think the primary NID is expected to be the first entry.
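
To make the suspected ordering problem concrete, here is a minimal C sketch. It is a toy model only and is not taken from peer.c: the toy_ni and toy_peer structs and the toy_add_tail, toy_add_head, and toy_next_after helpers are all hypothetical names. It assumes that some code path looks for the entry that follows the primary NID; under that assumption, a primary NID that was appended at the tail has no successor, and the lookup returns NULL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a peer NI entry; not the real LNet structure. */
struct toy_ni {
    const char *nid;
    struct toy_ni *next;
};

/* Hypothetical stand-in for a peer: a list of NIs plus the recorded primary NID. */
struct toy_peer {
    struct toy_ni *head;
    const char *primary_nid;
};

/* Append at the tail: the position the report suspects the primary NID ends up in. */
static void toy_add_tail(struct toy_peer *p, struct toy_ni *ni)
{
    struct toy_ni **pp = &p->head;

    while (*pp)
        pp = &(*pp)->next;
    ni->next = NULL;
    *pp = ni;
}

/* Insert at the head: the position the report says the primary NID is expected to hold. */
static void toy_add_head(struct toy_peer *p, struct toy_ni *ni)
{
    ni->next = p->head;
    p->head = ni;
}

/* Return the entry that follows the one holding nid, or NULL if nothing follows it. */
static struct toy_ni *toy_next_after(const struct toy_peer *p, const char *nid)
{
    for (struct toy_ni *ni = p->head; ni; ni = ni->next)
        if (strcmp(ni->nid, nid) == 0)
            return ni->next;
    return NULL;
}
```

With the gateway NID 192.168.2.50@tcp2 added first and the primary NID 192.168.2.20@tcp1 appended with toy_add_tail, toy_next_after on the primary NID returns NULL. Adding the primary with toy_add_head instead leaves 192.168.2.50@tcp2 as its successor.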



 Comments   
Comment by Gerrit Updater [ 15/Dec/20 ]

Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/40985
Subject: LU-13806 lnet: Ensure proper peer, peer NI, peer net hierarchy
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: db31e56552528e6d85fa06049372d2f22e6fab1e

Comment by Gerrit Updater [ 28/Apr/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40985/
Subject: LU-13806 lnet: Ensure proper peer, peer NI, peer net hierarchy
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9eb9474c41c823c70f34e6bb102a8861ca21a3d1

Comment by Peter Jones [ 28/Apr/21 ]

Landed for 2.15

Generated at Sat Feb 10 03:04:22 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.