Details
Type: Bug
Resolution: Fixed
Priority: Minor
Severity: 3
Description
System configurations in which the Lustre layer specifies the same MR (Multi-Rail) peer using multiple NIDs cause an issue with the primary NID locking logic. When the "primary nid locking" feature is enabled, LNet creates separate peer records, each containing one NID of the MR peer as its "locked primary". After discovery completes in the background, these records are not merged, which results in an incorrect peer representation. Here's an example:
server:
# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 192.168.122.50@tcp
          status: up
          interfaces:
              0: eth0
        - nid: 192.168.122.134@tcp
          status: up
          interfaces:
              0: ens12
client:
# mount -t lustre 192.168.122.134@tcp:192.168.122.50@tcp:/lustrewt /mnt/lustrefs
# lnetctl peer show
peer:
    - primary nid: 192.168.122.134@tcp
      Multi-Rail: True
      peer ni:
        - nid: 192.168.122.134@tcp
          state: NA
    - primary nid: 192.168.122.50@tcp
      Multi-Rail: True
      peer ni:
        - nid: 192.168.122.50@tcp
          state: NA
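
For illustration only, the missing merge step can be modeled with a small Python sketch. This is a toy model, not LNet code: `PeerRecord`, `merge_after_discovery`, and the rule that the first matching record keeps its primary NID are all assumptions made for this example, and the actual fix may choose the surviving record differently.

```python
from dataclasses import dataclass, field


@dataclass
class PeerRecord:
    # Hypothetical stand-in for an LNet peer record; not the real structure.
    primary_nid: str
    nids: set = field(default_factory=set)
    primary_locked: bool = True


def merge_after_discovery(records, discovered_nids):
    """Fold every record whose primary NID belongs to the discovered peer
    into a single record. Assumption: the first matching record survives
    and keeps its locked primary NID."""
    discovered = set(discovered_nids)
    matching = [r for r in records if r.primary_nid in discovered]
    others = [r for r in records if r.primary_nid not in discovered]
    if not matching:
        return others
    keeper = matching[0]
    for rec in matching:
        keeper.nids |= rec.nids
    # The surviving record should list every NID reported by discovery.
    keeper.nids |= discovered
    return [keeper] + others


# State on the client before merging: one record per NID named at mount time.
records = [
    PeerRecord("192.168.122.134@tcp", {"192.168.122.134@tcp"}),
    PeerRecord("192.168.122.50@tcp", {"192.168.122.50@tcp"}),
]

# Discovery reports that both NIDs belong to the same Multi-Rail peer.
merged = merge_after_discovery(
    records, ["192.168.122.134@tcp", "192.168.122.50@tcp"]
)
```

In this model, the two records from the example collapse into one record that lists both NIDs under a single primary NID, which is the representation one would expect `lnetctl peer show` to report for a single MR peer.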