Details

Type: Bug
Resolution: Fixed
Priority: Minor
Fix Version/s: Lustre 2.16.0, Lustre 2.15.5
Affects Version/s: None
Labels:
- lnet
- multi-rail

Severity: 3

Description

System configurations in which the Lustre layer refers to the same MR peer by multiple NIDs expose a problem in the primary NID locking logic. When the "primary NID locking" feature is enabled, LNet creates a separate peer record for each such NID, with that NID marked as the "locked primary". After discovery completes in the background, these records are not merged, so the peer is represented incorrectly. In the example below, the server has two tcp NIDs, yet the client lists them as two separate peers with one NID each:

server:

# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 192.168.122.50@tcp
          status: up
          interfaces:
              0: eth0
        - nid: 192.168.122.134@tcp
          status: up
          interfaces:
              0: ens12

client:

# mount -t lustre 192.168.122.134@tcp:192.168.122.50@tcp:/lustrewt /mnt/lustrefs
# lnetctl peer show 
peer:
    - primary nid: 192.168.122.134@tcp
      Multi-Rail: True
      peer ni:
        - nid: 192.168.122.134@tcp
          state: NA
    - primary nid: 192.168.122.50@tcp
      Multi-Rail: True
      peer ni:
        - nid: 192.168.122.50@tcp
          state: NA
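
The symptom above can be checked mechanically. The following is a minimal sketch, not part of LNet or lnetctl: it assumes the `lnetctl peer show` output has already been parsed into Python dictionaries, and that the set of NIDs belonging to one server node is known (for example from that server's `lnetctl net show`). The function name `find_split_peers` and the data layout are hypothetical and chosen only for illustration.

```python
def find_split_peers(peer_records, same_node_nids):
    """Return primary NIDs of all records that hold NIDs of one node.

    An empty list means the node's NIDs appear in at most one peer
    record. Two or more entries mean the node is split across records,
    which is the symptom described in this ticket.
    """
    same = set(same_node_nids)
    primaries = []
    for rec in peer_records:
        # Collect every NID listed under this peer record.
        nids = {ni["nid"] for ni in rec["peer ni"]}
        if nids & same:
            primaries.append(rec["primary nid"])
    return primaries if len(primaries) > 1 else []


# Client-side peer records, transcribed from the output above.
peers = [
    {"primary nid": "192.168.122.134@tcp",
     "peer ni": [{"nid": "192.168.122.134@tcp", "state": "NA"}]},
    {"primary nid": "192.168.122.50@tcp",
     "peer ni": [{"nid": "192.168.122.50@tcp", "state": "NA"}]},
]

# NIDs the server reports for itself (0@lo omitted).
server_nids = ["192.168.122.50@tcp", "192.168.122.134@tcp"]

print(find_split_peers(peers, server_nids))
```

For the records shown in this report, the function returns both primary NIDs, flagging the split. If the two records were merged into a single record listing both NIDs, it would return an empty list. Which NID ends up as primary after a merge is not assumed here.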

Attachments

lctl-llog_print.txt, 20 kB, 05/Apr/23 7:54 AM, Shuichi Ihara

People

Assignee: Serguei Smirnov

Reporter: Serguei Smirnov

Votes: 0

Watchers: 10

Dates

Created: 04/Apr/23 5:26 PM

Updated: 22/Jun/24 2:52 PM

Resolved: 28/Jun/23 10:49 PM