[LU-10959] Multirail remote peer issue
Created: 26/Apr/18  Updated: 12/Aug/22  Resolved: 12/Aug/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.3
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Sebastien Buisson (Inactive)
Assignee: Sonia Sharma (Inactive)
Resolution: Done
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We are facing an issue with multirail remote peer declaration with Lustre 2.10.3.

Here is the description of the test cluster:
MGS:
eth0 10.128.11.155

MDS:
eth0 10.128.11.156
eth0:0 10.128.12.156

OSS:
eth0 10.128.11.157
eth0:0 10.128.12.157

Router:
eth0 10.128.11.158
eth0:0 10.128.12.158

Client:
eth0 10.128.11.159
eth0:0 10.128.12.159

Here is how I configure LNet on the different nodes:
On MGS node:

# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp --if eth0
# lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.156@tcp
# lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp
# lnetctl peer add --nid 10.128.11.158@tcp,10.128.12.158@tcp
# lnetctl route add --net tcp1 --gateway 10.128.11.158@tcp0

On MDS node:

# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp --if eth0,eth0:0
# lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp
# lnetctl peer add --nid 10.128.11.158@tcp,10.128.12.158@tcp
# lnetctl route add --net tcp1 --gateway 10.128.11.158@tcp0

On OSS node:

# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp --if eth0,eth0:0
# lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.157@tcp
# lnetctl peer add --nid 10.128.11.158@tcp,10.128.12.158@tcp
# lnetctl route add --net tcp1 --gateway 10.128.11.158@tcp0

On Router node:

# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp --if eth0,eth0:0
# lnetctl net add --net tcp1 --if eth0,eth0:0
# lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.156@tcp
# lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp
# lnetctl peer add --nid 10.128.11.159@tcp1,10.128.12.159@tcp1
# lnetctl set routing 1

On client node:

# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp1 --if eth0,eth0:0
# lnetctl peer add --nid 10.128.11.158@tcp1,10.128.12.158@tcp1
# lnetctl route add --net tcp0 --gateway 10.128.11.158@tcp1
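
Once LNet is configured on every node, connectivity can be checked with 'lctl ping' against each NID. The following is a minimal sketch of such a check run from the client, not the exact procedure used in this report. The NID list is taken from the cluster description above, with the network suffixes as seen from the client; NIDS, DRY_RUN and ping_all are illustrative names only. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: ping every NID of the test cluster from the client node.
# NIDS, DRY_RUN and ping_all are illustrative names, not part of Lustre.
# With DRY_RUN=1 (the default here) the commands are printed, not run.
DRY_RUN="${DRY_RUN:-1}"

# NIDs as seen from the client: servers on tcp, router on tcp1.
NIDS="10.128.11.155@tcp
10.128.11.156@tcp 10.128.12.156@tcp
10.128.11.157@tcp 10.128.12.157@tcp
10.128.11.158@tcp1 10.128.12.158@tcp1"

ping_all() {
    rc=0
    for nid in $NIDS; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "lctl ping $nid"
        elif ! lctl ping "$nid"; then
            # Report the unreachable NID but keep checking the others.
            echo "FAILED: $nid" >&2
            rc=1
        fi
    done
    return "$rc"
}

ping_all
```

Setting DRY_RUN=0 runs the pings for real; the function returns non-zero if any NID does not answer.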

After that, I can ‘lctl ping’ any node from any node on any NID, which is good. I then mount the Lustre targets as usual. Unfortunately, I cannot mount Lustre from the client: the mount times out.

# mount -t lustre -o user_xattr 10.128.11.155@tcp:/lustre /mnt/lustre
^C

So, on the client, I tried declaring the servers on the other side of the router as remote peers:

# lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.156@tcp
# lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp

And now the mount works:

# mount -t lustre -o user_xattr 10.128.11.155@tcp:/lustre /mnt/lustre
# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
lustre-MDT0000_UUID           93.2M        1.6M       83.1M   2% /mnt/lustre[MDT:0]
lustre-MDT0001_UUID           93.2M        1.6M       83.1M   2% /mnt/lustre[MDT:1]
lustre-OST0000_UUID           14.0G       41.3M       13.3G   0% /mnt/lustre[OST:0]

filesystem_summary:        14.0G       41.3M       13.3G   0% /mnt/lustre
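
The workaround (declaring, on the client, the multirail servers that sit behind the router) could be scripted roughly as follows. This is only a sketch under the assumptions of this test cluster: the NID groups are the MDS and OSS addresses listed above, the command form is the same 'lnetctl peer add --nid' used manually, and REMOTE_PEERS, DRY_RUN and add_remote_peers are illustrative names. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: on the client, declare the multirail servers behind the router.
# REMOTE_PEERS, DRY_RUN and add_remote_peers are illustrative names.
# With DRY_RUN=1 (the default here) the commands are printed, not run.
DRY_RUN="${DRY_RUN:-1}"

# One comma-separated NID group per server (MDS first, then OSS).
REMOTE_PEERS="10.128.11.156@tcp,10.128.12.156@tcp
10.128.11.157@tcp,10.128.12.157@tcp"

add_remote_peers() {
    for group in $REMOTE_PEERS; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "lnetctl peer add --nid $group"
        else
            # Same command form as the manual workaround; stop on first error.
            lnetctl peer add --nid "$group" || return 1
        fi
    done
}

add_remote_peers
```

With DRY_RUN=0 the peers are actually added; 'lnetctl peer show' can then be used to inspect the resulting peer table before retrying the mount.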

So it seems that the mount does not work unless the multirail-capable nodes on the other side of the router are explicitly declared as peers on the client. Or am I doing something wrong in this configuration?

Thanks,
Sebastien.



 Comments   
Comment by Peter Jones [ 26/Apr/18 ]

Sonia

Could you please advise?

Thanks

Peter

Comment by Amir Shehata (Inactive) [ 12/May/18 ]

https://wiki.whamcloud.com/display/LNet/Multi-Rail+Selection+Algorithm+Router+Modifications
