Details
Type: Bug
Resolution: Done
Priority: Major
Affects Version: Lustre 2.10.3
Severity: 3
Description
We are facing an issue with Multi-Rail remote peer declaration in Lustre 2.10.3.
Here is the description of the test cluster:
MGS:
eth0 10.128.11.155
MDS:
eth0 10.128.11.156
eth0:0 10.128.12.156
OSS:
eth0 10.128.11.157
eth0:0 10.128.12.157
Router:
eth0 10.128.11.158
eth0:0 10.128.12.158
Client:
eth0 10.128.11.159
eth0:0 10.128.12.159
Here is how I configure LNet on the different nodes:
On MGS node:
# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp --if eth0
# lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.156@tcp
# lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp
# lnetctl peer add --nid 10.128.11.158@tcp,10.128.12.158@tcp
# lnetctl route add --net tcp1 --gateway 10.128.11.158@tcp0
On MDS node:
# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp --if eth0,eth0:0
# lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp
# lnetctl peer add --nid 10.128.11.158@tcp,10.128.12.158@tcp
# lnetctl route add --net tcp1 --gateway 10.128.11.158@tcp0
On OSS node:
# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp --if eth0,eth0:0
# lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.157@tcp
# lnetctl peer add --nid 10.128.11.158@tcp,10.128.12.158@tcp
# lnetctl route add --net tcp1 --gateway 10.128.11.158@tcp0
On Router node:
# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp --if eth0,eth0:0
# lnetctl net add --net tcp1 --if eth0,eth0:0
# lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.156@tcp
# lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp
# lnetctl peer add --nid 10.128.11.159@tcp1,10.128.12.159@tcp1
# lnetctl set routing 1
On client node:
# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp1 --if eth0,eth0:0
# lnetctl peer add --nid 10.128.11.158@tcp1,10.128.12.158@tcp1
# lnetctl route add --net tcp0 --gateway 10.128.11.158@tcp1
After that, I can ‘lctl ping’ any node from any node on any NID, which is good. I then mount the Lustre targets as I usually would. Unfortunately, I cannot mount Lustre from the client: the mount times out.
# mount -t lustre -o user_xattr 10.128.11.155@tcp:/lustre /mnt/lustre
^C
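For reference, the connectivity check mentioned above (pinging every NID with ‘lctl ping’) could be scripted roughly as follows. This is only a dry-run sketch: the NID lists are copied from the cluster description, the router's tcp1 NIDs are assumed from its `net add --net tcp1` line, and the helper name `ping_cmds` is hypothetical. It prints the commands instead of running them, so the output can be reviewed or piped to `sh` on each node.

```shell
#!/bin/sh
# Dry-run sketch: print one 'lctl ping' command per NID in the test cluster.
# NIDs are taken from the cluster description; ping_cmds is a hypothetical helper.

# NIDs on the server-side network (tcp, i.e. tcp0).
TCP0_NIDS="10.128.11.155@tcp 10.128.11.156@tcp 10.128.12.156@tcp 10.128.11.157@tcp 10.128.12.157@tcp 10.128.11.158@tcp 10.128.12.158@tcp"
# NIDs on the client-side network (tcp1): router and client.
TCP1_NIDS="10.128.11.158@tcp1 10.128.12.158@tcp1 10.128.11.159@tcp1 10.128.12.159@tcp1"

# Print a ping command for each NID passed as an argument.
ping_cmds() {
    for nid in "$@"; do
        echo "lctl ping $nid"
    done
}

# Word splitting of the unquoted lists is intentional here.
ping_cmds $TCP0_NIDS $TCP1_NIDS
```

Running the printed commands on each node in turn reproduces the "any node from any node on any NID" check.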
So, on the client, I tried declaring the servers on the other side of the router as remote peers:
# lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.156@tcp
# lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp
And now the mount works:
# mount -t lustre -o user_xattr 10.128.11.155@tcp:/lustre /mnt/lustre
# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
lustre-MDT0000_UUID        93.2M        1.6M       83.1M   2% /mnt/lustre[MDT:0]
lustre-MDT0001_UUID        93.2M        1.6M       83.1M   2% /mnt/lustre[MDT:1]
lustre-OST0000_UUID        14.0G       41.3M       13.3G   0% /mnt/lustre[OST:0]
filesystem_summary:        14.0G       41.3M       13.3G   0% /mnt/lustre
So it seems that the mount does not work unless the Multi-Rail capable nodes on the other side of the router are explicitly declared as peers on the client. But maybe I am doing something wrong in this configuration?
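To make the working sequence on the client explicit, here is a minimal dry-run sketch of the workaround: declare each Multi-Rail server behind the router as a peer, then mount. The helper `peer_add_cmd` is hypothetical and only prints the commands used above; it is not a claim about how this is supposed to be configured.

```shell
#!/bin/sh
# Dry-run sketch of the client-side workaround described above.
# peer_add_cmd is a hypothetical helper; nothing here touches LNet directly.

# Print one 'lnetctl peer add' command; the arguments are the NIDs of one peer.
peer_add_cmd() {
    # Join all NIDs with commas, then strip the trailing comma.
    nids=$(printf '%s,' "$@")
    echo "lnetctl peer add --nid ${nids%,}"
}

peer_add_cmd 10.128.11.156@tcp 10.128.12.156@tcp   # MDS
peer_add_cmd 10.128.11.157@tcp 10.128.12.157@tcp   # OSS
echo "mount -t lustre -o user_xattr 10.128.11.155@tcp:/lustre /mnt/lustre"
```

The printed commands are the same ones I ran by hand on the client, in the order that led to a successful mount.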
Thanks,
Sebastien.