[LU-10959] Multirail remote peer issue Created: 26/Apr/18 Updated: 12/Aug/22 Resolved: 12/Aug/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Sebastien Buisson (Inactive) | Assignee: | Sonia Sharma (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We are facing an issue with multirail remote peer declaration with Lustre 2.10.3.

Here is the description of the test cluster:
MDS:
OSS:
Router:
Client:

Here is how I configure LNet on the different nodes:
# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp --if eth0
# lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.156@tcp
# lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp
# lnetctl peer add --nid 10.128.11.158@tcp,10.128.12.158@tcp
# lnetctl route add --net tcp1 --gateway 10.128.11.158@tcp0

On MDS node:
# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp --if eth0,eth0:0
# lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp
# lnetctl peer add --nid 10.128.11.158@tcp,10.128.12.158@tcp
# lnetctl route add --net tcp1 --gateway 10.128.11.158@tcp0

On OSS node:
# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp --if eth0,eth0:0
# lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.157@tcp
# lnetctl peer add --nid 10.128.11.158@tcp,10.128.12.158@tcp
# lnetctl route add --net tcp1 --gateway 10.128.11.158@tcp0

On Router node:
# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp --if eth0,eth0:0
# lnetctl net add --net tcp1 --if eth0,eth0:0
# lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.156@tcp
# lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp
# lnetctl peer add --nid 10.128.11.159@tcp1,10.128.12.159@tcp1
# lnetctl set routing 1

On client node:
# modprobe lnet ; lnetctl lnet configure
# lnetctl net add --net tcp1 --if eth0,eth0:0
# lnetctl peer add --nid 10.128.11.158@tcp1,10.128.12.158@tcp1
# lnetctl route add --net tcp0 --gateway 10.128.11.158@tcp1

After that, I can ‘lctl ping’ any node from any node on any NID, which is good. Then I mount the Lustre targets as I would usually do. But then, unfortunately, I cannot mount Lustre from the client; it times out.
# mount -t lustre -o user_xattr 10.128.11.155@tcp:/lustre /mnt/lustre
^C

So, on the client I tried to declare as remote peers the servers that are on the other side of the router:
# lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.156@tcp
# lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp

And now the mount works:
# mount -t lustre -o user_xattr 10.128.11.155@tcp:/lustre /mnt/lustre
# lfs df -h
UUID                 bytes  Used   Available  Use%  Mounted on
lustre-MDT0000_UUID  93.2M  1.6M   83.1M      2%    /mnt/lustre[MDT:0]
lustre-MDT0001_UUID  93.2M  1.6M   83.1M      2%    /mnt/lustre[MDT:1]
lustre-OST0000_UUID  14.0G  41.3M  13.3G      0%    /mnt/lustre[OST:0]
filesystem_summary:  14.0G  41.3M  13.3G      0%    /mnt/lustre

So it seems that without explicitly declaring the multirail capable nodes that are on the other side of the router, it would not work. But maybe I am doing something wrong in this configuration?

Thanks, |
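The client-side workaround described above can be sketched as a small dry-run script that only prints the `lnetctl peer add` commands, so they can be reviewed before being run as root on the client. The helper names `peer_add_cmd` and `client_workaround` are hypothetical and not part of lnetctl; the NIDs are the ones quoted in this report, and the sketch assumes each remote server is declared with all of its NIDs in one comma-separated `--nid` list, as the reporter did.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the client-side peer declarations.
# peer_add_cmd and client_workaround are hypothetical helper names.

peer_add_cmd() {
    # Join every NID of one multi-rail peer with commas,
    # matching the "--nid a,b" form used in the report.
    nids=$(printf '%s,' "$@")
    printf 'lnetctl peer add --nid %s\n' "${nids%,}"
}

client_workaround() {
    # Remote multi-rail servers behind the router (NIDs from the report).
    peer_add_cmd 10.128.11.156@tcp 10.128.12.156@tcp
    peer_add_cmd 10.128.11.157@tcp 10.128.12.157@tcp
}

client_workaround
```

Once the printed commands look right, they could be run on the client before the mount, and `lnetctl peer show` can be used afterwards to check what LNet has recorded for each peer.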
| Comments |
| Comment by Peter Jones [ 26/Apr/18 ] |
|
Sonia, could you please advise? Thanks, Peter |
| Comment by Amir Shehata (Inactive) [ 12/May/18 ] |
|
https://wiki.whamcloud.com/display/LNet/Multi-Rail+Selection+Algorithm+Router+Modifications |