Lustre / LU-10959

Multirail remote peer issue

Details

    • Bug
    • Resolution: Done
    • Major
    • None
    • Lustre 2.10.3
    • None
    • 3

    Description

      We are facing an issue with multirail remote peer declaration with Lustre 2.10.3.

      Here is the description of the test cluster:
      MGS:
      eth0 10.128.11.155

      MDS:
      eth0 10.128.11.156
      eth0:0 10.128.12.156

      OSS:
      eth0 10.128.11.157
      eth0:0 10.128.12.157

      Router:
      eth0 10.128.11.158
      eth0:0 10.128.12.158

      Client:
      eth0 10.128.11.159
      eth0:0 10.128.12.159

      Here is how I configure LNet on the different nodes:
      On MGS node:

      # modprobe lnet ; lnetctl lnet configure
      # lnetctl net add --net tcp --if eth0
      # lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.156@tcp
      # lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp
      # lnetctl peer add --nid 10.128.11.158@tcp,10.128.12.158@tcp
      # lnetctl route add --net tcp1 --gateway 10.128.11.158@tcp0
      

      On MDS node:

      # modprobe lnet ; lnetctl lnet configure
      # lnetctl net add --net tcp --if eth0,eth0:0
      # lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp
      # lnetctl peer add --nid 10.128.11.158@tcp,10.128.12.158@tcp
      # lnetctl route add --net tcp1 --gateway 10.128.11.158@tcp0
      

      On OSS node:

      # modprobe lnet ; lnetctl lnet configure
      # lnetctl net add --net tcp --if eth0,eth0:0
      # lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.156@tcp
      # lnetctl peer add --nid 10.128.11.158@tcp,10.128.12.158@tcp
      # lnetctl route add --net tcp1 --gateway 10.128.11.158@tcp0
      

      On Router node:

      # modprobe lnet ; lnetctl lnet configure
      # lnetctl net add --net tcp --if eth0,eth0:0
      # lnetctl net add --net tcp1 --if eth0,eth0:0
      # lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.156@tcp
      # lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp
      # lnetctl peer add --nid 10.128.11.159@tcp1,10.128.12.159@tcp1
      # lnetctl set routing 1
      

      On client node:

      # modprobe lnet ; lnetctl lnet configure
      # lnetctl net add --net tcp1 --if eth0,eth0:0
      # lnetctl peer add --nid 10.128.11.158@tcp1,10.128.12.158@tcp1
      # lnetctl route add --net tcp0 --gateway 10.128.11.158@tcp1
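
      The connectivity check across all of these NIDs can be sketched as a small loop. This is only an illustration built from the addresses listed in the cluster description: `ping_all` and the `RUN` variable are hypothetical names, not part of Lustre. By default the helper only prints the `lctl ping` commands, so it can be reviewed before anything is run.

```shell
#!/bin/bash
# NIDs from the test cluster described above, grouped by LNet network.
tcp_nids="10.128.11.155@tcp 10.128.11.156@tcp 10.128.12.156@tcp 10.128.11.157@tcp 10.128.12.157@tcp 10.128.11.158@tcp 10.128.12.158@tcp"
tcp1_nids="10.128.11.158@tcp1 10.128.12.158@tcp1 10.128.11.159@tcp1 10.128.12.159@tcp1"

# Hypothetical helper: emit one "lctl ping" command per NID.
# With RUN=1 the commands are executed instead of printed (assumes
# lctl is installed and LNet is configured on the node).
ping_all() {
    local nid
    for nid in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            lctl ping "$nid" || echo "FAILED: $nid" >&2
        else
            echo "lctl ping $nid"
        fi
    done
}

# Word splitting on the unquoted variables is intentional here.
ping_all $tcp_nids $tcp1_nids
```

      Running the same script on each node, with RUN=1, would cover the "any node from any node" case.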
      

      After that, I can ‘lctl ping’ any node from any node on any NID, which is good. I then mount the Lustre targets as usual. Unfortunately, mounting Lustre from the client fails: the mount command times out.

      # mount -t lustre -o user_xattr 10.128.11.155@tcp:/lustre /mnt/lustre
      ^C
      

      So, on the client, I tried declaring the servers on the other side of the router as remote peers:

      # lnetctl peer add --nid 10.128.11.156@tcp,10.128.12.156@tcp
      # lnetctl peer add --nid 10.128.11.157@tcp,10.128.12.157@tcp
      

      And now the mount works:

      # mount -t lustre -o user_xattr 10.128.11.155@tcp:/lustre /mnt/lustre
      # lfs df -h
      UUID                       bytes        Used   Available Use% Mounted on
      lustre-MDT0000_UUID           93.2M        1.6M       83.1M   2% /mnt/lustre[MDT:0]
      lustre-MDT0001_UUID           93.2M        1.6M       83.1M   2% /mnt/lustre[MDT:1]
      lustre-OST0000_UUID           14.0G       41.3M       13.3G   0% /mnt/lustre[OST:0]
      
      filesystem_summary:        14.0G       41.3M       13.3G   0% /mnt/lustre
      

      So it seems that the client cannot mount unless the multirail-capable nodes on the other side of the router are explicitly declared as peers. Or am I doing something wrong in this configuration?
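
      For reference, the client-side declarations that made the mount work can be generated from a per-node NID list. This is a minimal sketch, not a recommended procedure: `peer_add_cmd` is a hypothetical helper that only prints the `lnetctl peer add` command for one multi-rail node, with the first NID given taken as the primary one.

```shell
#!/bin/bash
# Hypothetical helper: build one "lnetctl peer add" command from the
# NIDs of a single node (first argument is the primary NID).
# It prints the command and does not change the LNet configuration.
peer_add_cmd() {
    local nids="$1"
    shift
    local nid
    for nid in "$@"; do
        nids="$nids,$nid"
    done
    echo "lnetctl peer add --nid $nids"
}

# Multi-rail servers behind the router, as seen from the client.
peer_add_cmd 10.128.11.156@tcp 10.128.12.156@tcp   # MDS
peer_add_cmd 10.128.11.157@tcp 10.128.12.157@tcp   # OSS
```

      After the printed commands are run on the client, `lnetctl peer show` can be used to check how the remote peers were recorded.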

      Thanks,
      Sebastien.

          People

            sharmaso Sonia Sharma (Inactive)
            sbuisson Sebastien Buisson (Inactive)