Adding second network to filesystem

Details

    • Question/Request
    • Resolution: Unresolved
    • Minor
    • Lustre 2.12.5

    Description

      We have a filesystem using o2ib NIDs and want to add the option of tcp NIDs. If we add the tcp NIDs dynamically to the servers, do we also need to add the new NIDs to the MGS config via tunefs.lustre? Can we add the second set of NIDs to the MGS configs without taking the filesystem down?

          Activity

            [LU-14608] Adding second network to filesystem

            Is r803i8n1 a new client? If so, the method I described earlier won't work. I got the impression that you wanted existing clients, which already have the FS mounted, to use the tcp interface. The method I described works in that case, but not for a brand new client. There is a patch in the pipeline meant to address this use case: adding interfaces dynamically and then allowing new clients to use them for mounting.

            https://review.whamcloud.com/#/c/40736/

            ashehata Amir Shehata (Inactive) added the comment above.

            Here I added the tcp NID to the servers.

            nbp1-mds ~ # lctl list_nids
            10.151.26.117@o2ib
            10.151.26.117@tcp
            
            nbp1-mds ~ # lnetctl global show
            global:
                numa_range: 0
                max_intf: 200
                discovery: 1
                drop_asym_route: 0
                retry_count: 0
                transaction_timeout: 200
                health_sensitivity: 0
                recovery_interval: 1
            
            

            and on the client

            r803i8n1 ~ # lctl list_nids
            10.151.40.131@tcp
            
            r803i8n1 ~ # lctl ping 10.151.26.117@tcp
            12345-0@lo
            12345-10.151.26.117@o2ib
            12345-10.151.26.117@tcp
            
            r803i8n1 ~ # lnetctl global show
            global:
                numa_range: 0
                max_intf: 200
                discovery: 1
                drop_asym_route: 0
                retry_count: 0
                transaction_timeout: 200
                health_sensitivity: 0
                recovery_interval: 1
            
            

            Mounting fails

            r803i8n1 ~ # mount -t lustre 10.151.26.117@tcp:/nbp1 /nobackupp1
            mount.lustre: mount 10.151.26.117@tcp:/nbp1 at /nobackupp1 failed: No such file or directory
            Is the MGS specification correct?
            Is the filesystem name correct?
            If upgrading, is the copied client log valid? (see upgrade docs)
            
            mhanafi Mahmoud Hanafi added the comment above.
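The `lctl ping` reply in this comment shows the server advertising both its o2ib and tcp NIDs, so LNet-level reachability looks fine and the mount failure lies elsewhere. For scripted checks, a minimal sketch of a filter that extracts the NIDs on one network from `lctl ping` output might look like this. The helper name `nids_on_net` is hypothetical and not part of Lustre, and the sketch assumes the `12345-<nid>` line format shown in the output above.

```shell
#!/bin/sh
# nids_on_net NET
# Reads `lctl ping` output on stdin and prints the NIDs that belong to
# network NET (e.g. tcp, o2ib). Hypothetical helper, not a Lustre tool.
# Assumes each reply line looks like "12345-<address>@<net>".
nids_on_net() {
    sed -n "s/^[[:space:]]*[0-9]*-\(.*@$1[0-9]*\)\$/\1/p"
}

# Example use on a client (requires a working LNet setup):
#   lctl ping 10.151.26.117@tcp | nids_on_net tcp
```

An empty result for `tcp` would suggest the server has not brought up its tcp interface; a non-empty result only confirms LNet connectivity, not that the MGS configuration contains the tcp NIDs.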

            You should be able to add the NIDs dynamically to both the clients and the servers.
            Adding the NIDs triggers a discovery round (if discovery is enabled).
            Once discovery completes, the clients and the servers know they can also reach each other via the tcp NIDs.
            These tcp NIDs should then start being used for FS requests without having to be added explicitly via tunefs. This works because of the Multi-Rail (MR) capability.
            When selecting NIDs, LNet looks at the credits available. Because o2ib is normally the faster interconnect, more credits tend to be available on the o2ib interfaces than on the tcp ones, so o2ib will be used more often.

            ashehata Amir Shehata (Inactive) added the comment above.
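As a rough illustration of the dynamic approach described in this comment, the steps could be scripted along the following lines. This is a sketch under assumptions, not a procedure confirmed in the ticket: the interface name `eth0` is a placeholder, the function name `add_tcp_net` is hypothetical, and the peer NID is simply the server tcp NID quoted elsewhere in this thread. By default the script only prints the commands, so it can be reviewed before anything is changed.

```shell
#!/bin/sh
# Sketch only. TCP_IF is an assumed interface name; adjust for the host.
TCP_IF="${TCP_IF:-eth0}"
# Peer to discover; here the server tcp NID that appears in this ticket.
PEER_NID="${PEER_NID:-10.151.26.117@tcp}"
# Dry run by default: commands are echoed. Run with RUN= (empty) to execute.
RUN="${RUN-echo}"

add_tcp_net() {
    # Add a tcp network on the chosen interface without reloading LNet.
    $RUN lnetctl net add --net tcp --if "$TCP_IF"
    # Make sure peer discovery is enabled.
    $RUN lnetctl set discovery 1
    # Explicitly trigger discovery of the peer over its tcp NID.
    $RUN lnetctl discover "$PEER_NID"
    # Show what LNet now knows about that peer.
    $RUN lnetctl peer show --nid "$PEER_NID"
}

add_tcp_net
```

Keeping the dry-run default is a deliberate choice for a production filesystem. As noted later in the thread, this helps only clients that already have the filesystem mounted.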

            Normally, changing or adding NIDs requires unmounting the targets and using writeconf to make the change. We just want to test the new NIDs and don't want to take down or unmount the servers.

            mhanafi Mahmoud Hanafi added the comment above.

            mhanafi, do you mean to ask whether NIDs can be added without taking the filesystem down?

            adilger Andreas Dilger added the comment above.

            People

              ashehata Amir Shehata (Inactive)
              mhanafi Mahmoud Hanafi