Lustre / LU-8943

Enable Multiple IB/OPA Endpoints Between Nodes

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.10.0

    Description

      OPA driver optimizations are based on the MPI model where it is expected to have multiple endpoints between two given nodes. To enable this optimization for Lustre, we need to make it possible, via an LND-specific tuneable, to create multiple endpoints and to balance the traffic over them.
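      As an illustration only (the final tunable name and default may differ; conns_per_peer is the working name used in the comments below), such a setting would be applied as a ko2iblnd module option before the module is loaded:

      # Hypothetical example: ask the o2ib LND for 4 QPs (connections) per peer.
      # conns_per_peer is the working name used later in this ticket; 4 is illustrative.
      echo "options ko2iblnd conns_per_peer=4" > /etc/modprobe.d/ko2iblnd.conf
      modprobe ko2iblnd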

      I have already created an experimental patch to test this theory out. I was able to push OPA performance to 12.4GB/s by just having 2 QPs between the nodes and round robin messages between them.

      This Jira ticket is for productizing my patch and testing it out thoroughly for OPA and IB. Test results will be posted to this ticket.
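      For reference, a typical LNet-level bulk test of the kind that could be used for such measurements is sketched below; the NIDs, group names, and sizes are placeholders, not the actual test setup:

      # lnet-selftest sketch with placeholder NIDs; assumes lnet-selftest is
      # loaded on both ends (modprobe lnet-selftest).
      export LST_SESSION=$$
      lst new_session multiqp_bw
      lst add_group clients 192.168.213.125@o2ib
      lst add_group servers 192.168.213.231@o2ib
      lst add_batch bulk_rw
      lst add_test --batch bulk_rw --concurrency 8 --from clients --to servers \
          brw write size=1M
      lst run bulk_rw
      lst stat clients servers     # sample bandwidth; Ctrl-C to stop
      lst stop bulk_rw
      lst end_session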

    Attachments

    Issue Links

    Activity

            [LU-8943] Enable Multiple IB/OPA Endpoints Between Nodes
            pjones Peter Jones added a comment -

            Would it be a good idea to track all this under a new ticket instead of tacking onto an already closed one?


            doug Doug Oucharek (Inactive) added a comment -

            Cliff is seeing this same problem on the soak cluster, but there is no OPA there, only MLX IB. I'm beginning to wonder if this is a problem with the Multi-Rail drop rather than this change.

            dmiter Dmitry Eremin (Inactive) added a comment -

            From server:

            # lctl                                                                          
            lctl > network o2ib
            lctl > conn_list
            192.168.213.125@o2ib mtu -1
            192.168.213.125@o2ib mtu -1
            192.168.213.125@o2ib mtu -1
            192.168.213.125@o2ib mtu -1
            ...
            

            dmiter Dmitry Eremin (Inactive) added a comment -

            192.168.213.125@o2ib - client

            dmiter Dmitry Eremin (Inactive) added a comment -

            # lctl
            lctl > network o2ib
            lctl > conn_list
            192.168.213.125@o2ib mtu -1
            192.168.213.125@o2ib mtu -1
            192.168.213.125@o2ib mtu -1
            192.168.213.125@o2ib mtu -1
            192.168.213.125@o2ib mtu -1
            192.168.213.125@o2ib mtu -1
            192.168.213.125@o2ib mtu -1
            192.168.213.125@o2ib mtu -1
            192.168.213.231@o2ib mtu -1
            192.168.213.231@o2ib mtu -1
            192.168.213.231@o2ib mtu -1
            192.168.213.231@o2ib mtu -1
            192.168.213.232@o2ib mtu -1
            192.168.213.232@o2ib mtu -1
            192.168.213.232@o2ib mtu -1
            192.168.213.232@o2ib mtu -1
            192.168.213.233@o2ib mtu -1
            192.168.213.233@o2ib mtu -1
            192.168.213.233@o2ib mtu -1
            192.168.213.233@o2ib mtu -1
            192.168.213.234@o2ib mtu -1
            192.168.213.234@o2ib mtu -1
            192.168.213.234@o2ib mtu -1
            192.168.213.234@o2ib mtu -1
            192.168.213.235@o2ib mtu -1
            192.168.213.235@o2ib mtu -1
            192.168.213.235@o2ib mtu -1
            192.168.213.235@o2ib mtu -1
            192.168.213.236@o2ib mtu -1
            192.168.213.236@o2ib mtu -1
            192.168.213.236@o2ib mtu -1
            192.168.213.236@o2ib mtu -1
            
            # lnetctl lnet unconfigure --all                                                
            unconfigure:
                - lnet:
                      errno: -16
                      descr: "LNet unconfigure error: Device or resource busy"
            
            Client: 2.9.57_48_g0386263
            Servers:
            lustre: 2.7.19.10
            kernel: patchless_client
            build: 2.7.19.10--PRISTINE-3.10.0-514.10.2.el7_lustre.x86_64


            doug Doug Oucharek (Inactive) added a comment -

            Dmitry, when you get the file system mounted, can you issue the following sequence on both nodes to ensure we are creating 4 connections on each:

            lctl
            > network o2ib
            > conn_list
            

            You should see 4 connections to the peer if the initiator (usually the client) has the MultiQP patch, and 1 connection to the peer if it doesn't.


            doug Doug Oucharek (Inactive) added a comment -

            I just tried to reproduce this with the passive node unpatched but was not able to reproduce your issue. The "lctl network down" takes a long time, but it does succeed. There must be something else going on here. Do you know if your parameters, like map_on_demand, are different? Is a reconnection happening to renegotiate the parameters? That is something I have not tried.
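            If it helps narrow this down, the ko2iblnd parameters on both nodes could be dumped and compared with something along these lines (illustrative only):

            # Print every ko2iblnd module parameter (map_on_demand, conns_per_peer,
            # etc.) so the two nodes can be compared side by side.
            for p in /sys/module/ko2iblnd/parameters/*; do
                printf '%s = %s\n' "$(basename "$p")" "$(cat "$p")"
            done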


            doug Doug Oucharek (Inactive) added a comment -

            That might be the reason. The client will create multiple connections, but the server will only have one that they are all talking to. When one connection on the client is closed, the connection on the server will be closed. I suspect the remaining connections on the client then can't be closed. I'll have to look at the code to see what I can do in this situation.

            I suspect if the server has the patch, you would not have a problem.


            dmiter Dmitry Eremin (Inactive) added a comment -

            I'm using a new Lustre client with this patch and old Lustre servers without it. So I just mount the Lustre FS, use it, and then try to unload the modules after umount. I don't use DLC. I have CentOS 7.3 on both sides.


            doug Doug Oucharek (Inactive) added a comment -

            When I created the performance spreadsheet, I needed to keep changing conns_per_peer. I had no problems taking down and bringing up LNet using these commands:

            Up:

            modprobe lnet
            lctl network configure
            modprobe lnet-selftest
            
            Down:

            rmmod lnet-selftest
            lctl network down
            rmmod ko2iblnd
            rmmod lnet
            
            There must be something different about what you are doing that is preventing the ref counters from being released. Are you using DLC? What is your environment? Are both nodes running the latest code with this patch?
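            For reference, by DLC I mean configuring LNet through lnetctl rather than lctl/module options; a minimal sequence (ib0 is just an example interface name) would be:

            # Bring LNet up via DLC (lnetctl) instead of "lctl network configure".
            modprobe lnet
            lnetctl lnet configure
            lnetctl net add --net o2ib --if ib0    # ib0 is an example interface
            lnetctl net show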


            dmiter Dmitry Eremin (Inactive) added a comment -

            No, as I mentioned before, only a reboot helps.

            # lustre_rmmod                                                                  
            rmmod: ERROR: Module ko2iblnd is in use
            
            # lsmod|less                                                                    
            Module                  Size  Used by
            ko2iblnd              233790  1 
            ptlrpc               1343928  0 
            obdclass             1744518  1 ptlrpc
            lnet                  483843  3 ko2iblnd,obdclass,ptlrpc
            libcfs                416336  4 lnet,ko2iblnd,obdclass,ptlrpc
            [...]
            
            # lctl network down                                                             
            LNET busy
            
            lnetctl > lnet unconfigure
            unconfigure:
                - lnet:
                      errno: -16
                      descr: "LNet unconfigure error: Device or resource busy"
            lnetctl > lnet unconfigure --all
            unconfigure:
                - lnet:
                      errno: -16
                      descr: "LNet unconfigure error: Device or resource busy"
            
            # lustre_rmmod                                                                  
            rmmod: ERROR: Module ko2iblnd is in use
            

            People

              Assignee: doug Doug Oucharek (Inactive)
              Reporter: doug Doug Oucharek (Inactive)
              Votes: 0
              Watchers: 21
