
LU-8943: Enable Multiple IB/OPA Endpoints Between Nodes

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.10.0

    Description

      OPA driver optimizations are based on the MPI model, where multiple endpoints between two given nodes are expected. To enable this optimization for Lustre, we need to make it possible, via an LND-specific tunable, to create multiple endpoints and to balance traffic across them.

      I have already created an experimental patch to test this theory. I was able to push OPA performance to 12.4 GB/s just by having 2 QPs between the nodes and round-robining messages between them.

      This Jira ticket is for productizing my patch and testing it out thoroughly for OPA and IB. Test results will be posted to this ticket.
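      A minimal sketch of how such an LND tunable is set once the patch lands, assuming the conns_per_peer ko2iblnd module parameter discussed in the comments below (the value 4 matches the OPA default chosen there):

      # /etc/modprobe.d/ko2iblnd.conf -- illustrative only
      options ko2iblnd conns_per_peer=4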

    Activity

            doug Doug Oucharek (Inactive) added a comment -

            Dmitry, when you get the file system mounted, can you issue the following sequence on both nodes to ensure we are creating 4 connections on each:

             

            lctl
            > network o2ib
            > conn_list
            

            You should see 4 connections to the peer if the initiator (usually the client) has the MultiQP patch, and 1 connection to the peer if it doesn't.
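            The same check can also be run as a one-liner; a sketch, assuming lctl accepts the --net option to set the network context:

            lctl --net o2ib conn_list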


            doug Doug Oucharek (Inactive) added a comment -

            I just tried to reproduce with the passive node being unpatched. I was not able to reproduce your issue. The "lctl network down" takes a long time, but does succeed. There must be something else here. Do you know if your parameters, like map_on_demand, are different? Is a reconnection happening to renegotiate the parameters? This is something I have not tried.


            doug Doug Oucharek (Inactive) added a comment -

            That might be the reason. The client will create multiple connections, but the server will only have one they are all talking to. When one connection on the client is closed, the connection on the server will be closed. I suspect the remaining connections on the client can't be closed. I'll have to look at the code to see what I can do in this situation.

            I suspect if the server has the patch, you would not have a problem.


            dmiter Dmitry Eremin (Inactive) added a comment -

            I'm using a new Lustre client with this patch and old Lustre servers without it. So, I just mount the Lustre FS, use it, and then try to unload the modules after umount. I don't use DLC. I have CentOS 7.3 on both sides.


            doug Doug Oucharek (Inactive) added a comment -

            When I created the performance spreadsheet, I needed to keep changing conns_per_peer. I had no problems taking down and bringing up LNet using these commands:

            Up:

            modprobe lnet
            lctl network configure
            modprobe lnet-selftest

            Down:

            rmmod lnet-selftest
            lctl network down
            rmmod ko2iblnd
            rmmod lnet

            There must be something different about what you are doing that is causing reference counts not to be released. Are you using DLC? What is your environment? Are both nodes running the latest code with this patch?
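            Putting the two sequences together, a minimal sketch of a full cycle to pick up a new conns_per_peer value (the value 4 is illustrative, and passing the parameter directly to modprobe is an assumption; an options line in /etc/modprobe.d works as well):

            rmmod lnet-selftest
            lctl network down
            rmmod ko2iblnd
            rmmod lnet
            modprobe ko2iblnd conns_per_peer=4
            modprobe lnet
            lctl network configure
            modprobe lnet-selftest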


            dmiter Dmitry Eremin (Inactive) added a comment -

            No, as I mentioned before, only a reboot helps.

            # lustre_rmmod                                                                  
            rmmod: ERROR: Module ko2iblnd is in use
            
            # lsmod|less                                                                    
            Module                  Size  Used by
            ko2iblnd              233790  1 
            ptlrpc               1343928  0 
            obdclass             1744518  1 ptlrpc
            lnet                  483843  3 ko2iblnd,obdclass,ptlrpc
            libcfs                416336  4 lnet,ko2iblnd,obdclass,ptlrpc
            [...]
            
            # lctl network down                                                             
            LNET busy
            
            lnetctl > lnet unconfigure
            unconfigure:
                - lnet:
                      errno: -16
                      descr: "LNet unconfigure error: Device or resource busy"
            lnetctl > lnet unconfigure --all
            unconfigure:
                - lnet:
                      errno: -16
                      descr: "LNet unconfigure error: Device or resource busy"
            
            # lustre_rmmod                                                                  
            rmmod: ERROR: Module ko2iblnd is in use
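            A generic way to watch the stuck reference, not specific to this ticket: every loaded module exposes its use count under /sys/module, so the count that rmmod complains about can be read directly:

            cat /sys/module/ko2iblnd/refcnt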
            adilger Andreas Dilger added a comment - edited

            Does "lctl network down" or "lnetctl lnet unconfigure" help?


            dmiter Dmitry Eremin (Inactive) added a comment -

            I observed strange behavior. It looks like, after this commit, I cannot unload the ko2iblnd module. LNet is busy even though everything unmounted successfully. Only a reboot helps.

            pjones Peter Jones added a comment -

            Landed for 2.10


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25168/
            Subject: LU-8943 lnd: Enable Multiple OPA Endpoints between Nodes
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7241e68f37962991ef43a6c01b3a83ff67282d88
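            For anyone tracing this later, the landed change can be inspected in a fs/lustre-release checkout; a sketch using the commit hash above:

            git log -1 7241e68f37962991ef43a6c01b3a83ff67282d88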


            doug Doug Oucharek (Inactive) added a comment -

            I did update the OPA defaults to set conns_per_peer to 4 when OPA is detected. I'll also update the manual under LUDOC-374.

            I bumped conns_per_peer to 4 from 3 because the OPA team is going to start recommending a krcvqs default of 4, especially for a low number of cores (i.e. VMs). Having a conns_per_peer of 4 helps to compensate for the lower krcvqs number, so we should work well out of the box whether krcvqs is 4 or 8.
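            For completeness, a sketch of the krcvqs side of that tuning, assuming the hfi1 OPA host driver (this is separate from the Lustre patch itself):

            # /etc/modprobe.d/hfi1.conf -- illustrative only
            options hfi1 krcvqs=4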


            People

              Assignee: Doug Oucharek (Inactive)
              Reporter: Doug Oucharek (Inactive)
              Votes: 0
              Watchers: 21