James, since you asked, a few notes about NUMA.
Reasoning about NUMA is a lot like thinking about a cluster, except you spend a lot of time worrying about cache lines instead of files. (Much of the terminology is also similar, which becomes confusing when discussing NUMA issues for systems that are part of a cluster.) It is worth noting that NUMA considerations already apply once a system has more than one socket. Ideally, the process driving I/O, the memory involved, and the interface involved all live on the same socket.
In practice, the memory placement may have been done by some user space process outside our control, and the same goes for the process that initiates the I/O. Selecting an interface that is close in the system topology is useful, but that already assumes a multi-rail style configuration. Much of the time there is no choice anyway, because there is only one interface. (LNet routers are an exception: there we do have full control over the location of all buffers and threads relative to the interfaces used.)
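To make that concrete, here is a minimal kernel-style sketch of what having that control looks like. This is not LNet code: setup_near_interface and its arguments are invented for this example, but dev_to_node(), kmalloc_node(), and kthread_create_on_node() are standard kernel helpers for placing memory and threads on the NUMA node reported for the interface's device.

#include <linux/device.h>
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/slab.h>

/*
 * Hypothetical example: place an I/O buffer and its worker thread on the
 * NUMA node of the interface's underlying device, so the buffer, the
 * thread, and the NIC all live on the same socket when possible.
 */
static int setup_near_interface(struct device *nic_dev, size_t buf_size,
				int (*worker)(void *buf))
{
	int node = dev_to_node(nic_dev);	/* may be NUMA_NO_NODE */
	struct task_struct *task;
	void *buf;

	/* kmalloc_node() falls back to any node if node == NUMA_NO_NODE */
	buf = kmalloc_node(buf_size, GFP_KERNEL, node);
	if (!buf)
		return -ENOMEM;

	/* the thread's stack and task_struct are allocated on the same node */
	task = kthread_create_on_node(worker, buf, node, "io_worker");
	if (IS_ERR(task)) {
		kfree(buf);
		return PTR_ERR(task);
	}
	wake_up_process(task);

	/* a real implementation would keep buf and task in per-interface state */
	return 0;
}

The same idea applies to router buffer pools: allocate each pool on the node of the interface it feeds, and keep the threads that service it there too.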
So the main concern becomes doing the best we can in the areas we do control, in particular avoiding cache line bouncing. Placing a data structure like ksock_peer in the same CPT as the interface helps here, but only a little. The layout of the ksock_peer structure is actually a good example of what not to do. Take a look at the first few members, which will likely all end up in the same cache line:
typedef struct ksock_peer
{
	struct list_head	ksnp_list;	/* hash table linkage; semi-constant, read by lookups */
	cfs_time_t		ksnp_last_alive;/* updated by threads working on this peer */
	lnet_process_id_t	ksnp_id;	/* peer identity; semi-constant, read by lookups */
	atomic_t		ksnp_refcount;	/* updated by threads working on this peer */
	int			ksnp_sharecount;
	/* ... remaining members ... */
ksnp_list and ksnp_id are semi-constant, and are read by any thread that looks up a peer in the hash table (shared/read access). In contrast, ksnp_refcount and ksnp_last_alive are updated by threads doing work for this particular peer (exclusive/write access). So a lookup of some unrelated peer, which walks the same hash chain and therefore reads ksnp_list, causes a cache line bounce between the CPU doing the lookup and the CPU managing the I/O for this peer. This particular case can be mitigated by being very careful about the layout of a data structure, and by making sure that the threads that modify it run on the same socket, even if that socket is not where the data structure lives.
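To illustrate the layout half of that mitigation, here is a hypothetical regrouping of the same fields. It is a sketch, not an actual patch: the struct name is made up, it reuses the Lustre types from the excerpt above (so it only compiles inside the LNet tree), and the placement of ksnp_sharecount is a guess since it is not discussed here. The semi-constant fields that lookups read stay together, and the write-hot fields are pushed onto their own cache line.

#include <linux/cache.h>	/* ____cacheline_aligned_in_smp */

/* Sketch only: same members as ksock_peer, regrouped by access pattern. */
struct ksock_peer_sketch {
	/* read-mostly: shared by any thread doing a hash-table lookup */
	struct list_head	ksnp_list;
	lnet_process_id_t	ksnp_id;
	int			ksnp_sharecount;	/* assumed read-mostly */

	/*
	 * write-hot: touched only by threads working on this peer;
	 * start a new cache line so lookups of other peers do not
	 * keep pulling it away from the writer.
	 */
	atomic_t		ksnp_refcount ____cacheline_aligned_in_smp;
	cfs_time_t		ksnp_last_alive;
};

The alignment costs some padding per peer, and it only pays off if the writer threads are also kept on one socket, which is the second half of the mitigation above.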
Patches for this work have already landed, and the multi-rail work filled in the remaining gaps.