[LU-7245] Improve SMP scaling support for LND drivers Created: 02/Oct/15  Updated: 13/May/21  Resolved: 22/Apr/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Improvement Priority: Minor
Reporter: James A Simmons Assignee: James A Simmons
Resolution: Fixed Votes: 0
Labels: patch
Environment:

Any TCP, InfiniBand or Gemini network system


Issue Links:
Related
is related to LU-2544 Add SMP scaling to Cray Gemini driver Open
is related to LU-5960 Add ability to get peer and connectio... Open
is related to LU-14676 Better hash distribution to different... Resolved
Epic/Theme: lnet
Rank (Obsolete): 9223372036854775807

 Description   

While working on enhancing the lnetctl utility, it was discovered that further SMP scaling improvements can be made to the currently supported LND drivers.



 Comments   
Comment by Gerrit Updater [ 02/Oct/15 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: http://review.whamcloud.com/16710
Subject: LU-7245 socklnd: Bind peers to a specific CPT
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 05f09aec7aecbd1d57eb321da2f4a65056d9a483

Comment by Peter Jones [ 06/Oct/15 ]

James

Are you thinking of targeting this work for 2.9?

Peter

Comment by Olaf Weber [ 07/Oct/15 ]

James, since you asked, a few notes about NUMA.

Reasoning about NUMA is a lot like thinking about a cluster, except you spend a lot of time worrying about cache lines instead of files. (Much of the terminology is also similar, which becomes confusing when discussing NUMA issues for systems that are part of a cluster.) It is worth noting that NUMA considerations already apply once a system has more than one socket. Ideally, the process driving I/O, the memory involved, and the interface involved all live on the same socket.

In practice, the memory placement may have been done by some user space process outside our control, and the same goes for the process that initiates the I/O. Selection of an interface that is close in the topology of the system is useful, but that already assumes a multi-rail type configuration. Much of the time there will be no choice because there is only one interface. (LNet routers are an exception: there we do have full control over the location of all buffers and threads relative to the interfaces used.)

So the main concern becomes doing the best we can in areas we do control, in particular avoiding cache line bouncing. Placing a data structure like ksock_peer in the same CPT as the interface helps here, but only a little. The layout of the ksock_peer structure is actually a good example of what not to do. Take a look at the first few members, which will likely all be in the same cache line:

typedef struct ksock_peer
{
        struct list_head      ksnp_list;        /* stash on global peer list */
        cfs_time_t            ksnp_last_alive;  /* when (in jiffies) I was last alive */
        lnet_process_id_t     ksnp_id;          /* who's on the other end(s) */
        atomic_t              ksnp_refcount;    /* # users */
        int                   ksnp_sharecount;  /* lconf usage counter */

ksnp_list and ksnp_id are semi-constant, and are read by any thread that looks up some peer in the hash table (shared/read access). In contrast, ksnp_refcount and ksnp_last_alive are updated by threads doing work for this particular peer (exclusive/write access). So a lookup of some unrelated peer causes a cache line bounce between the CPU doing the lookup and the CPU managing the I/O. This particular case can be mitigated by being very careful with the layout of a data structure, and by making sure that the threads that modify the structure run on the same socket, even if that socket is not where the data structure lives.
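
As a rough illustration of the layout point, here is a minimal user-space sketch in C11 that keeps the read-mostly lookup fields and the frequently written fields on different cache lines. It is not the actual socklnd structure or the change made in any patch: struct peer_sketch, its ps_* members, struct list_node, peer_sketch_fields_separated() and the 64-byte CACHE_LINE_SIZE are all assumptions made up for this example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; the real size depends on the CPU. */
#define CACHE_LINE_SIZE 64

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
        struct list_node *next;
        struct list_node *prev;
};

/* Hypothetical peer structure, not the real ksock_peer. */
struct peer_sketch {
        /* Read-mostly: touched by any thread walking the hash chain. */
        struct list_node ps_list;       /* link on the peer hash list */
        uint64_t         ps_nid;        /* who is on the other end */
        uint32_t         ps_pid;

        /* Write-hot: updated by threads doing I/O for this peer.
         * alignas() starts a new cache line, so these writes do not
         * invalidate the line holding the lookup fields above. */
        alignas(CACHE_LINE_SIZE) atomic_int ps_refcount;   /* # users */
        unsigned long    ps_last_alive; /* last time the peer was seen */
};

/* Index of the cache line (relative to the struct start) for an offset. */
static size_t cache_line_of(size_t offset)
{
        return offset / CACHE_LINE_SIZE;
}

/* Returns 1 if the lookup fields and the hot fields share no cache line. */
static int peer_sketch_fields_separated(void)
{
        size_t last_cold = cache_line_of(offsetof(struct peer_sketch, ps_pid) +
                                         sizeof(uint32_t) - 1);
        size_t first_hot = cache_line_of(offsetof(struct peer_sketch,
                                                  ps_refcount));

        return first_hot > last_cold;
}

/* Compile-time check that the hot fields start on a line boundary. */
static_assert(offsetof(struct peer_sketch, ps_refcount) % CACHE_LINE_SIZE == 0,
              "write-hot fields must start on their own cache line");
```

The cost is a larger structure, since padding is inserted before ps_refcount. In kernel code the same effect is usually obtained with the ____cacheline_aligned_in_smp annotation rather than alignas().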

Comment by James A Simmons [ 08/Oct/15 ]

The patch for the Gemini SMP work under LU-2544 reworked a lot of the data structures to deal with what you described. Never looked closely at the other LND drivers but I can see the problem there. Will require a lot of data structure reworking :-/

Comment by Gerrit Updater [ 28/Oct/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16710/
Subject: LU-7245 socklnd: Bind peers to a specific CPT
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 68eb6e841f49d41a289bd0b3f559973b6cb31738

Comment by James A Simmons [ 28/Oct/15 ]

I suspect more work will be coming from this ticket for 2.9.

Comment by James A Simmons [ 22/Apr/17 ]

Patches for this work have already landed, and the multi-rail work filled in the rest of the gaps.
