[LU-7245] Improve SMP scaling support for LND drivers Created: 02/Oct/15 Updated: 13/May/21 Resolved: 22/Apr/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | James A Simmons | Assignee: | James A Simmons |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Environment: |
Any TCP, infiniband or Gemini network system |
||
| Epic/Theme: | lnet |
| Description |
|
While working on enhancing the lnetctl utility, it was discovered that more SMP scaling improvements can be made to the currently supported LND drivers. |
| Comments |
| Comment by Gerrit Updater [ 02/Oct/15 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: http://review.whamcloud.com/16710 |
| Comment by Peter Jones [ 06/Oct/15 ] |
|
James, are you thinking of targeting this work for 2.9? Peter |
| Comment by Olaf Weber [ 07/Oct/15 ] |
|
James, since you asked, a few notes about NUMA.

Reasoning about NUMA is a lot like thinking about a cluster, except you spend a lot of time worrying about cache lines instead of files. (Much of the terminology is also similar, which becomes confusing when discussing NUMA issues for systems that are part of a cluster.)

It is worth noting that NUMA considerations already apply once a system has more than one socket. Ideally, the process driving I/O, the memory involved, and the interface involved all live on the same socket. In practice, the memory placement may have been done by some user space process outside our control, and the same goes for the process that initiates the I/O. Selection of an interface that is close in the topology of the system is useful, but that already assumes a multi-rail type configuration. Much of the time there will be no choice because there is only one interface. (LNet routers are an exception: there we do have full control over the location of all buffers and threads relative to the interfaces used.)

So the main concern becomes doing the best we can in areas we do control, in particular avoiding cache line bouncing. Placing a datastructure like ksock_peer in the same CPT as the interface helps a little bit here, but only a little. The layout of the ksock_peer structure is actually a good example of what not to do. Take a look at the first few members, which will likely all be in the same cache line:

typedef struct ksock_peer
{
        struct list_head ksnp_list; /* stash on global peer list */
        cfs_time_t ksnp_last_alive; /* when (in jiffies) I was last alive */
        lnet_process_id_t ksnp_id; /* who's on the other end(s) */
        atomic_t ksnp_refcount; /* # users */
        int ksnp_sharecount; /* lconf usage counter */

ksnp_list and ksnp_id are semi-constant, and read by any thread that does a lookup of some peer in the hash table (shared/read access). In contrast ksnp_refcount and ksnp_last_alive are updated by threads doing work for this particular peer (exclusive/write access). So a lookup of some unrelated peer causes a cache line bounce between the CPU doing the lookup and the CPU managing the I/O.

This particular case can be mitigated by being very careful with the layout of a datastructure, and by making sure that threads that do modify the structure run on the same socket, even if that socket is not where the datastructure lives. |
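The layout concern described in the comment above can be illustrated with a small user-space sketch. This is not the ksocklnd code and not what the landed patches do: the struct, its `sp_` field names, the stand-in types, and the 64-byte `SKETCH_CACHE_LINE` value are all assumptions made for illustration. The idea is only to keep the read-mostly lookup fields and the frequently written fields in different cache lines, and to check that with `offsetof`.

```c
#include <assert.h>    /* static_assert */
#include <stdalign.h>  /* alignas */
#include <stdatomic.h>
#include <stddef.h>    /* offsetof */
#include <stdint.h>

/* Assumed cache line size; the real value depends on the CPU. */
#define SKETCH_CACHE_LINE 64

/* Hypothetical stand-in for the kernel's struct list_head. */
struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

/* Hypothetical stand-in for lnet_process_id_t. */
struct sketch_process_id {
	uint64_t nid;
	uint32_t pid;
};

/* Hypothetical peer structure, not the real ksock_peer. */
struct sketch_peer {
	/* Read-mostly: touched by any thread doing a hash table lookup. */
	struct sketch_list_head sp_list;
	struct sketch_process_id sp_id;

	/* Write-hot: updated by threads doing work for this peer.
	 * alignas forces these onto a fresh cache line. */
	alignas(SKETCH_CACHE_LINE) atomic_int sp_refcount;
	unsigned long sp_last_alive;
};

/* Compile-time check that the write-hot group starts a cache line. */
static_assert(offsetof(struct sketch_peer, sp_refcount) %
	      SKETCH_CACHE_LINE == 0,
	      "write-hot fields must start on a cache line boundary");

/* Return nonzero when two member offsets fall in different cache lines. */
static int sketch_lines_differ(size_t off_a, size_t off_b)
{
	return off_a / SKETCH_CACHE_LINE != off_b / SKETCH_CACHE_LINE;
}

/* Return nonzero when lookup fields and write-hot fields are separated. */
static int sketch_peer_layout_ok(void)
{
	size_t hot = offsetof(struct sketch_peer, sp_refcount);

	return sketch_lines_differ(offsetof(struct sketch_peer, sp_list), hot) &&
	       sketch_lines_differ(offsetof(struct sketch_peer, sp_id), hot);
}
```

The separation costs memory, since the structure is padded out to a multiple of the assumed line size, and it only holds if each instance is allocated with matching alignment. It also addresses only the layout half of the comment; keeping the threads that modify a peer on one socket is a separate scheduling matter.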
| Comment by James A Simmons [ 08/Oct/15 ] |
|
The patch for the Gemini SMP work under LU-2544 reworked a lot of the data structures to deal with what you described. Never looked closely at the other LND drivers but I can see the problem there. Will require a lot of data structure reworking :-/ |
| Comment by Gerrit Updater [ 28/Oct/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16710/ |
| Comment by James A Simmons [ 28/Oct/15 ] |
|
I suspect more work will be coming from this ticket for 2.9. |
| Comment by James A Simmons [ 22/Apr/17 ] |
|
Patches for this work already landed and the multi-rail work filled in the rest of the gaps. |