Details
- Technical task
- Resolution: Fixed
- Minor
- Lustre 2.15.0
Description
When a server receives messages from clients, each message is assigned to a CPT (CPU partition) and then passed to the upper layer.
The CPT ID is chosen by hashing the client's NID.
However, if there are LNet routers between the clients and the servers, the hash is based on the router's NID, not the clients' NIDs.
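The effect described above can be illustrated with a minimal sketch. The hash here is a toy stand-in (CRC32 modulo the CPT count), not the function LNet actually uses, and `cpt_of_nid` and `cpts_used` are hypothetical names chosen for this example. The NIDs are taken from the peer listings in this ticket.

```python
import zlib

NUM_CPTS = 20  # matches the 20-CPT server assumed in this ticket


def cpt_of_nid(nid: str, ncpts: int = NUM_CPTS) -> int:
    """Toy NID-to-CPT hash (an assumption, NOT the real LNet hash)."""
    return zlib.crc32(nid.encode()) % ncpts


def cpts_used(sender_nids):
    """Return the set of CPTs that the given immediate-sender NIDs map to."""
    return {cpt_of_nid(nid) for nid in sender_nids}


# Direct case: the server sees each client's own NID.
clients = [f"10.0.0.{i}@o2ib12" for i in range(31, 41)]
# Routed case: every message arrives from the router's NID.
router = "10.12.11.135@o2ib12"

direct = cpts_used(clients)
routed = cpts_used([router] * len(clients))

print("CPTs used without router:", len(direct))
print("CPTs used with router:   ", len(routed))
```

Whatever hash is used, a single sender NID can only ever map to a single CPT, which is why the routed case collapses onto one partition.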
Let's assume the following configuration.
1 x server (20 CPU cores; CPT=20, meaning 1 CPU core belongs to each CPT)
1 x LNet router
10 x client
Without LNET router
All clients' NIDs are active.
nid                    refs state last max rtr min  tx  min queue
0@lo                      1    NA   -1   0   0   0   0    0     0
10.0.0.34@o2ib12          7    NA   -1   8   8   8   2  -20  3616
10.0.11.226@o2ib12        1    NA   -1   8   8   8   8   -8     0
10.0.0.39@o2ib12          5    NA   -1   8   8   8   4  -18  2560
10.0.0.31@o2ib12          5    NA   -1   8   8   8   4  -20  1984
10.0.0.35@o2ib12          5    NA   -1   8   8   8   4  -18  1752
10.0.0.36@o2ib12          6    NA   -1   8   8   8   3  -19  2544
10.0.0.32@o2ib12          1    NA   -1   8   8   8   8  -18     0
10.0.0.33@o2ib12          6    NA   -1   8   8   8   3  -17  2312
10.0.11.225@o2ib12        1    NA   -1   8   8   8   8   -8     0
10.0.0.40@o2ib12          6    NA   -1   8   8   8   3  -19  3056
10.0.0.38@o2ib12          1    NA   -1   8   8   8   8  -21     0
10.0.11.227@o2ib12        1    NA   -1   8   8   8   8   -8     0
10.0.0.37@o2ib12          6    NA   -1   8   8   8   3  -18  3248
Those messages are handled by LNet threads in different CPTs, because the hash is computed from each client's NID.
top - 01:05:44 up 1 day, 16:09, 2 users, load average: 39.70, 18.43, 7.46
Tasks: 1442 total, 75 running, 1367 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 50.6 sy, 0.0 ni, 48.3 id, 1.0 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 15369398+total, 13057227+free, 18096748 used, 5024956 buff/cache
KiB Swap: 11075580 total, 11075580 free, 0 used. 13475987+avail Mem

  PID USER  PR NI VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
17303 root  20  0    0   0   0 S 17.8  0.0 0:03.16 kworker/u40:1
17601 root  20  0    0   0   0 S  9.2  0.0 0:00.47 ll_ost19_004
17642 root  20  0    0   0   0 S  7.9  0.0 0:00.33 ll_ost19_007
16187 root  20  0    0   0   0 R  7.3  0.0 0:07.56 kiblnd_sd_03_00
16192 root  20  0    0   0   0 R  7.3  0.0 0:07.57 kiblnd_sd_08_00
16198 root  20  0    0   0   0 R  7.3  0.0 0:11.14 kiblnd_sd_14_00
16201 root  20  0    0   0   0 R  7.3  0.0 0:07.70 kiblnd_sd_17_00
16632 root  20  0    0   0   0 R  7.3  0.0 0:07.10 mdt03_000
16634 root  20  0    0   0   0 R  7.3  0.0 0:06.95 mdt03_002
16647 root  20  0    0   0   0 R  7.3  0.0 0:07.24 mdt08_000
16649 root  20  0    0   0   0 R  7.3  0.0 0:07.06 mdt08_002
With LNET router
This is the same test from 10 clients, but the messages go through the LNet router.
There is only a single active NID on the server, which is the router node.
nid                    refs state last max rtr min  tx  min queue
0@lo                      1    NA   -1   0   0   0   0    0     0
192.168.11.35@o2ib10      2    NA   -1   0   0   0   0    0     0
10.0.11.226@o2ib12        1    NA   -1   8   8   8   8   -8     0
192.168.11.36@o2ib10      2    NA   -1   0   0   0   0    0     0
10.12.11.135@o2ib12      13    up   -1   8   8   8   3  -94  3248
192.168.11.40@o2ib10      2    NA   -1   0   0   0   0    0     0
192.168.11.32@o2ib10      2    NA   -1   0   0   0   0    0     0
192.168.11.33@o2ib10      2    NA   -1   0   0   0   0    0     0
192.168.11.37@o2ib10      2    NA   -1   0   0   0   0    0     0
192.168.11.34@o2ib10      2    NA   -1   0   0   0   0    0     0
10.0.11.225@o2ib12        1    NA   -1   8   8   8   8   -7     0
192.168.11.39@o2ib10      2    NA   -1   0   0   0   0    0     0
10.0.11.227@o2ib12        1    NA   -1   8   8   8   8   -8     0
192.168.11.38@o2ib10      2    NA   -1   0   0   0   0    0     0
192.168.11.31@o2ib10      2    NA   -1   0   0   0   0    0     0
As a result, all messages go to CPT 2, and the other 19 CPTs (19 CPU cores) are idle.
Tasks: 1067 total, 3 running, 1064 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 5.0 sy, 0.0 ni, 94.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 15369398+total, 14010728+free, 12679048 used, 907648 buff/cache
KiB Swap: 11075580 total, 11075580 free, 0 used. 14017849+avail Mem

  PID USER  PR NI VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
13044 root  20  0    0   0   0 R  7.0  0.0 0:01.81 kiblnd_sd_02_00
14146 root  20  0    0   0   0 S  6.6  0.0 0:00.93 mdt02_005
13489 root  20  0    0   0   0 R  6.3  0.0 0:01.37 mdt02_002
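To compare the two top listings quickly, the busy CPTs can be read off the thread names. The sketch below assumes the two-digit number in names such as `kiblnd_sd_02_00`, `mdt02_005` and `ll_ost19_004` is the CPT index, which is consistent with the "CPT 2" observation above but is still an assumption of this example. `busy_cpts` is a hypothetical helper, and the command names are copied from the listings in this ticket.

```python
import re

# Assumed naming pattern: <service prefix><2-digit CPT>_<thread index>.
_THREAD_RE = re.compile(r"^(kiblnd_sd_|mdt|ll_ost)(\d{2})_\d+$")


def busy_cpts(commands):
    """Return the sorted CPT indices found in the given thread names."""
    cpts = set()
    for cmd in commands:
        match = _THREAD_RE.match(cmd)
        if match:  # names such as kworker/u40:1 carry no CPT index
            cpts.add(int(match.group(2)))
    return sorted(cpts)


without_router = [
    "kworker/u40:1", "ll_ost19_004", "ll_ost19_007",
    "kiblnd_sd_03_00", "kiblnd_sd_08_00", "kiblnd_sd_14_00",
    "kiblnd_sd_17_00", "mdt03_000", "mdt03_002", "mdt08_000", "mdt08_002",
]
with_router = ["kiblnd_sd_02_00", "mdt02_005", "mdt02_002"]

print("without router:", busy_cpts(without_router))
print("with router:   ", busy_cpts(with_router))
```

Only the threads visible in the truncated top output are counted, so the first list is a lower bound on the CPTs in use without the router.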
It would be nice to have better hashing that distributes messages across different CPTs on the server, to improve metadata performance and IOPS when an LNet router is present.
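One conceivable direction, shown only as a sketch and not confirmed by this ticket as the implemented fix, is to key the hash on the originating client's NID rather than on the immediate sender's NID. The `Msg` type, its field names, and the CRC32-based `toy_hash` are all hypothetical and exist only for this illustration.

```python
import zlib
from dataclasses import dataclass

NUM_CPTS = 20  # the 20-CPT server from the configuration above


@dataclass(frozen=True)
class Msg:
    src_nid: str   # NID of the originating client (hypothetical field)
    from_nid: str  # NID of the immediate sender, i.e. the router when routed


def toy_hash(nid: str, ncpts: int = NUM_CPTS) -> int:
    """Toy stand-in hash (an assumption, not the real LNet hash)."""
    return zlib.crc32(nid.encode()) % ncpts


def cpt_by_sender(msg: Msg) -> int:
    """Behaviour described in this ticket: hash the immediate sender."""
    return toy_hash(msg.from_nid)


def cpt_by_source(msg: Msg) -> int:
    """Candidate alternative: hash the originating client instead."""
    return toy_hash(msg.src_nid)


router = "10.12.11.135@o2ib12"
msgs = [Msg(f"192.168.11.{i}@o2ib10", router) for i in range(31, 41)]

by_sender = {cpt_by_sender(m) for m in msgs}
by_source = {cpt_by_source(m) for m in msgs}

print("CPTs when hashing the router NID:", len(by_sender))
print("CPTs when hashing the client NID:", len(by_source))
```

Whether the originating NID is available at the point where the CPT is chosen is an open question this sketch does not address.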
Issue Links
- is related to
  - LU-17752 Advanced hash function for better CPT allocation (Open)
  - LU-14293 Poor lnet/ksocklnd(?) performance on 2x100G bonded ethernet (Resolved)
  - LU-12815 Create multiple TCP sockets per SockLND (Resolved)
  - LU-16797 improve numeric NID to CPT hashing (Resolved)
- is related to
  - LU-11454 Allow switching off CPT binding for PTLRPC threads (Resolved)
  - LU-56 Finish SMP scalability work (Resolved)
  - LU-13621 LNET peer doesn't distribute well to different CPT (Resolved)
  - LU-7245 Improve SMP scaling support for LND drivers (Resolved)