[LU-9998] Default partition setup is not optimal for best metadata performance Created: 16/Sep/17 Updated: 09/Feb/18 Resolved: 22/Dec/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.11.0, Lustre 2.10.4 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Shuichi Ihara (Inactive) | Assignee: | Dmitry Eremin (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
b2_10 |
||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
Here is the MDS's CPU configuration.

[root@mds11 ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    24
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
Stepping:              4
CPU MHz:               2101.000
BogoMIPS:              4200.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              33792K
NUMA node0 CPU(s):     0-47

[root@mds11 ~]# numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 0 size: 96940 MB
node 0 free: 90229 MB
node distances:
node   0
  0:  10

Only a single partition is created by default for a single-CPU configuration.

[root@mds11 ~]# cat /proc/sys/lnet/cpu_partition_table
0 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

This default partition configuration is not optimal and has a large impact on metadata performance, especially for stat and read operations.

Default partition (npartition=1)

mpirun -np 128 /work/tools/bin/mdtest -n 5000 -v -d /scratch0/dir0 -F -i 3 -p 10 -w 0 -u

SUMMARY: (of 3 iterations)
   Operation                 Max          Min         Mean      Std Dev
   ---------                 ---          ---         ----      -------
   File creation  :    90269.484    73210.911    83067.818     7212.787
   File stat      :   192519.466   191217.586   191843.135      532.702
   File read      :    84278.190    74407.351    78726.036     4123.061
   File removal   :   152552.089   141405.693   148541.612     5058.776
   Tree creation  :      576.227      129.569      332.039      184.718
   Tree removal   :       28.016       12.466       18.019        7.083
V-1: Entering timestamp...

npartition=6

[root@mds11 ~]# cat /proc/sys/lnet/cpu_partition_table
0 : 0 1 2 3 24 25 26 27
1 : 4 5 6 7 28 29 30 31
2 : 8 9 10 11 32 33 34 35
3 : 12 13 14 15 36 37 38 39
4 : 16 17 18 19 40 41 42 43
5 : 20 21 22 23 44 45 46 47

mpirun -np 128 /work/tools/bin/mdtest -n 5000 -v -d /scratch0/dir0 -F -i 3 -p 10 -w 0 -u

SUMMARY: (of 3 iterations)
   Operation                 Max          Min         Mean      Std Dev
   ---------                 ---          ---         ----      -------
   File creation  :   130215.199   112298.894   123903.497     8216.228
   File stat      :   447219.644   422373.391   436421.078    10400.374
   File read      :   224856.656   216383.752   219513.555     3796.625
   File removal   :   142603.040   138102.147   139843.976     1973.252
   Tree creation  :      561.879      170.631      379.767      160.865
   Tree removal   :       41.908       41.042       41.509        0.357
V-1: Entering timestamp... |
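To put the two mdtest summaries side by side, the following is a minimal Python sketch that divides the npartition=6 mean rates by the default (npartition=1) mean rates. The numbers are copied from the Mean column of the two runs above. The dictionary names and the `speedups` helper are illustrative only and are not part of mdtest or Lustre.

```python
# Mean rates (operations per second) copied from the "Mean" column of the
# two mdtest summaries in the description. Names here are illustrative.
DEFAULT_MEANS = {
    "File creation": 83067.818,
    "File stat": 191843.135,
    "File read": 78726.036,
    "File removal": 148541.612,
}
SIX_PARTITION_MEANS = {
    "File creation": 123903.497,
    "File stat": 436421.078,
    "File read": 219513.555,
    "File removal": 139843.976,
}


def speedups(baseline, candidate):
    """Return candidate/baseline for every operation present in both runs."""
    return {op: candidate[op] / baseline[op] for op in baseline if op in candidate}


if __name__ == "__main__":
    for op, ratio in speedups(DEFAULT_MEANS, SIX_PARTITION_MEANS).items():
        print(f"{op:<14} {ratio:.2f}x")
```

With these figures, the ratios come out to roughly 1.49x for file creation, 2.27x for file stat, 2.79x for file read, and 0.94x for file removal. The tree creation and removal rows are left out because their standard deviations are large relative to their means in the default run.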
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 18/Sep/17 ] |
|
Hi Dmitry, Can you please investigate and advise? Thanks. |
| Comment by Gerrit Updater [ 17/Oct/17 ] |
|
Dmitry Eremin (dmitry.eremin@intel.com) uploaded a new patch: https://review.whamcloud.com/29645 |
| Comment by Dmitry Eremin (Inactive) [ 17/Oct/17 ] |
|
I would like to propose a workaround for this. My patch restores the old behavior for machines with a single NUMA node. |
| Comment by Gerrit Updater [ 22/Dec/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29645/ |
| Comment by Peter Jones [ 22/Dec/17 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 02/Jan/18 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30690 |
| Comment by Gerrit Updater [ 09/Feb/18 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30690/ |