[LU-14172] DIR Stat performance regression in striped dir Created: 02/Dec/20 Updated: 09/Dec/20 Resolved: 09/Dec/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.6 |
| Fix Version/s: | Lustre 2.14.0, Lustre 2.12.6 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Shuichi Ihara | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 2 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
There is a metadata (directory stat) performance regression in 2.12.6 (RC1). The regression appears to be limited to striped directories and to the server side.

client: version=2.12.6_RC1_1_g327c8b7
server: version=2.12.6_RC1_1_g327c8b7 or lustre-2.12.5

# mkdir /ai400x/mdt0
# lfs setdirstripe -c 4 /ai400x/mdt_stripe
# lfs setdirstripe -c 4 -D /ai400x/mdt_stripe
# salloc -p 40n -N 40 --ntasks-per-node=16 mpirun -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 --bind-to core:overload-allowed --allow-run-as-root /work/tools/bin/mdtest -i 3 -p 10 -n 1500 -u -D -d $PATH

Single MDT without DNE

SUMMARY rate: (of 3 iterations)
Operation                    Max         Min        Mean   Std Dev
---------                    ---         ---        ----   -------
Directory creation :   54315.552   50037.128   52618.576  1855.106
Directory stat     :  186516.109  184354.609  185726.143   972.887
Directory removal  :   66572.651   64990.546   65627.103   681.777
Tree creation      :      46.771      24.099      36.301     9.336
Tree removal       :      16.926      13.890      15.720     1.316

Server: Lustre-2.12.6-RC1

SUMMARY rate: (of 3 iterations)
Operation                    Max         Min        Mean   Std Dev
---------                    ---         ---        ----   -------
Directory creation :   68098.113   59185.349   62208.643  4164.966
Directory stat     :  193338.869  192650.348  193031.824   285.743
Directory removal  :   65905.804   64842.618   65212.728   490.440
Tree creation      :      44.234      33.906      39.452     4.251
Tree removal       :      17.024      15.068      16.279     0.864

Striped directory across four MDTs

SUMMARY rate: (of 3 iterations)
Operation                    Max         Min        Mean   Std Dev
---------                    ---         ---        ----   -------
Directory creation :    6385.748    5929.670    6113.851   196.251
Directory stat     :  166190.895  162991.180  164733.372  1321.263
Directory removal  :    4789.518    4294.122    4584.600   211.099
Tree creation      :      13.200       1.102       6.937     4.948
Tree removal       :       9.126       8.479       8.810     0.264

Server: Lustre-2.12.6-RC1

SUMMARY rate: (of 3 iterations)
Operation                    Max         Min        Mean   Std Dev
---------                    ---         ---        ----   -------
Directory creation :    6694.539    6505.265    6613.160    79.512
Directory stat     :   49873.850   48817.530   49260.117   447.881  <--- This is the regression.
Directory removal  :    4768.841    4253.124    4592.927   240.327
Tree creation      :      13.490       0.705       7.321     5.229
Tree removal       :       9.051       8.441       8.774     0.252 |
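To put a number on the regression flagged above, the mean rates in the two striped-directory tables can be compared programmatically. The following is a minimal sketch, not part of the original report: the helper names `parse_summary` and `percent_change` are hypothetical, the row pattern is only an assumption based on the mdtest SUMMARY rows as they appear in this ticket, and it assumes the first (unlabeled) striped-directory table is the run to compare the Lustre-2.12.6-RC1 server run against.

```python
import re

# One mdtest SUMMARY row as it appears in this ticket:
# "<operation> : <max> <min> <mean> <std dev>" (assumed layout).
ROW = re.compile(
    r"^\s*(?P<op>[A-Za-z ]+?)\s*:\s*"
    r"(?P<max>\d+\.\d+)\s+(?P<min>\d+\.\d+)\s+(?P<mean>\d+\.\d+)\s+(?P<std>\d+\.\d+)"
)


def parse_summary(text):
    """Return {operation: {"max", "min", "mean", "std"}} for every matching row."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows[match.group("op")] = {
                key: float(match.group(key)) for key in ("max", "min", "mean", "std")
            }
    return rows


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Rows copied from the two striped-directory tables in the description.
FIRST_RUN = """
Directory creation : 6385.748 5929.670 6113.851 196.251
Directory stat : 166190.895 162991.180 164733.372 1321.263
Directory removal : 4789.518 4294.122 4584.600 211.099
"""
RC1_RUN = """
Directory creation : 6694.539 6505.265 6613.160 79.512
Directory stat : 49873.850 48817.530 49260.117 447.881
Directory removal : 4768.841 4253.124 4592.927 240.327
"""

before = parse_summary(FIRST_RUN)
after = parse_summary(RC1_RUN)
changes = {
    op: percent_change(before[op]["mean"], after[op]["mean"]) for op in before
}

for op, change in changes.items():
    print(f"{op:<20}{change:+7.1f}%")
```

With these inputs the mean directory stat rate falls by roughly 70%, while directory creation and removal move by less than 10%, which matches the report that only directory stat regressed on striped directories.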
| Comments |
| Comment by Peter Jones [ 02/Dec/20 ] |
|
Lai Is this related to the Peter |
| Comment by Lai Siyao [ 04/Dec/20 ] |
|
Yes. The cause is that directory stripe revalidation spends more time checking whether the object is a stripe (see mdt_object_is_shard()). I made a simple fix and the result looks good; I'll tidy it up and push it later. |
| Comment by Gerrit Updater [ 04/Dec/20 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40863 |
| Comment by Shuichi Ihara [ 04/Dec/20 ] |
|
Here are test results on the master branch (commit e5c8f66). They reproduce the same directory stat regression that I saw on lustre-2.12.6-RC1.

Operation                    Max         Min        Mean   Std Dev
---------                    ---         ---        ----   -------
Directory creation :    6323.394    6141.754    6238.210    74.580
Directory stat     :   48295.593   46827.765   47645.451   610.794
Directory removal  :    4336.014    4274.571    4315.516    28.952
Tree creation      :      11.842       0.614       4.587     5.138
Tree removal       :       9.204       8.894       9.048     0.126

Unfortunately, patch 40863 against master does not solve the problem.

Operation                    Max         Min        Mean   Std Dev
---------                    ---         ---        ----   -------
Directory creation :    6437.097    6084.071    6279.844   146.672
Directory stat     :   47235.709   44762.233   46347.987  1123.868
Directory removal  :    4745.993    4348.202    4504.821   173.053
Tree creation      :       6.530       0.789       2.762     2.665
Tree removal       :       8.983       8.477       8.741     0.207 |
| Comment by Gerrit Updater [ 04/Dec/20 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40872 |
| Comment by Shuichi Ihara [ 05/Dec/20 ] |
|
It looks like patch https://review.whamcloud.com/40863 fixes the regression once it is applied on both the server and the client side. In the previous test the patch was applied only on the server side, but I then realized that the patch contains both server and client changes.

Operation                    Max         Min        Mean   Std Dev
---------                    ---         ---        ----   -------
Directory creation :    6417.575    6007.465    6186.944   171.288
Directory stat     :  143940.330  139376.396  141106.210  2020.042
Directory removal  :    4677.840    4377.965    4569.627   135.902
Tree creation      :      13.348       0.656       5.006     5.901
Tree removal       :       8.832       8.782       8.810     0.021

These numbers are still a bit lower than 2.12.5, but I don't have a baseline number on master without this regression. So there might be other issues in master if we compare against 2.12.5. |
| Comment by Gerrit Updater [ 05/Dec/20 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40875 |
| Comment by Shuichi Ihara [ 05/Dec/20 ] |
|
Here are the final test results, apples to apples.

# mkdir /ai400x/mdt0
# lfs setdirstripe -c 4 /ai400x/mdt_stripe
# lfs setdirstripe -c 4 -D /ai400x/mdt_stripe
# salloc -p 40n -N 40 --ntasks-per-node=16 mpirun -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 --bind-to core:overload-allowed --allow-run-as-root /work/tools/bin/mdtest -i 3 -p 10 -n 1500 -u -D -d $PATH

2.12.5

Operation                    Max         Min        Mean   Std Dev
---------                    ---         ---        ----   -------
Directory creation :    6173.268    5805.240    5933.825   169.465
Directory stat     :  151800.690  148071.970  150256.305  1587.764
Directory removal  :    4648.674    4173.113    4417.583   194.376
Tree creation      :      12.984       0.756       6.940     4.993
Tree removal

2.12.6-RC1

Operation                    Max         Min        Mean   Std Dev
---------                    ---         ---        ----   -------
Directory creation :    6344.954    5834.617    6020.427   230.277
Directory stat     :   44887.807   43460.779   43964.038   654.049
Directory removal  :    4559.802    4114.146    4392.390   198.099
Tree creation      :      13.336       0.734       7.153     5.148
Tree removal       :       8.723       8.120       8.359     0.261

2.12.6-RC1 + patch https://review.whamcloud.com/40872

Operation                    Max         Min        Mean   Std Dev
---------                    ---         ---        ----   -------
Directory creation :    6117.054    5850.628    5941.143   124.404
Directory stat     :  151638.423  143319.490  148509.338  3695.492
Directory removal  :    4498.161    3971.102    4219.711   216.202
Tree creation      :      12.974       0.990       8.916     5.605
Tree removal       :       8.616       8.349       8.458     0.114

2.12.6-RC1 + patch https://review.whamcloud.com/40875

Operation                    Max         Min        Mean   Std Dev
---------                    ---         ---        ----   -------
Directory creation :    6328.334    5993.743    6113.977   151.946
Directory stat     :  154744.046  148145.747  152434.570  3035.537
Directory removal  :    4628.371    4174.092    4457.011   201.538
Tree creation      :      13.789       1.132       7.503     5.167
Tree removal       :       8.654       8.373       8.499     0.117

I think that patch 40875 solves the regression and the numbers are consistent. |
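The conclusion above can be checked by expressing each configuration's mean directory stat rate as a fraction of the 2.12.5 baseline. This is only an illustrative sketch: the dictionary keys and the helper name `relative_to_baseline` are hypothetical labels, and the values are the "Mean" column of the Directory stat rows in the four tables of this comment.

```python
# Mean "Directory stat" rates copied from the four tables in this comment.
STAT_MEAN = {
    "2.12.5": 150256.305,
    "2.12.6-RC1": 43964.038,
    "2.12.6-RC1 + 40872": 148509.338,
    "2.12.6-RC1 + 40875": 152434.570,
}


def relative_to_baseline(means, baseline_key):
    """Express every mean as a percentage of the baseline mean."""
    baseline = means[baseline_key]
    return {name: value / baseline * 100.0 for name, value in means.items()}


relative = relative_to_baseline(STAT_MEAN, "2.12.5")

for name, percent in relative.items():
    print(f"{name:<22}{percent:6.1f}% of 2.12.5")
```

On these figures unpatched 2.12.6-RC1 reaches only about 29% of the 2.12.5 rate, while either patch brings the mean back to within about 2% of the baseline.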
| Comment by Gerrit Updater [ 07/Dec/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40875/ |
| Comment by Gerrit Updater [ 09/Dec/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40863/ |
| Comment by Peter Jones [ 09/Dec/20 ] |
|
Landed for 2.14 and 2.12.6 |