Details
- Bug
- Resolution: Cannot Reproduce
- Critical
Description
While testing 2.15 and comparing it to our 2.12 branch, I observed noticeable regressions in the following:
- client-side file creates
- client-side 32K file removes
- the high std dev that we have been experiencing for creates/removes
A git bisect revealed that this commit is the root cause (LU-14792):
b9c4dc3c33 LU-14792 llite: enable filesystem-wide default LMV
More details:
commit b9c4dc3c33fe87ecaa79a290190524ea21b7fa8a
Author: Lai Siyao <lai.siyao@whamcloud.com>
Date:   Mon Jun 21 11:52:01 2021 +0800

    LU-14792 llite: enable filesystem-wide default LMV

    This change includes three parts:
    1. save dir depth to ROOT after lookup on client side.
    2. once space balanced default LMV is set on ROOT, and
       max-inherit/max-inherit-rr is unlimited or not less than directory
       depth, new directory will be created in QOS or roundrobin mode.
    3. set ROOT default LMV max-inherit unlimited, and max-inherit-rr to 3,
       and increase the ratio to create subdirectory on local MDT with the
       directory depth to ROOT, so that new directories will be created by
       space usage, and the deeper it's located it's more likely to create
       on local MDTs; and the top 3 layer will be created in roundrobin
       mode if system is balanced.

    Set default LMV in mkdir_on_mdt() to make sure its subdirectories are
    created on the same MDT.

    Add sanity 413d.

    Create a test directory on MDT0 for pjdfstest, because cross-MDT rename
    of symlink will migrate symlink to target MDT, which will cause inode
    change (LU-11631).
All commits before this look great. All commits after this exhibit the above symptoms.
git log on master:
4668283cd1 LU-14806 o2iblnd: clear fatal error on successful failover
---> introduces regression
b9c4dc3c33 LU-14792 llite: enable filesystem-wide default LMV
---> looks good
b7bd4e3422 LU-14621 mdd: fix lock-tx order in mdd_xattr_merge()
3e04b0fd6c LU-13417 mdd: set default LMV on ROOT
4e05f3b70b (tag: v2_14_53, tag: 2.14.53) New tag 2.14.53
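For anyone repeating the bisect, the good/bad call at each step could be scripted along these lines. This is only a sketch: the function name `bisect_verdict` and the 10% tolerance are my own assumptions, not something used in the original bisect, and the build, deploy, and mdtest steps that produce the measured rate are left out.

```shell
#!/bin/bash
# Hypothetical helper (name and tolerance are assumptions, not from the ticket).
# bisect_verdict MEASURED BASELINE TOLERANCE_PCT
#   returns 0 (good) if MEASURED is no more than TOLERANCE_PCT below BASELINE,
#   returns 1 (bad) otherwise.
bisect_verdict() {
    LC_ALL=C awk -v m="$1" -v b="$2" -v t="$3" \
        'BEGIN { if (m >= b * (1 - t / 100)) exit 0; exit 1 }'
}
```

A wrapper script that ends with this verdict as its exit status can be handed to `git bisect run`, which treats exit code 0 as good, 125 as skip, and any other code from 1 to 127 as bad. Given how much the rates vary from run to run, a single measurement per commit may misclassify a step, so the tolerance probably needs tuning.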
Testing b7bd4e3422 (before patch):
SUMMARY rate: (of 5 iterations)
Operation                      Max          Min         Mean      Std Dev
---------                      ---          ---         ----      -------
Directory creation  :   109280.683   100961.554   105818.622     3136.705
Directory stat      :   410841.732   388930.761   404696.344     7969.689
Directory removal   :   220323.614   150785.433   181709.288    25249.587
File creation       :   154658.972   143961.530   149709.807     4125.522
File stat           :   700893.743   685670.701   692684.713     6583.956
File read           :   271890.920   183951.839   205427.555    33679.583
File removal        :   147697.301   135354.855   140847.877     4338.359
Tree creation       :      275.553      170.019      248.261       39.874
Tree removal        :       99.770       85.408       91.795        5.479
Testing b9c4dc3c33 (after patch):
SUMMARY rate: (of 5 iterations)
Operation                      Max          Min         Mean      Std Dev
---------                      ---          ---         ----      -------
Directory creation  :   108068.523   102899.926   105606.738     2004.020
Directory stat      :   428322.427   395826.681   404486.906    12222.136
Directory removal   :   236153.570   146400.162   179968.138    32242.271
File creation       :   156681.218   101096.295   122707.414    23848.521
File stat           :   689022.637   677108.079   683537.598     4706.503
File read           :   276963.750   184493.079   241172.371    30923.700
File removal        :   148977.883   100569.361   123812.878    18654.554
Tree creation       :      280.232        0.994      142.324      123.201
Tree removal        :       99.952       20.766       57.230       35.277
Again, every test run at b9c4dc3c33 and later continues to exhibit the regressions and high deviations noted above. It varies from run to run, but I can get regressions of 15% or more for both file creates and file removes.
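To put numbers on the two runs quoted above, the change in the mean and the relative spread can be worked out from the Mean and Std Dev columns. A minimal sketch follows; the helper names `pct_change` and `coeff_var` are mine, and the inputs are the File creation and File removal rows from the two SUMMARY tables.

```shell
#!/bin/bash
# Helper names are illustrative; input values are copied from the SUMMARY tables.

# pct_change BEFORE AFTER: percent change from BEFORE to AFTER, one decimal place.
pct_change() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

# coeff_var MEAN STDDEV: standard deviation as a percentage of the mean.
coeff_var() {
    LC_ALL=C awk -v m="$1" -v s="$2" 'BEGIN { printf "%.1f\n", s / m * 100 }'
}

pct_change 149709.807 122707.414   # File creation mean, before vs after: -18.0
pct_change 140847.877 123812.878   # File removal mean, before vs after:  -12.1
coeff_var 149709.807 4125.522      # File creation spread before: 2.8
coeff_var 122707.414 23848.521     # File creation spread after:  19.4
```

For this particular pair of runs, that is roughly an 18% drop in the mean file creation rate and a 12% drop in file removal, with the file creation spread growing from about 3% to about 19% of the mean.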
mdtest script:
#!/bin/bash

NODES=21
PPN=16
PROCS=$(( $NODES * $PPN ))
MDT_COUNT=1
PAUSED=120

# Unique directory
# srun -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/ior-3.3.0-CentOS-8.2/install/bin/mdtest -v -i 5 -p $PAUSED -C -E -T -r -n $(( $MDT_COUNT * 1048576 / $PROCS )) -u -d /mnt/kjlmo13/pkoutoupis/mdt0/test.`date +"%Y%m%d.%H%M%S"` 2>&1 |& tee f_mdt0_0k_ost_uniq.out
srun -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/ior-3.3.0-CentOS-8.2/install/bin/mdtest -v -i 5 -p $PAUSED -C -w 32768 -E -e 32768 -T -r -n $(( $MDT_COUNT * 1048576 / $PROCS )) -u -d /mnt/kjlmo13/pkoutoupis/mdt0/test.`date +"%Y%m%d.%H%M%S"` 2>&1 |& tee f_mdt0_32k_ost_uniq.out

# Shared directory
# srun -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/ior-3.3.0-CentOS-8.2/install/bin/mdtest -v -i 5 -p $PAUSED -C -E -T -r -n $(( $MDT_COUNT * 1048576 / $PROCS )) -d /mnt/kjlmo13/pkoutoupis/mdt0/test.`date +"%Y%m%d.%H%M%S"` 2>&1 |& tee f_mdt0_0k_ost_shared.out
srun -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/ior-3.3.0-CentOS-8.2/install/bin/mdtest -v -i 5 -p $PAUSED -C -w 32768 -E -e 32768 -T -r -n $(( $MDT_COUNT * 1048576 / $PROCS )) -d /mnt/kjlmo13/pkoutoupis/mdt0/test.`date +"%Y%m%d.%H%M%S"` 2>&1 |& tee f_mdt0_32k_ost_shared.out
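For reference, the `-n` argument in the script works out as follows. This is just the script's own arithmetic spelled out; the variable names `FILES_PER_PROC` and `TOTAL_FILES` are added here for illustration.

```shell
#!/bin/bash
# Same settings as the mdtest script; FILES_PER_PROC and TOTAL_FILES are
# illustrative names that do not appear in the original.
NODES=21
PPN=16
MDT_COUNT=1

PROCS=$(( NODES * PPN ))                             # 21 * 16 = 336 MPI tasks
FILES_PER_PROC=$(( MDT_COUNT * 1048576 / PROCS ))    # integer division: 3120
TOTAL_FILES=$(( FILES_PER_PROC * PROCS ))            # 1048320, just under 1048576

echo "procs=$PROCS files_per_proc=$FILES_PER_PROC total_files=$TOTAL_FILES"
```

So each of the 336 tasks handles 3120 items per iteration, for a total of 1,048,320 per iteration rather than the full 1,048,576, because shell arithmetic truncates the division.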
Issue Links
- is related to: LU-14792 DNE3: enable filesystem-wide default LMV (Resolved)