[LU-11854] Dir operation on DNE2 are slower than DNE1 or non DNE Created: 14/Jan/19 Updated: 31/Jan/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Shuichi Ihara | Assignee: | Lai Siyao |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | llnl | ||
| Environment: |
2.12.0 |
||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
Directory operations (creation and removal) on a striped directory with DNE2 are significantly slower (more than 50x in the worst case) than with a non-DNE configuration or DNE1. Here are the test results.

Non DNE configuration

[root@c01 ~]# mkdir /scratch1/nodne
[root@c01 ~]# salloc -N 32 --ntasks-per-node=24 mpirun -np 768 --allow-run-as-root /work/tools/bin/mdtest -n 1000 -u -vv -d /scratch1/nodne

SUMMARY: (of 1 iterations)
Operation                    Max          Min         Mean      Std Dev
---------                    ---          ---         ----      -------
Directory creation:    97259.037    97259.037    97259.037        0.000
Directory stat    :   347749.775   347749.775   347749.775        0.000
Directory removal :    96064.306    96064.306    96064.306        0.000
File creation     :   111509.920   111509.920   111509.920        0.000
File stat         :   317489.856   317489.856   317489.856        0.000
File read         :   183132.719   183132.719   183132.719        0.000
File removal      :   166205.620   166205.620   166205.620        0.000
Tree creation     :       29.571       29.571       29.571        0.000
Tree removal      :       26.353       26.353       26.353        0.000
V-1: Entering print_timestamp...

DNE1 configuration

[root@c01 ~]# lfs mkdir -i 0 /scratch1/mdt0
[root@c01 ~]# lfs mkdir -i 1 /scratch1/mdt1
[root@c01 ~]# salloc -N 32 --ntasks-per-node=24 mpirun -np 768 --allow-run-as-root /work/tools/bin/mdtest -n 1000 -u -vv -d /scratch1/mdt0@/scratch1/mdt1

SUMMARY: (of 1 iterations)
Operation                    Max          Min         Mean      Std Dev
---------                    ---          ---         ----      -------
Directory creation:   189546.945   189546.945   189546.945        0.000
Directory stat    :   688947.817   688947.817   688947.817        0.000
Directory removal :   255838.417   255838.417   255838.417        0.000
File creation     :   203077.460   203077.460   203077.460        0.000
File stat         :   692292.941   692292.941   692292.941        0.000
File read         :   350911.938   350911.938   350911.938        0.000
File removal      :   339358.198   339358.198   339358.198        0.000
Tree creation     :       38.326       38.326       38.326        0.000
Tree removal      :       43.928       43.928       43.928        0.000
V-1: Entering print_timestamp...

DNE2 configuration

[root@c01 ~]# lfs setdirstripe -c 2 /scratch1/stripedir
[root@c01 ~]# lfs setdirstripe -c 2 -D /scratch1/stripedir
[root@c01 ~]# salloc -N 32 --ntasks-per-node=24 mpirun -np 768 --allow-run-as-root /work/tools/bin/mdtest -n 1000 -u -vv -d /scratch1/stripedir

SUMMARY: (of 1 iterations)
Operation                    Max          Min         Mean      Std Dev
---------                    ---          ---         ----      -------
Directory creation:     6585.023     6585.023     6585.023        0.000 <-----
Directory stat    :   247222.630   247222.630   247222.630        0.000
Directory removal :     3554.469     3554.469     3554.469        0.000 <-----
File creation     :   250751.232   250751.232   250751.232        0.000
File stat         :   566094.009   566094.009   566094.009        0.000
File read         :   362051.872   362051.872   362051.872        0.000
File removal      :   298873.347   298873.347   298873.347        0.000
Tree creation     :       13.632       13.632       13.632        0.000
Tree removal      :        4.830        4.830        4.830        0.000
V-1: Entering print_timestamp... |
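To put a number on the regression, the directory creation and removal rates reported in the description can be turned into slowdown factors. The sketch below is only arithmetic on the figures from this ticket (mdtest rates, in operations per second); the names `RATES` and `slowdown` are illustrative and not part of mdtest or Lustre. On these numbers DNE2 comes out roughly 15x to 72x slower, depending on the operation and the baseline.

```python
# mdtest rates (operations per second) copied from the three runs in this ticket.
RATES = {
    "non-DNE": {"Directory creation": 97259.037, "Directory removal": 96064.306},
    "DNE1": {"Directory creation": 189546.945, "Directory removal": 255838.417},
    "DNE2": {"Directory creation": 6585.023, "Directory removal": 3554.469},
}


def slowdown(baseline: str, operation: str) -> float:
    """Return how many times slower DNE2 is than `baseline` for `operation`."""
    return RATES[baseline][operation] / RATES["DNE2"][operation]


if __name__ == "__main__":
    for baseline in ("non-DNE", "DNE1"):
        for operation in ("Directory creation", "Directory removal"):
            factor = slowdown(baseline, operation)
            print(f"{operation:<19} DNE2 vs {baseline:<7}: {factor:5.1f}x slower")
```

Only directory removal measured against DNE1 exceeds 50x; the other three comparisons fall between about 15x and 29x.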
| Comments |
| Comment by Peter Jones [ 14/Jan/19 ] |
|
Lai, could you please investigate?
Ihara, could you please share some more details about the configuration and the test script used?
Peter |
| Comment by Andreas Dilger [ 14/Jan/19 ] |
|
Ihara, setting lfs setdirstripe -c 2 -D /scratch1/stripedir causes every subdirectory to also be created with 2 stripes, which triggers distributed transactions and is definitely slower than creating a local 1-stripe directory. That is why we are working on the dynamic restriping, so that the directory can be created with 1 stripe at the start, and only move to DNE2 striping if it is needed. |
| Comment by Shuichi Ihara [ 15/Jan/19 ] |
|
OK, but going from roughly 100K to 6K directory creations and from 110K to 3K directory removals, is that level of performance drop expected? If this is expected, that's fine when striped directories are enabled by default, but I wonder whether we could get somewhat better, more reasonable performance for directory operations when striped directories are enabled. |
| Comment by Lai Siyao [ 15/Jan/19 ] |
|
It's hard to improve this in a short time, because creating a striped directory often forces earlier transactions (previous creations) to commit in order to make DNE recovery easier, which means it causes a sync on all MDTs. Without changing DNE recovery, we can't do much right now. |
| Comment by Andreas Dilger [ 19/Oct/20 ] |
|
Could this be fixed by |
| Comment by Shuichi Ihara [ 13/Dec/20 ] |
|
|
| Comment by Lai Siyao [ 14/Dec/20 ] |
|
Striped directory creation and removal start a distributed transaction. If the involved MDTs are not located on the same MDS, this may be optimized: if one MDT fails, distributed transactions can be recovered from the logs on the other MDTs, so the dependencies between distributed transactions can be removed. This means that a commit needs to be forced only when a distributed transaction depends on a local transaction, in which case that local transaction must be committed. But this assumption does not seem to hold on current deployments, where there is often more than one MDT on an MDS. |
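The rule described in this comment can be written down as a small decision function. This is a toy model paraphrased from the comment, not Lustre code: the `Dependency` type, the `must_force_commit` function and the `mdts_on_separate_mds` flag are all hypothetical names, and the model assumes that, without the optimization, any uncommitted dependency forces a commit.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Dependency:
    """An earlier transaction that a new distributed transaction depends on."""

    distributed: bool  # True: spans several MDTs; False: local to one MDT
    committed: bool    # True: already committed to disk


def must_force_commit(deps: Iterable[Dependency], mdts_on_separate_mds: bool) -> bool:
    """Decide whether a new distributed transaction must force a commit first.

    Hypothetical rule, assuming the optimization only applies when the
    involved MDTs sit on different MDSes.
    """
    for dep in deps:
        if dep.committed:
            continue  # nothing left to wait for
        if dep.distributed and mdts_on_separate_mds:
            # Assumed recoverable from the logs on the other MDTs,
            # so the dependency can be dropped.
            continue
        return True  # uncommitted local dependency, or optimization not applicable
    return False
```

For example, `must_force_commit([Dependency(distributed=True, committed=False)], mdts_on_separate_mds=True)` evaluates to `False` in this model, while the same call with `mdts_on_separate_mds=False` evaluates to `True`, which mirrors the caveat about several MDTs sharing one MDS.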
| Comment by Olaf Faaland [ 14/Dec/20 ] |
Hi Lai, |
| Comment by Lai Siyao [ 15/Dec/20 ] |
|
Okay, I'll implement it and add a tunable option for this. |
| Comment by Raphael Druon [ 06/Sep/21 ] |
|
Do we have an update on this?