[LU-11854] Dir operations on DNE2 are slower than on DNE1 or non-DNE Created: 14/Jan/19  Updated: 31/Jan/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Shuichi Ihara Assignee: Lai Siyao
Resolution: Unresolved Votes: 0
Labels: llnl
Environment:

2.12.0


Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Directory operations (creation and removal) in a striped directory with DNE2 are significantly slower than in a non-DNE configuration or with DNE1: roughly 15x to 29x slower for creation and 27x to 72x slower for removal. Here are the test results.

Non DNE configuration

 [root@c01 ~]# mkdir /scratch1/nodne 
 [root@c01 ~]# salloc -N 32 --ntasks-per-node=24 mpirun -np 768 --allow-run-as-root /work/tools/bin/mdtest -n 1000 -u -vv -d /scratch1/nodne 
SUMMARY: (of 1 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      97259.037      97259.037      97259.037          0.000
   Directory stat    :     347749.775     347749.775     347749.775          0.000
   Directory removal :      96064.306      96064.306      96064.306          0.000
   File creation     :     111509.920     111509.920     111509.920          0.000
   File stat         :     317489.856     317489.856     317489.856          0.000
   File read         :     183132.719     183132.719     183132.719          0.000
   File removal      :     166205.620     166205.620     166205.620          0.000
   Tree creation     :         29.571         29.571         29.571          0.000
   Tree removal      :         26.353         26.353         26.353          0.000
V-1: Entering print_timestamp...

DNE1 configuration

 [root@c01 ~]# lfs mkdir -i 0 /scratch1/mdt0
 [root@c01 ~]# lfs mkdir -i 1 /scratch1/mdt1
 [root@c01 ~]# salloc -N 32 --ntasks-per-node=24 mpirun -np 768 --allow-run-as-root /work/tools/bin/mdtest -n 1000 -u -vv -d /scratch1/mdt0@/scratch1/mdt1
 SUMMARY: (of 1 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:     189546.945     189546.945     189546.945          0.000
   Directory stat    :     688947.817     688947.817     688947.817          0.000
   Directory removal :     255838.417     255838.417     255838.417          0.000
   File creation     :     203077.460     203077.460     203077.460          0.000
   File stat         :     692292.941     692292.941     692292.941          0.000
   File read         :     350911.938     350911.938     350911.938          0.000
   File removal      :     339358.198     339358.198     339358.198          0.000
   Tree creation     :         38.326         38.326         38.326          0.000
   Tree removal      :         43.928         43.928         43.928          0.000
V-1: Entering print_timestamp...

DNE2 configuration

 [root@c01 ~]# lfs setdirstripe -c 2 /scratch1/stripedir
 [root@c01 ~]# lfs setdirstripe -c 2 -D /scratch1/stripedir
 [root@c01 ~]# salloc -N 32 --ntasks-per-node=24 mpirun -np 768 --allow-run-as-root /work/tools/bin/mdtest -n 1000 -u -vv -d /scratch1/stripedir
 SUMMARY: (of 1 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:       6585.023       6585.023       6585.023          0.000  <----- 
   Directory stat    :     247222.630     247222.630     247222.630          0.000
   Directory removal :       3554.469       3554.469       3554.469          0.000  <-----
   File creation     :     250751.232     250751.232     250751.232          0.000
   File stat         :     566094.009     566094.009     566094.009          0.000
   File read         :     362051.872     362051.872     362051.872          0.000
   File removal      :     298873.347     298873.347     298873.347          0.000
   Tree creation     :         13.632         13.632         13.632          0.000
   Tree removal      :          4.830          4.830          4.830          0.000
V-1: Entering print_timestamp...
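The slowdown factors can be checked directly against the mean rates in the three SUMMARY tables. Below is a small Python sketch; the numbers are copied from the tables above, and the `results` dictionary and `slowdown` helper are illustrative names, not part of mdtest.

```python
# Mean ops/s copied from the three mdtest SUMMARY tables above.
results = {
    "Non DNE": {"Directory creation": 97259.037, "Directory removal": 96064.306},
    "DNE1": {"Directory creation": 189546.945, "Directory removal": 255838.417},
    "DNE2": {"Directory creation": 6585.023, "Directory removal": 3554.469},
}


def slowdown(baseline: str, op: str, target: str = "DNE2") -> float:
    """How many times slower `target` is than `baseline` for one operation."""
    return results[baseline][op] / results[target][op]


if __name__ == "__main__":
    for baseline in ("Non DNE", "DNE1"):
        for op in ("Directory creation", "Directory removal"):
            print(f"{op:<20} DNE2 vs {baseline:<8}: {slowdown(baseline, op):5.1f}x slower")
```

By this calculation DNE2 is about 14.8x (vs. non-DNE) to 28.8x (vs. DNE1) slower for directory creation and about 27.0x to 72.0x slower for directory removal, while the file operation rates stay in the same range as DNE1.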


 Comments   
Comment by Peter Jones [ 14/Jan/19 ]

Lai

Could you please investigate?

Ihara

Could you please share some more details about the configuration and the test script used?

Peter

Comment by Andreas Dilger [ 14/Jan/19 ]

Ihara, setting lfs setdirstripe -c 2 -D /scratch1/stripedir causes every subdirectory to also be created with 2 stripes, which triggers distributed transactions and is definitely slower than creating a local 1-stripe directory. That is why we are working on the dynamic restriping, so that the directory can be created with 1 stripe at the start, and only move to DNE2 striping if it is needed.
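The effect of the default layout described in the comment above can be illustrated with a toy Python model that counts how many directory creations in a small tree would span more than one MDT. It is a sketch under stated assumptions, not Lustre code: `Directory`, `mkdir` and `distributed_mkdirs` are hypothetical names, the model treats any multi-stripe mkdir as a distributed transaction and any 1-stripe mkdir as local, and it assumes subdirectories also inherit the default layout.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Directory:
    """Toy directory model (hypothetical, not Lustre code)."""
    stripe_count: int                            # number of MDTs holding shards
    default_stripe_count: Optional[int] = None   # as set by `lfs setdirstripe -D`

    def mkdir(self) -> Directory:
        # With a default layout, the new subdirectory gets that stripe count and
        # (assumption) inherits the default as well; otherwise it has 1 stripe.
        count = self.default_stripe_count or 1
        return Directory(count, self.default_stripe_count)


def distributed_mkdirs(top: Directory, fanout: int, depth: int) -> int:
    """Build a tree `depth` levels deep with `fanout` subdirectories per
    directory under `top`; count creations that span more than one MDT."""
    total, level = 0, [top]
    for _level in range(depth):
        children = [parent.mkdir() for parent in level for _i in range(fanout)]
        total += sum(1 for child in children if child.stripe_count > 1)
        level = children
    return total


if __name__ == "__main__":
    with_default = Directory(2, default_stripe_count=2)  # `-c 2` plus `-c 2 -D`
    without_default = Directory(2)                       # `-c 2` only
    print(distributed_mkdirs(with_default, fanout=10, depth=2))
    print(distributed_mkdirs(without_default, fanout=10, depth=2))
```

In this model, a tree with fanout 10 and depth 2 yields 110 distributed creations when the top directory has the `-D` default, and none without it, because only the top directory itself is striped.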

Comment by Shuichi Ihara [ 15/Jan/19 ]

OK, but is a drop from ~100K to ~6.6K directory creations/s and from ~96K to ~3.6K directory removals/s expected? If so, that's fine; but if striped directories are enabled by default, I wonder whether we could get more reasonable performance for directory operations in them.

Comment by Lai Siyao [ 15/Jan/19 ]

It's hard to improve this in a short time, because creating a striped directory will often trigger a commit of older transactions (previous creations) to make DNE recovery easier, which means it forces a sync on all MDTs. Without changing DNE recovery, we can't do much right now.
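To make the mechanism in the comment above concrete, here is a deliberately simplified Python model. Everything in it is an assumption for illustration: `ToyMdtCluster`, `run` and `syncs_for` are hypothetical names, and it assumes every distributed transaction forces older uncommitted work to commit, whereas the comment says this happens "often".

```python
from __future__ import annotations

from collections import defaultdict


class ToyMdtCluster:
    """Hypothetical toy model, not Lustre code: a distributed transaction first
    forces older uncommitted transactions on every MDT it touches to commit
    (one sync per MDT), while a local transaction simply joins the running,
    not-yet-committed batch on its MDT."""

    def __init__(self) -> None:
        self.uncommitted = defaultdict(int)  # MDT index -> pending transactions
        self.syncs = 0                       # forced commits (syncs) so far

    def run(self, mdts: frozenset[int]) -> None:
        if len(mdts) > 1:  # distributed, e.g. creating a 2-stripe directory
            for mdt in sorted(mdts):
                if self.uncommitted[mdt]:
                    self.syncs += 1  # commit older work before proceeding
                    self.uncommitted[mdt] = 0
        for mdt in mdts:
            self.uncommitted[mdt] += 1


def syncs_for(n_ops: int, mdts: frozenset[int]) -> int:
    """Run `n_ops` identical transactions on `mdts` and return the sync count."""
    cluster = ToyMdtCluster()
    for _ in range(n_ops):
        cluster.run(mdts)
    return cluster.syncs


if __name__ == "__main__":
    print("local mkdirs:   ", syncs_for(1000, frozenset({0})))
    print("2-stripe mkdirs:", syncs_for(1000, frozenset({0, 1})))
```

In this toy model, 1000 local mkdirs cause no forced syncs, while 1000 two-stripe mkdirs cause 1998 (two MDT syncs for every creation after the first), which is the kind of serialization the comment describes.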

Comment by Andreas Dilger [ 19/Oct/20 ]

Could this be fixed by LU-11999?

Comment by Shuichi Ihara [ 13/Dec/20 ]

LU-11999 still doesn't solve the problem. Directory creation and removal in a striped directory are still much slower than with remote directories.

Comment by Lai Siyao [ 14/Dec/20 ]

Striped directory creation and removal start a distributed transaction. If the involved MDTs are not located on the same MDS, this may be optimized: if one MDT fails, distributed transactions can be recovered from the logs on the other MDTs, so the dependencies between distributed transactions can be removed. This means a distributed transaction needs to commit a local transaction only if it depends on that local transaction. But this does not seem to hold for current deployments, where there is often more than one MDT on an MDS.
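The proposed optimization is essentially a decision rule. Below is a minimal Python sketch of one reading of it; `Txn`, `must_commit_dependency` and the MDT-to-MDS map are hypothetical names, and treating the union of both transactions' MDTs as "the involved MDTs" is an assumption, as is the rationale that one MDS failure takes down all of its MDTs at once.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """An uncommitted transaction and the MDT indices it updates (hypothetical)."""
    mdts: frozenset[int]

    @property
    def distributed(self) -> bool:
        return len(self.mdts) > 1


def must_commit_dependency(new: Txn, dep: Txn, mds_of: dict[int, int]) -> bool:
    """Sketch of the proposed rule, not Lustre code: before running the
    distributed transaction `new`, must the earlier transaction `dep` that it
    depends on be committed first?"""
    if not new.distributed:
        raise ValueError("this rule only covers distributed transactions")
    if not dep.distributed:
        # Local transaction: no update log on another MDT to recover it from.
        return True
    involved = new.mdts | dep.mdts
    servers = [mds_of[mdt] for mdt in involved]
    # Assumption: the dependency can be dropped only if every involved MDT sits
    # on a different MDS, so one server failure cannot take out several MDTs.
    return len(set(servers)) < len(servers)


if __name__ == "__main__":
    one_per_mds = {0: 0, 1: 1}        # MDT index -> MDS index
    two_per_mds = {0: 0, 1: 0}        # both MDTs on the same MDS
    striped = Txn(frozenset({0, 1}))  # e.g. a 2-stripe mkdir
    local = Txn(frozenset({0}))
    print(must_commit_dependency(striped, striped, one_per_mds))
    print(must_commit_dependency(striped, striped, two_per_mds))
    print(must_commit_dependency(striped, local, one_per_mds))
```

Under this reading, a file system with one MDT per MDS would never have to commit one distributed transaction just because another depends on it, while co-located MDTs would keep the current behavior.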

Comment by Olaf Faaland [ 14/Dec/20 ]

if the involved MDTs are not located on the same MDS, this may be optimized ... <redacted> ... But this seems not true on current deployment, there are often more than one MDTs on an MDS.

Hi Lai,
I'm not sure whose deployment you're referring to, but at least some sites would benefit from that optimization. Our file systems have 16 MDTs, each on its own MDS. If that optimization were implemented and worked well, we would take it into account when designing new file systems.

Comment by Lai Siyao [ 15/Dec/20 ]

Okay, I'll implement it and add a tunable option for this.

Comment by Raphael Druon [ 06/Sep/21 ]

Is there an update on this?

Generated at Sat Feb 10 02:47:31 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.