[LU-14761] DNE2 Metadata degradation Created: 14/Jun/21  Updated: 23/Mar/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Shuichi Ihara Assignee: Lai Siyao
Resolution: Unresolved Votes: 0
Labels: None
Environment:

master (tag 2.14.52)


Attachments: PNG File DoM-DNE1.png     PNG File DoM-DNE2.png    
Issue Links:
Related
is related to LU-15528 downgrade remote PW/EX lock taken in ... Open
is related to LU-6864 DNE3: Support multiple modify RPCs in... Resolved
is related to LU-15526 PDO lock for object on remote MDT Resolved
Severity: 2
Rank (Obsolete): 9223372036854775807

 Description   

Here is a performance comparison of DNE1 and DNE2 across 8 MDTs, using mdtest with a 300-second stonewall (-W 300) for file creation.

8 x MDT with DNE1 + DoM

# mkdir /exa5/mdtest-easy-dne1-dom
# for i in `seq 0 7`; do
lfs mkdir -i $i /exa5/mdtest-easy-dne1-dom/$i
lfs setstripe -E 1m -L mdt /exa5/mdtest-easy-dne1-dom/$i
done
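The -d argument in the DNE1 mdtest commands is the eight per-MDT directories created by this loop, joined with '@'. That is mdtest's separator for multiple test directories. The following is a minimal Python sketch for generating that argument instead of typing it out. The helper name mdtest_dirs is hypothetical and not part of mdtest or Lustre.

```python
def mdtest_dirs(base: str, count: int) -> str:
    """Join base/0 .. base/(count-1) with '@', the form mdtest -d accepts.

    Hypothetical helper: it only builds the string, it does not create directories.
    """
    return "@".join(f"{base}/{i}" for i in range(count))


if __name__ == "__main__":
    # Same eight directories as the lfs mkdir loop above (one per MDT index 0-7).
    print(mdtest_dirs("/exa5/mdtest-easy-dne1-dom", 8))
```

The printed string can be pasted after -d in the creation and removal commands.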

File Creation

# salloc -p 40n --nodelist=ec[01-04,08-10,15,18-19] --nodes=10 --ntasks-per-node=32 mpirun --bind-to core:overload-allowed --allow-run-as-root ./bin/mdtest -n 1000000 -u -L -F -P -G 1664431930 -d /exa5/mdtest-easy-dne1-dom/0@/exa5/mdtest-easy-dne1-dom/1@/exa5/mdtest-easy-dne1-dom/2@/exa5/mdtest-easy-dne1-dom/3@/exa5/mdtest-easy-dne1-dom/4@/exa5/mdtest-easy-dne1-dom/5@/exa5/mdtest-easy-dne1-dom/6@/exa5/mdtest-easy-dne1-dom/7 -x /exa5/mdtest-easy.stonewall -C -Y -W 300 -a POSIX

SUMMARY rate: (of 1 iterations)                                                                                                                 
   Operation                     Max            Min           Mean        Std Dev                                                               
   ---------                     ---            ---           ----        -------                                                               
   File creation              438695.359     438695.359     438695.359          0.000                                                           
   File stat                       0.000          0.000          0.000          0.000                                                           
   File read                       0.000          0.000          0.000          0.000                                                           
   File removal                    0.000          0.000          0.000          0.000                                                           
   Tree creation                  61.855         61.855         61.855          0.000                                                           
   Tree removal                    0.000          0.000          0.000          0.000                                                           
                                                                                                                                                
SUMMARY time: (of 1 iterations)                                                                                                                 
   Operation                     Max            Min           Mean        Std Dev                                                               
   ---------                     ---            ---           ----        -------                                                               
   File creation                 330.342        330.342        330.342          0.000                                                           
   File stat                       0.000          0.000          0.000          0.000                                                           
   File read                       0.000          0.000          0.000          0.000                                                           
   File removal                    0.000          0.000          0.000          0.000                                                           
   Tree creation                   0.016          0.016          0.016          0.000                                                           
   Tree removal                    0.000          0.000          0.000          0.000                                                           

File Removal

# salloc -p 40n --nodelist=ec[01-04,08-10,15,18-19] --nodes=10 --ntasks-per-node=32 mpirun --bind-to core:overload-allowed --allow-run-as-root ./bin/mdtest -n 1000000 -u -L -F -P -G 1664431930 -d /exa5/mdtest-easy-dne1-dom/0@/exa5/mdtest-easy-dne1-dom/1@/exa5/mdtest-easy-dne1-dom/2@/exa5/mdtest-easy-dne1-dom/3@/exa5/mdtest-easy-dne1-dom/4@/exa5/mdtest-easy-dne1-dom/5@/exa5/mdtest-easy-dne1-dom/6@/exa5/mdtest-easy-dne1-dom/7 -x /exa5/mdtest-easy.stonewall -r -Y -a POSIX
salloc: Granted job allocation 8842                                                                                                             

SUMMARY rate: (of 1 iterations)
   Operation                     Max            Min           Mean        Std Dev
   ---------                     ---            ---           ----        -------
   File creation                   0.000          0.000          0.000          0.000
   File stat                       0.000          0.000          0.000          0.000
   File read                       0.000          0.000          0.000          0.000
   File removal               403932.020     403932.020     403932.020          0.000
   Tree creation                   0.000          0.000          0.000          0.000
   Tree removal                    0.376          0.376          0.376          0.000

SUMMARY time: (of 1 iterations)
   Operation                     Max            Min           Mean        Std Dev
   ---------                     ---            ---           ----        -------
   File creation                   0.000          0.000          0.000          0.000
   File stat                       0.000          0.000          0.000          0.000
   File read                       0.000          0.000          0.000          0.000
   File removal                  358.772        358.772        358.772          0.000
   Tree creation                   0.000          0.000          0.000          0.000
   Tree removal                    2.658          2.658          2.658          0.000

DNE2 + DoM

# lfs setdirstripe -c 8 /exa5/mdtest-easy-dne2-dom
# lfs setdirstripe -c 8 -D /exa5/mdtest-easy-dne2-dom
# lfs setstripe -E 1m -L mdt /exa5/mdtest-easy-dne2-dom

File Creation

# salloc -p 40n --nodelist=ec[01-04,08-10,15,18-19] --nodes=10 --ntasks-per-node=32 mpirun --bind-to core:overload-allowed --allow-run-as-root ./bin/mdtest -n 1000000 -u -L -F -P -G 1664431930 -d /exa5/mdtest-easy-dne2-dom/ -x /exa5/mdtest-easy.stonewall -C -Y -W 300 -a POSIX

SUMMARY rate: (of 1 iterations)                                                                                                                 
   Operation                     Max            Min           Mean        Std Dev                                                               
   ---------                     ---            ---           ----        -------                                                               
   File creation              231527.436     231527.436     231527.436          0.000                                                           
   File stat                       0.000          0.000          0.000          0.000                                                           
   File read                       0.000          0.000          0.000          0.000                                                           
   File removal                    0.000          0.000          0.000          0.000                                                           
   Tree creation                   6.607          6.607          6.607          0.000                                                           
   Tree removal                    0.000          0.000          0.000          0.000                                                           
                                                                                                                                                
SUMMARY time: (of 1 iterations)                                                                                                                 
   Operation                     Max            Min           Mean        Std Dev                                                               
   ---------                     ---            ---           ----        -------                                                               
   File creation                 887.163        887.163        887.163          0.000                                                           
   File stat                       0.000          0.000          0.000          0.000                                                           
   File read                       0.000          0.000          0.000          0.000                                                           
   File removal                    0.000          0.000          0.000          0.000                                                           
   Tree creation                   0.151          0.151          0.151          0.000                                                           
   Tree removal                    0.000          0.000          0.000          0.000                                                           

File Removal

# salloc -p 40n --nodelist=ec[01-04,08-10,15,18-19] --nodes=10 --ntasks-per-node=32 mpirun --bind-to core:overload-allowed --allow-run-as-root ./bin/mdtest -n 1000000 -u -L -F -P -G 1664431930 -d /exa5/mdtest-easy-dne2-dom/ -x /exa5/mdtest-easy.stonewall -r -Y -a POSIX

SUMMARY rate: (of 1 iterations)
   Operation                     Max            Min           Mean        Std Dev
   ---------                     ---            ---           ----        -------
   File creation                   0.000          0.000          0.000          0.000
   File stat                       0.000          0.000          0.000          0.000
   File read                       0.000          0.000          0.000          0.000
   File removal               222840.053     222840.053     222840.053          0.000
   Tree creation                   0.000          0.000          0.000          0.000
   Tree removal                    1.159          1.159          1.159          0.000

SUMMARY time: (of 1 iterations)
   Operation                     Max            Min           Mean        Std Dev
   ---------                     ---            ---           ----        -------
   File creation                   0.000          0.000          0.000          0.000
   File stat                       0.000          0.000          0.000          0.000
   File read                       0.000          0.000          0.000          0.000
   File removal                  921.749        921.749        921.749          0.000
   Tree creation                   0.000          0.000          0.000          0.000
   Tree removal                    0.863          0.863          0.863          0.000

See the attached images. It is not clear whether the file creation and file removal degradations have the same cause; they may be different problems.
For file creation, DNE2 peak performance is close to DNE1 during the first 30 seconds, but drops significantly after that, leaving the overall creation rate about 47% lower (231,527 vs. 438,695 ops/sec).
File removal, however, is slower in DNE2 than in DNE1 for the whole run (222,840 vs. 403,932 ops/sec, about 45% lower).
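The phase times in the two runs are not directly comparable. Multiplying each SUMMARY rate by its phase time gives a rough file count per phase. This sketch assumes mdtest's reported rate is total items divided by phase time, which the matching create/remove totals appear to confirm. The dictionary and function names are illustrative only.

```python
# Rates (ops/sec) and phase times (sec) copied from the mdtest SUMMARY tables above.
RESULTS = {
    ("DNE1", "create"): (438695.359, 330.342),
    ("DNE1", "remove"): (403932.020, 358.772),
    ("DNE2", "create"): (231527.436, 887.163),
    ("DNE2", "remove"): (222840.053, 921.749),
}


def total_items(layout: str, phase: str) -> float:
    """Estimate files processed in one phase as rate x elapsed time (assumption)."""
    rate, seconds = RESULTS[(layout, phase)]
    return rate * seconds


def rate_ratio(phase: str) -> float:
    """DNE2 rate as a fraction of the DNE1 rate for the same phase."""
    return RESULTS[("DNE2", phase)][0] / RESULTS[("DNE1", phase)][0]


if __name__ == "__main__":
    for layout in ("DNE1", "DNE2"):
        created = total_items(layout, "create")
        removed = total_items(layout, "remove")
        print(f"{layout}: created ~{created:.4g}, removed ~{removed:.4g} files")
    for phase in ("create", "remove"):
        print(f"{phase}: DNE2 rate is {rate_ratio(phase):.1%} of DNE1")
```

The script reports about 1.449e+08 files per phase for DNE1 and about 2.054e+08 for DNE2. This means each removal phase processed the same files its creation phase produced, and the DNE2 runs handled roughly 1.42x as many files. The longer DNE2 phase times therefore reflect both the lower rate and the larger file count. The rates themselves (52.8% and 55.2% of DNE1) are the more direct comparison.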



 Comments   
Comment by Andreas Dilger [ 23/Mar/22 ]

Lai is working on improving DNE performance in a patch series ending at:

https://review.whamcloud.com/46734 "LU-15528 mdt: enqueue newly created object locks in TXN mode"
