Lustre / LU-14761

DNE2 Metadata degradation


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: Lustre 2.15.0
    • Environment: master (tag 2.14.52)
    • Severity: 2

    Description

      Here is a performance comparison of DNE1 and DNE2 on 8 MDTs, running mdtest for 300 seconds.

      8 x MDT with DNE1 + DoM

      # mkdir /exa5/mdtest-easy-dne1-dom
      # for i in `seq 0 7`; do
      lfs mkdir -i $i /exa5/mdtest-easy-dne1-dom/$i
      lfs setstripe -E 1m -L mdt /exa5/mdtest-easy-dne1-dom/$i
      done
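
The `-d` option in the DNE1 mdtest commands lists the eight per-MDT directories created above, joined with `@` (mdtest accepts several test directories in this form). Rather than typing that list by hand, it can be generated. The sketch below is a hypothetical Python helper; `mdtest_dir_arg` is an illustrative name, not part of mdtest or Lustre, and it only reproduces the directory string used in this test.

```python
# Hypothetical helper (not part of mdtest or Lustre): builds the '@'-separated
# directory list passed to mdtest's -d option, one subdirectory per MDT index.
def mdtest_dir_arg(base: str, mdt_count: int) -> str:
    """Return '<base>/0@<base>/1@...@<base>/<mdt_count - 1>'."""
    root = base.rstrip("/")  # avoid a double slash if base ends with '/'
    return "@".join(f"{root}/{i}" for i in range(mdt_count))


if __name__ == "__main__":
    # Same directory list as the DNE1 + DoM runs (MDT indexes 0..7).
    print(mdtest_dir_arg("/exa5/mdtest-easy-dne1-dom", 8))
```

The printed string can be substituted into the `-d` argument, which keeps the command in step with the `lfs mkdir -i $i` loop if the MDT count changes.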
      

      File Creation

      # salloc -p 40n --nodelist=ec[01-04,08-10,15,18-19] --nodes=10 --ntasks-per-node=32 \
          mpirun --bind-to core:overload-allowed --allow-run-as-root \
          ./bin/mdtest -n 1000000 -u -L -F -P -G 1664431930 \
          -d /exa5/mdtest-easy-dne1-dom/0@/exa5/mdtest-easy-dne1-dom/1@/exa5/mdtest-easy-dne1-dom/2@/exa5/mdtest-easy-dne1-dom/3@/exa5/mdtest-easy-dne1-dom/4@/exa5/mdtest-easy-dne1-dom/5@/exa5/mdtest-easy-dne1-dom/6@/exa5/mdtest-easy-dne1-dom/7 \
          -x /exa5/mdtest-easy.stonewall -C -Y -W 300 -a POSIX
      
                                                                                        
      SUMMARY rate: (of 1 iterations)                                                                                                                 
         Operation                     Max            Min           Mean        Std Dev                                                               
         ---------                     ---            ---           ----        -------                                                               
         File creation              438695.359     438695.359     438695.359          0.000                                                           
         File stat                       0.000          0.000          0.000          0.000                                                           
         File read                       0.000          0.000          0.000          0.000                                                           
         File removal                    0.000          0.000          0.000          0.000                                                           
         Tree creation                  61.855         61.855         61.855          0.000                                                           
         Tree removal                    0.000          0.000          0.000          0.000                                                           
                                                                                                                                                      
      SUMMARY time: (of 1 iterations)                                                                                                                 
         Operation                     Max            Min           Mean        Std Dev                                                               
         ---------                     ---            ---           ----        -------                                                               
         File creation                 330.342        330.342        330.342          0.000                                                           
         File stat                       0.000          0.000          0.000          0.000                                                           
         File read                       0.000          0.000          0.000          0.000                                                           
         File removal                    0.000          0.000          0.000          0.000                                                           
         Tree creation                   0.016          0.016          0.016          0.000                                                           
         Tree removal                    0.000          0.000          0.000          0.000                                                           
      

      File Removal

      # salloc -p 40n --nodelist=ec[01-04,08-10,15,18-19] --nodes=10 --ntasks-per-node=32 \
          mpirun --bind-to core:overload-allowed --allow-run-as-root \
          ./bin/mdtest -n 1000000 -u -L -F -P -G 1664431930 \
          -d /exa5/mdtest-easy-dne1-dom/0@/exa5/mdtest-easy-dne1-dom/1@/exa5/mdtest-easy-dne1-dom/2@/exa5/mdtest-easy-dne1-dom/3@/exa5/mdtest-easy-dne1-dom/4@/exa5/mdtest-easy-dne1-dom/5@/exa5/mdtest-easy-dne1-dom/6@/exa5/mdtest-easy-dne1-dom/7 \
          -x /exa5/mdtest-easy.stonewall -r -Y -a POSIX
      salloc: Granted job allocation 8842                                                                                                             
      
      SUMMARY rate: (of 1 iterations)
         Operation                     Max            Min           Mean        Std Dev
         ---------                     ---            ---           ----        -------
         File creation                   0.000          0.000          0.000          0.000
         File stat                       0.000          0.000          0.000          0.000
         File read                       0.000          0.000          0.000          0.000
         File removal               403932.020     403932.020     403932.020          0.000
         Tree creation                   0.000          0.000          0.000          0.000
         Tree removal                    0.376          0.376          0.376          0.000
      
      SUMMARY time: (of 1 iterations)
         Operation                     Max            Min           Mean        Std Dev
         ---------                     ---            ---           ----        -------
         File creation                   0.000          0.000          0.000          0.000
         File stat                       0.000          0.000          0.000          0.000
         File read                       0.000          0.000          0.000          0.000
         File removal                  358.772        358.772        358.772          0.000
         Tree creation                   0.000          0.000          0.000          0.000
         Tree removal                    2.658          2.658          2.658          0.000
      

      DNE2 + DoM

      # lfs setdirstripe -c 8 /exa5/mdtest-easy-dne2-dom
      # lfs setdirstripe -c 8 -D /exa5/mdtest-easy-dne2-dom
      # lfs setstripe -E 1m -L mdt /exa5/mdtest-easy-dne2-dom
      

      File Creation

      # salloc -p 40n --nodelist=ec[01-04,08-10,15,18-19] --nodes=10 --ntasks-per-node=32 \
          mpirun --bind-to core:overload-allowed --allow-run-as-root \
          ./bin/mdtest -n 1000000 -u -L -F -P -G 1664431930 \
          -d /exa5/mdtest-easy-dne2-dom/ -x /exa5/mdtest-easy.stonewall -C -Y -W 300 -a POSIX
                                                                                                                                                                                       
      SUMMARY rate: (of 1 iterations)                                                                                                                 
         Operation                     Max            Min           Mean        Std Dev                                                               
         ---------                     ---            ---           ----        -------                                                               
         File creation              231527.436     231527.436     231527.436          0.000                                                           
         File stat                       0.000          0.000          0.000          0.000                                                           
         File read                       0.000          0.000          0.000          0.000                                                           
         File removal                    0.000          0.000          0.000          0.000                                                           
         Tree creation                   6.607          6.607          6.607          0.000                                                           
         Tree removal                    0.000          0.000          0.000          0.000                                                           
                                                                                                                                                      
      SUMMARY time: (of 1 iterations)                                                                                                                 
         Operation                     Max            Min           Mean        Std Dev                                                               
         ---------                     ---            ---           ----        -------                                                               
         File creation                 887.163        887.163        887.163          0.000                                                           
         File stat                       0.000          0.000          0.000          0.000                                                           
         File read                       0.000          0.000          0.000          0.000                                                           
         File removal                    0.000          0.000          0.000          0.000                                                           
         Tree creation                   0.151          0.151          0.151          0.000                                                           
         Tree removal                    0.000          0.000          0.000          0.000                                                           
      

      File Removal

      # salloc -p 40n --nodelist=ec[01-04,08-10,15,18-19] --nodes=10 --ntasks-per-node=32 \
          mpirun --bind-to core:overload-allowed --allow-run-as-root \
          ./bin/mdtest -n 1000000 -u -L -F -P -G 1664431930 \
          -d /exa5/mdtest-easy-dne2-dom/ -x /exa5/mdtest-easy.stonewall -r -Y -a POSIX
      
      SUMMARY rate: (of 1 iterations)
         Operation                     Max            Min           Mean        Std Dev
         ---------                     ---            ---           ----        -------
         File creation                   0.000          0.000          0.000          0.000
         File stat                       0.000          0.000          0.000          0.000
         File read                       0.000          0.000          0.000          0.000
         File removal               222840.053     222840.053     222840.053          0.000
         Tree creation                   0.000          0.000          0.000          0.000
         Tree removal                    1.159          1.159          1.159          0.000
      
      SUMMARY time: (of 1 iterations)
         Operation                     Max            Min           Mean        Std Dev
         ---------                     ---            ---           ----        -------
         File creation                   0.000          0.000          0.000          0.000
         File stat                       0.000          0.000          0.000          0.000
         File read                       0.000          0.000          0.000          0.000
         File removal                  921.749        921.749        921.749          0.000
         Tree creation                   0.000          0.000          0.000          0.000
         Tree removal                    0.863          0.863          0.863          0.000
      

      See the attached graphs. I am not sure whether the degradation in file creation and in file removal is the same problem; they may be different problems.
      In file creation, DNE2 peak performance is close to DNE1 during the first 30 seconds, but it drops significantly after 30 seconds.
      File removal in DNE2, however, is slower than DNE1 throughout the run. Overall, DNE2 reached 231,527 creates/sec and 222,840 removals/sec, versus 438,695 and 403,932 for DNE1.
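
To put the two runs side by side, the SUMMARY numbers reported above can be reduced to a few ratios. The short Python sketch below does that arithmetic. It assumes that mdtest's reported rate equals the number of items processed divided by the phase time, so rate multiplied by time approximates how many files each phase handled. The dictionary and function names are illustrative only.

```python
# Arithmetic on the mdtest SUMMARY numbers in this report (values copied as-is).
# Assumption: mdtest's rate = items processed / phase time, so rate * time
# approximates the number of files handled in that phase.

RESULTS = {
    # (operation, layout): (rate in ops/sec, phase time in sec)
    ("create", "DNE1"): (438695.359, 330.342),
    ("create", "DNE2"): (231527.436, 887.163),
    ("remove", "DNE1"): (403932.020, 358.772),
    ("remove", "DNE2"): (222840.053, 921.749),
}


def dne2_to_dne1_ratio(op: str) -> float:
    """DNE2 rate expressed as a fraction of the DNE1 rate for one operation."""
    return RESULTS[(op, "DNE2")][0] / RESULTS[(op, "DNE1")][0]


def approx_items(op: str, layout: str) -> float:
    """Approximate number of files processed in a phase: rate * elapsed time."""
    rate, seconds = RESULTS[(op, layout)]
    return rate * seconds


if __name__ == "__main__":
    for op in ("create", "remove"):
        print(f"{op}: DNE2/DNE1 rate = {dne2_to_dne1_ratio(op):.3f}")
        for layout in ("DNE1", "DNE2"):
            print(f"  {layout}: ~{approx_items(op, layout) / 1e6:.1f}M files")
```

Run as a script, it shows DNE2 reaching roughly 53% of the DNE1 file creation rate and roughly 55% of the DNE1 file removal rate. The rate-times-time products also agree between the create and remove phases of each layout (about 144.9 million files for DNE1, about 205.4 million for DNE2), which suggests the DNE2 runs handled more files as well as running at a lower rate; the elapsed times are therefore not directly comparable, and the rates are the fairer basis for comparison.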

      Attachments

        1. DoM-DNE1.png (250 kB)
        2. DoM-DNE2.png (345 kB)


      People

        Assignee: Lai Siyao (laisiyao)
        Reporter: Shuichi Ihara (sihara)
        Votes: 0
        Watchers: 7
