
Massive directory metadata operation performance decrease

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.13.0, Lustre 2.14.0
    • Labels: None
    • Environment: RHEL7 running the latest master.
    • Severity: 3

    Description

      While comparing Lustre 2.12 LTS against the latest master, I saw a noticeable drop in mdtest performance. Using git bisect, I traced the regression to https://review.whamcloud.com/#/c/35825. The results before and after the patch landed are as follows:

      mdtest-3.4.0+dev was launched with 54 total task(s) on 9 node(s)
      Command line used: /lustre/crius/stf008/scratch/jsimmons/x86_64/mdtest '-n' '1000' '-p' '10' '-e' '4096' '-w' '4096' '-i' '5' '-z' '2' '-d' '/lustre/crius/stf008/scratch/jsimmons/test_mdtest'
      Path: /lustre/crius/stf008/scratch/jsimmons
      FS: 806.0 TiB   Used FS: 0.0%   Inodes: 4298.4 Mi   Used Inodes: 0.0%

      Nodemap: 111111000000000000000000000000000000000000000000000000
      54 tasks, 53946 files/directories

      SUMMARY rate: (of 5 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         Directory creation        :      10929.296      10229.518      10551.707        269.772
         Directory stat            :      45397.727      44566.564      45101.666        285.915
         Directory removal         :      14509.663      13822.493      14198.406        282.821
         File creation             :       6180.597       6097.217       6142.435         30.776
         File stat                 :      43473.036      31895.809      37446.331       4316.809
         File read                 :      18142.575      16228.362      17383.867        750.963
         File removal              :       7412.350       7061.313       7227.328        118.574
         Tree creation             :       3478.676       2899.108       3328.345        219.993
         Tree removal              :        764.549        583.999        672.962         59.213

      -- finished at 11/20/2020 10:55:32 --

      And after landing the patch:

      mdtest-3.4.0+dev was launched with 54 total task(s) on 9 node(s)
      Command line used: /lustre/crius/stf008/scratch/jsimmons/x86_64/mdtest '-n' '1000' '-p' '10' '-e' '4096' '-w' '4096' '-i' '5' '-z' '2' '-d' '/lustre/crius/stf008/scratch/jsimmons/test_mdtest'
      Path: /lustre/crius/stf008/scratch/jsimmons
      FS: 806.0 TiB   Used FS: 0.0%   Inodes: 4667.2 Mi   Used Inodes: 0.0%

      Nodemap: 111111000000000000000000000000000000000000000000000000
      54 tasks, 53946 files/directories

      SUMMARY rate: (of 5 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         Directory creation        :       1823.563       1497.613       1687.840        105.551
         Directory stat            :      26132.733      18515.334      23994.365       2847.665
         Directory removal         :       2721.120       1783.451       2383.377        329.561
         File creation             :       6880.575       6428.112       6702.467        153.483
         File stat                 :      44519.556      38352.962      42705.219       2270.727
         File read                 :      19180.528      18379.633      18696.723        276.664
         File removal              :       9229.889       8597.003       8889.050        222.742
         Tree creation             :         48.123         42.574         46.095          1.908
         Tree removal              :         39.628         10.159         28.961          9.911

      -- finished at 11/20/2020 10:18:56 --
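
As a rough cross-check of the two SUMMARY tables in this description, the Python sketch below computes the percentage change of each mean rate from before to after the patch. The mean values are copied from the tables; the names `before`, `after` and `percent_change` are illustrative only and are not part of mdtest or Lustre.

```python
# Mean rates (ops/sec) copied from the "Mean" column of the two SUMMARY tables.
before = {
    "Directory creation": 10551.707,
    "Directory stat": 45101.666,
    "Directory removal": 14198.406,
    "File creation": 6142.435,
    "File stat": 37446.331,
    "File read": 17383.867,
    "File removal": 7227.328,
    "Tree creation": 3328.345,
    "Tree removal": 672.962,
}
after = {
    "Directory creation": 1687.840,
    "Directory stat": 23994.365,
    "Directory removal": 2383.377,
    "File creation": 6702.467,
    "File stat": 42705.219,
    "File read": 18696.723,
    "File removal": 8889.050,
    "Tree creation": 46.095,
    "Tree removal": 28.961,
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means slower)."""
    return (new - old) / old * 100.0


changes = {op: percent_change(before[op], after[op]) for op in before}

for op, change in changes.items():
    print(f"{op:<20} {change:+8.1f}%")
```

With these numbers the directory and tree operations all come out negative, while the file operations come out slightly positive, which is consistent with the regression being specific to directory metadata operations.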

          Activity

            [LU-14146] Massive directory metadata operation performance decrease
            pjones Peter Jones added a comment -

            James

            While I understand that there are ongoing investigations into how to address your performance issues, I don't think that these are unique to 2.14.

            Peter

            simmonsja James A Simmons added a comment - - edited

            I'm using 48 MDTs (2 per MDS). This is with ZFS. The function costing the most time is dt_declare_create(), called by lod_sub_declare_create(). I wonder if we need a precreate mechanism like the OSTs have.

            sihara Shuichi Ihara added a comment -

            "What is your testing setup?"

            My configuration was included in my posted results: two MDSs and two MDTs, and I used the exact same mdtest options you tested below.

            [root@ec01 ~]# lfs setdirstripe -c 2 /ai400x/mdt_stripe
            [root@ec01 ~]# lfs setdirstripe -c 2 -D /ai400x/mdt_stripe
            [root@ec01 ~]# salloc -p 40n -N 40 --ntasks-per-node=8  mpirun -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 --bind-to core:overload-allowed --allow-run-as-root /work/tools/bin/mdtest -n 1000 -p 10 -e 4096 -w 4096 -i 5 -z 2 -d /ai400x/mdt_stripe/

            simmonsja James A Simmons added a comment -

            Fires have been put out. I'm looking at this now.

            simmonsja James A Simmons added a comment -

            What is your testing setup?

            simmonsja James A Simmons added a comment -

            I'm using this setup:

            lfs setdirstripe -c $MDTCOUNT -i -1 $OUTDIR
            lfs setdirstripe -D -c $MDTCOUNT -i -1 $OUTDIR
            lfs setstripe -c $OSTCOUNT $OUTDIR

            and the mdtest (latest) command is:

            /usr/lib64/openmpi/bin/mpirun -npernode 6 -mca pml ob1 -mca btl openib,sm,self -bind-to core:overload-allowed --allow-run-as-root -machinefile $BINDIR/$(arch)/hostfile $BINDIR/$(arch)/mdtest -n 1000 -p 10 -e 4096 -w 4096 -i $ITER -z 2 -d $OUTDIR
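
The item count reported in the description (54 tasks, 53946 files/directories) can be reproduced from these options. The following is a minimal sketch, assuming mdtest's default branching factor of 1 (no -b is given) and assuming that mdtest splits the -n items evenly over the directories of each task's tree and drops the remainder. The function name `mdtest_item_count` is hypothetical and not part of mdtest.

```python
def mdtest_item_count(tasks: int, items: int, depth: int, branch: int = 1) -> int:
    """Estimate the total item count for mdtest -n items -z depth -b branch."""
    # Directories in one task's tree: branch**level for every level 0..depth.
    dirs_in_tree = sum(branch ** level for level in range(depth + 1))
    # Assumption: items are divided evenly among the tree directories,
    # and any remainder is dropped.
    items_per_dir = items // dirs_in_tree
    return tasks * items_per_dir * dirs_in_tree


# 9 nodes with -npernode 6 gives the 54 tasks seen in the description.
tasks = 9 * 6
total = mdtest_item_count(tasks, items=1000, depth=2)
print(tasks, total)
```

Under these assumptions each task handles 999 items rather than 1000, because 1000 does not divide evenly across the 3 directories of a depth-2 tree.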

            sihara Shuichi Ihara added a comment -

            James, I still can't reproduce your problem on my test system; results with 2.12.5 and master are still consistent. Would you have a chance to test on the latest master again?

               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               Directory creation        :      10929.296      10229.518      10551.707        269.772
               Directory stat            :      45397.727      44566.564      45101.666        285.915
               Directory removal         :      14509.663      13822.493      14198.406        282.821

            By the way, regarding your higher directory creation and removal rates above, I wonder whether you set the -D (inherited) option in 'lfs setdirstripe' properly?

            laisiyao, sorry for the confusion. What I wanted to say is that I couldn't see any regressions in master on my test system. Please see my posted results, though you have probably seen them already.
            laisiyao Lai Siyao added a comment -

            Hi Ihara, can you help create flamegraphs on both the client and the MDS in your test?
            simmonsja James A Simmons added a comment - - edited

            Excellent. Let me try the latest master then. Looking at the fix, I think it only addressed the stat issue, not the creation and removal of directories. Removal and creation rates are 1/10 of what 2.12 LTS can do.
            sihara Shuichi Ihara added a comment - - edited

            Hm. I have not yet confirmed any regressions in my test environment, which was 2 x MDS/MDT, 4 x OSS/OST and 40 clients running 320 processes.

            Single MDT, no DNE setup

            [root@ec01 ~]# mkdir /ai400x/mdt0/
            [root@ec01 ~]# salloc -p 40n -N 40 --ntasks-per-node=8  mpirun -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 --bind-to core:overload-allowed --allow-run-as-root /work/tools/bin/mdtest -n 1000 -p 10 -e 4096 -w 4096 -i 5 -z 2 -d /ai400x/mdt0/
            
            lustre-2.12.5
            SUMMARY rate: (of 5 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               Directory creation        :      40166.668      29889.711      36926.039       3721.687
               Directory stat            :     181972.127     163686.767     171839.868       6830.690
               Directory removal         :      72596.455      64023.722      67605.022       2865.954
               File creation             :      61473.277      33357.894      49626.877       8756.461
               File stat                 :     182319.720     172986.277     176813.231       3043.802
               File read                 :      96716.113      91506.710      94630.270       1908.325
               File removal              :      73915.610      71204.711      72434.411       1189.090
               Tree creation             :       4883.894       4224.418       4489.395        238.875
               Tree removal              :        121.542        119.264        120.320          0.870
            
            master (commit: e5c8f66)
            SUMMARY rate: (of 5 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               Directory creation        :      42269.194      40350.392      41677.374        700.045
               Directory stat            :     169511.255     151004.927     160570.614       7062.870
               Directory removal         :      73562.337      66378.685      71053.900       2461.351
               File creation             :      71462.132      38186.018      55635.280       8982.025
               File stat                 :     320154.330     289927.273     309750.141      10796.857
               File read                 :      88594.789      76983.081      83738.015       3793.636
               File removal              :      69072.712      62536.441      65716.920       2125.631
               Tree creation             :       4713.705         32.602       3367.272       1702.228
               Tree removal              :        280.514         17.496        193.251         95.416
            

            Two MDS/MDT, DNE setup

            [root@ec01 ~]# lfs setdirstripe -c 2 /ai400x/mdt_stripe
            [root@ec01 ~]# lfs setdirstripe -c 2 -D /ai400x/mdt_stripe
            [root@ec01 ~]#  salloc -p 40n -N 40 --ntasks-per-node=8  mpirun -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 --bind-to core:overload-allowed --allow-run-as-root /work/tools/bin/mdtest -n 1000 -p 10 -e 4096 -w 4096 -i 5 -z 2 -d /ai400x/mdt_stripe/
            
            lustre-2.12.5
            SUMMARY rate: (of 5 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               Directory creation        :       4091.011       3697.938       3995.214        150.244
               Directory stat            :     160784.657     158579.052     159864.416        885.088
               Directory removal         :       3346.025       3289.510       3319.668         18.116
               File creation             :      71590.829      36867.505      61846.370      11509.343
               File stat                 :     353953.112     316962.501     339006.051      13982.944
               File read                 :     185607.391     180289.664     182559.629       1791.647
               File removal              :     129448.873     127389.601     128672.603        719.608
               Tree creation             :        543.402          3.326        111.930        215.737
               Tree removal              :        116.905         97.208        104.334          6.869
            
            
            master (commit: e5c8f66)
            SUMMARY rate: (of 5 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               Directory creation        :       4244.489       4153.787       4204.182         35.519
               Directory stat            :      45417.105      44573.071      45017.015        327.182
               Directory removal         :       3253.162       3166.240       3206.250         34.838
               File creation             :     103608.274      64457.023      91383.228      10534.101
               File stat                 :     513544.947     489825.324     505082.825       9879.991
               File read                 :     169268.803     160600.607     165519.732       3198.057
               File removal              :     116843.421     111635.972     114741.924       1985.082
               Tree creation             :        189.871          4.421         42.099         73.888
               Tree removal              :        218.595        190.424        208.575         10.343
             

            We know of a regression for directory stat in the DNE setup with the master branch. That's a known issue tracked in LU-14172, and patch https://review.whamcloud.com/#/c/40863/ solved the problem.
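
To compare two mdtest runs without reading the tables by eye, a small script can parse the SUMMARY rows and flag operations whose mean rate dropped. The sketch below is only an illustration: the sample text is the directory rows of the two DNE tables in this comment, the 10% threshold is an arbitrary assumption, and `parse_summary` and `regressions` are hypothetical helper names, not part of mdtest.

```python
import re

# One SUMMARY row: "<operation> : <max> <min> <mean> <std dev>".
ROW = re.compile(
    r"^\s*(?P<op>[A-Za-z ]+?)\s*:\s*(?P<max>[\d.]+)\s+(?P<min>[\d.]+)"
    r"\s+(?P<mean>[\d.]+)\s+(?P<std>[\d.]+)\s*$"
)

LUSTRE_2_12_5 = """
   Directory creation        :       4091.011       3697.938       3995.214        150.244
   Directory stat            :     160784.657     158579.052     159864.416        885.088
   Directory removal         :       3346.025       3289.510       3319.668         18.116
"""

MASTER = """
   Directory creation        :       4244.489       4153.787       4204.182         35.519
   Directory stat            :      45417.105      44573.071      45017.015        327.182
   Directory removal         :       3253.162       3166.240       3206.250         34.838
"""


def parse_summary(text: str) -> dict:
    """Map each operation name to its max/min/mean/std values."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header, separator and blank lines do not match
            rows[match.group("op")] = {
                key: float(match.group(key)) for key in ("max", "min", "mean", "std")
            }
    return rows


def regressions(base: dict, new: dict, threshold: float = 0.10) -> list:
    """Operations whose mean dropped by more than `threshold` (assumed 10%)."""
    return [
        op for op in base
        if op in new and new[op]["mean"] < base[op]["mean"] * (1.0 - threshold)
    ]


flagged = regressions(parse_summary(LUSTRE_2_12_5), parse_summary(MASTER))
print(flagged)
```

On the directory rows used here, only the directory stat operation falls below the threshold, which matches the known directory stat regression described in this comment.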


            simmonsja James A Simmons added a comment -

            We also tested on a single-MDT setup, and it showed the same results.

            People

              laisiyao Lai Siyao
              simmonsja James A Simmons