
DIR Stat performance regression in striped dir

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.6
    • Affects Version/s: Lustre 2.12.6
    • Labels: None
    • Severity: 2

    Description

      There is a metadata (DIR Stat) performance regression in 2.12.6 (RC1). The regression appears to be in striped directory handling on the server side.
      Here are a reproducer and test results.

      client: version=2.12.6_RC1_1_g327c8b7
      server: version=2.12.6_RC1_1_g327c8b7 or lustre-2.12.5
      
      # mkdir /ai400x/mdt0
      # lfs setdirstripe -c 4 /ai400x/mdt_stripe
      # lfs setdirstripe -c 4 -D /ai400x/mdt_stripe
      
      #  salloc -p 40n -N 40 --ntasks-per-node=16  mpirun -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 --bind-to core:overload-allowed --allow-run-as-root /work/tools/bin/mdtest -i 3 -p 10 -n 1500 -u -D -d $PATH
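
      For scale, the size of this workload can be worked out from the command line above. This is a rough sketch that assumes mdtest's -n option counts items per MPI task and that -D limits the run to directories; the variable names are illustrative only.

```python
# Values taken from the salloc/mpirun/mdtest command line above.
nodes = 40             # salloc -N 40
tasks_per_node = 16    # --ntasks-per-node=16
items_per_task = 1500  # mdtest -n 1500
iterations = 3         # mdtest -i 3

# Assumption: -n is a per-task item count and -D restricts the test to directories.
total_tasks = nodes * tasks_per_node
dirs_per_iteration = total_tasks * items_per_task

print(f"MPI tasks: {total_tasks}")
print(f"Directories created/stat'ed/removed per iteration: {dirs_per_iteration}")
print(f"Iterations: {iterations}")
```

      Under those assumptions each iteration drives 640 tasks against 960,000 directories.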
      

      Single MDT without DNE
      Server: Lustre-2.12.5

      SUMMARY rate: (of 3 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         Directory creation        :      54315.552      50037.128      52618.576       1855.106
         Directory stat            :     186516.109     184354.609     185726.143        972.887
         Directory removal         :      66572.651      64990.546      65627.103        681.777
         Tree creation             :         46.771         24.099         36.301          9.336
         Tree removal              :         16.926         13.890         15.720          1.316
      

      Server: Lustre-2.12.6-RC1

      SUMMARY rate: (of 3 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         Directory creation        :      68098.113      59185.349      62208.643       4164.966
         Directory stat            :     193338.869     192650.348     193031.824        285.743
         Directory removal         :      65905.804      64842.618      65212.728        490.440
         Tree creation             :         44.234         33.906         39.452          4.251
         Tree removal              :         17.024         15.068         16.279          0.864
      

      Stripe Directory across four MDTs
      Server: Lustre-2.12.5

      SUMMARY rate: (of 3 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         Directory creation        :       6385.748       5929.670       6113.851        196.251
         Directory stat            :     166190.895     162991.180     164733.372       1321.263
         Directory removal         :       4789.518       4294.122       4584.600        211.099
         Tree creation             :         13.200          1.102          6.937          4.948
         Tree removal              :          9.126          8.479          8.810          0.264
      

      Server: Lustre-2.12.6-RC1

      SUMMARY rate: (of 3 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         Directory creation        :       6694.539       6505.265       6613.160         79.512
         Directory stat            :      49873.850      48817.530      49260.117        447.881   <--- This is the regression.
         Directory removal         :       4768.841       4253.124       4592.927        240.327
         Tree creation             :         13.490          0.705          7.321          5.229
         Tree removal              :          9.051          8.441          8.774          0.252
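
      To put a number on the regression, the mean Directory stat rates from the four tables above can be compared directly. The snippet below is only a small arithmetic sketch; the helper name percent_change is made up for illustration, and the values are copied from the Mean column.

```python
def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Mean "Directory stat" rates copied from the tables above.
directory_stat_mean = {
    "single MDT": {"2.12.5": 185726.143, "2.12.6-RC1": 193031.824},
    "striped across four MDTs": {"2.12.5": 164733.372, "2.12.6-RC1": 49260.117},
}

changes = {
    layout: percent_change(rates["2.12.5"], rates["2.12.6-RC1"])
    for layout, rates in directory_stat_mean.items()
}

for layout, change in changes.items():
    print(f"{layout}: {change:+.1f}%")
```

      With these inputs the single-MDT case changes by about +3.9%, while the striped directory case drops by about 70.1%.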
      

          Activity

            [LU-14172] DIR Stat performance regression in striped dir
            pjones Peter Jones added a comment -

            Landed for 2.14 and 2.12.6


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40863/
            Subject: LU-14172 lmv: optimize dir shard revalidate
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: de47c7671f29b2a3a79f6a126b7e01f0b2c5991a

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40875/
            Subject: LU-14172 lmv: optimize dir shard revalidate
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 94ec63ed67c6f09a2b15b2227ef6b189df623f4d

            sihara Shuichi Ihara added a comment -

            Here are the final test results, compared apples to apples.

            # mkdir /ai400x/mdt0
            # lfs setdirstripe -c 4 /ai400x/mdt_stripe
            # lfs setdirstripe -c 4 -D /ai400x/mdt_stripe

            #  salloc -p 40n -N 40 --ntasks-per-node=16  mpirun -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 --bind-to core:overload-allowed --allow-run-as-root /work/tools/bin/mdtest -i 3 -p 10 -n 1500 -u -D -d $PATH

            2.12.5

               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               Directory creation        :       6173.268       5805.240       5933.825        169.465
               Directory stat            :     151800.690     148071.970     150256.305       1587.764
               Directory removal         :       4648.674       4173.113       4417.583        194.376
               Tree creation             :         12.984          0.756          6.940          4.993
               Tree removal

            2.12.6-RC1

               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               Directory creation        :       6344.954       5834.617       6020.427        230.277
               Directory stat            :      44887.807      43460.779      43964.038        654.049
               Directory removal         :       4559.802       4114.146       4392.390        198.099
               Tree creation             :         13.336          0.734          7.153          5.148
               Tree removal              :          8.723          8.120          8.359          0.261

            2.12.6-RC1 + patch https://review.whamcloud.com/40872

               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               Directory creation        :       6117.054       5850.628       5941.143        124.404
               Directory stat            :     151638.423     143319.490     148509.338       3695.492
               Directory removal         :       4498.161       3971.102       4219.711        216.202
               Tree creation             :         12.974          0.990          8.916          5.605
               Tree removal              :          8.616          8.349          8.458          0.114

            2.12.6-RC1 + patch https://review.whamcloud.com/40875

               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               Directory creation        :       6328.334       5993.743       6113.977        151.946
               Directory stat            :     154744.046     148145.747     152434.570       3035.537
               Directory removal         :       4628.371       4174.092       4457.011        201.538
               Tree creation             :         13.789          1.132          7.503          5.167
               Tree removal              :          8.654          8.373          8.499          0.117

            I think that patch 40875 solves the regression and the numbers are consistent.

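            The comparison above can be reproduced mechanically from mdtest summary text. The following is a minimal sketch, not part of mdtest or Lustre: the function parse_summary and the regular expression are hypothetical helpers, and only the Directory stat rows from the four tables above are used as input. Rows that do not carry all four numeric columns (such as a truncated "Tree removal" line) are simply skipped.

```python
import re

# Matches rows like "Directory stat   :   151800.690   148071.970   150256.305   1587.764".
ROW = re.compile(
    r"^\s*(?P<op>[A-Za-z ]+?)\s*:\s*"
    r"(?P<max>[\d.]+)\s+(?P<min>[\d.]+)\s+(?P<mean>[\d.]+)\s+(?P<std>[\d.]+)"
)


def parse_summary(text: str) -> dict:
    """Return {operation: {"max", "min", "mean", "std"}} for every complete row."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header, separator, and truncated rows do not match
            rows[match["op"]] = {
                key: float(match[key]) for key in ("max", "min", "mean", "std")
            }
    return rows


# Directory stat rows copied from the four tables above.
results = {
    "2.12.5": "Directory stat : 151800.690 148071.970 150256.305 1587.764\nTree removal",
    "2.12.6-RC1": "Directory stat : 44887.807 43460.779 43964.038 654.049",
    "2.12.6-RC1 + 40872": "Directory stat : 151638.423 143319.490 148509.338 3695.492",
    "2.12.6-RC1 + 40875": "Directory stat : 154744.046 148145.747 152434.570 3035.537",
}

baseline = parse_summary(results["2.12.5"])["Directory stat"]["mean"]
ratios = {
    name: parse_summary(text)["Directory stat"]["mean"] / baseline
    for name, text in results.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0%} of the 2.12.5 Directory stat mean")
```

            On these inputs, unpatched 2.12.6-RC1 reaches roughly 29% of the 2.12.5 mean, while either patch brings the rate back to within about 1 to 2% of it.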
            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40875
            Subject: LU-14172 lmv: optimize dir shard revalidate
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 0d603e858ee236c779516d7672c14deaa6749e5c

            sihara Shuichi Ihara added a comment -

            It looks like patch https://review.whamcloud.com/40863 fixes the regression once it is applied on both the server and the client side. In the previous test the patch was applied only on the server side, but I then realized that the patch contains both server and client changes.

               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               Directory creation        :       6417.575       6007.465       6186.944        171.288
               Directory stat            :     143940.330     139376.396     141106.210       2020.042
               Directory removal         :       4677.840       4377.965       4569.627        135.902
               Tree creation             :         13.348          0.656          5.006          5.901
               Tree removal              :          8.832          8.782          8.810          0.021

            These numbers are still a bit lower than 2.12.5, but I don't have a baseline number on master without the impact of this regression. So there might be other issues in master when compared against 2.12.5.
            Anyway, for b2_12, let me go back to b2_12 and check, with the backport patch Lai provided, whether performance returns to the same level as 2.12.5.

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40872
            Subject: LU-14172 mds: disable GETATTR_PFID feature
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: df61386547f026e6d4f6ca7878d1485d15f7e784

            sihara Shuichi Ihara added a comment -

            Here are test results on the master branch (commit e5c8f66). They reproduce the same DIR stat regression that I saw on lustre-2.12.6-RC1.

               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               Directory creation        :       6323.394       6141.754       6238.210         74.580
               Directory stat            :      48295.593      46827.765      47645.451        610.794
               Directory removal         :       4336.014       4274.571       4315.516         28.952
               Tree creation             :         11.842          0.614          4.587          5.138
               Tree removal              :          9.204          8.894          9.048          0.126

            And, unfortunately, patch 40863 against master doesn't solve the problem.

               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               Directory creation        :       6437.097       6084.071       6279.844        146.672
               Directory stat            :      47235.709      44762.233      46347.987       1123.868
               Directory removal         :       4745.993       4348.202       4504.821        173.053
               Tree creation             :          6.530          0.789          2.762          2.665
               Tree removal              :          8.983          8.477          8.741          0.207

            gerrit Gerrit Updater added a comment -

            Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40863
            Subject: LU-14172 lmv: optimize dir shard revalidate
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1e31225721c98ab48c8a4572cc59b3661cbe1dda

            laisiyao Lai Siyao added a comment -

            Yes, and the cause is that directory stripe revalidation spends more time checking whether the object is a stripe (see mdt_object_is_shard()). I made a simple fix and the results look good; I'll tidy it up and push it later.

            People

              Assignee: laisiyao Lai Siyao
              Reporter: sihara Shuichi Ihara