Shared Directory File Creates regression seen in 2.15 when comparing to 2.12.6

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.15.0
    • Affects Version/s: Lustre 2.15.0
    • Labels: None
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      When running mdtest on Lustre 2.15 (2.14.57) and comparing against 2.12.6, I see a large (25%) regression in Shared Directory File Creates. The attached perf traces show a lot of extra ldlm overhead.

      #!/bin/bash
      # Shared-directory mdtest reproducer: 21 nodes x 16 tasks per node, one MDT.

      NODES=21
      PPN=16
      PROCS=$(( $NODES * $PPN ))
      MDT_COUNT=1
      PAUSED=120   # seconds passed to mdtest -p (delay before each iteration)

      # Run 1: create (-C), read (-E), stat (-T) and remove (-r) phases, no data written.
      srun -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/ior-3.3.0-CentOS-8.2/install/bin/mdtest -v -i 5 -p $PAUSED -C -E -T -r -n $(( $MDT_COUNT * 1048576 / $PROCS )) -d /mnt/kjlmo2/pkoutoupis/mdt0/test.$(date +"%Y%m%d.%H%M%S") 2>&1 | tee f_mdt0_0k_ost_shared.out

      # Run 2: same phases, writing (-w) and reading (-e) 32768 bytes per file.
      srun -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/ior-3.3.0-CentOS-8.2/install/bin/mdtest -v -i 5 -p $PAUSED -C -w 32768 -E -e 32768 -T -r -n $(( $MDT_COUNT * 1048576 / $PROCS )) -d /mnt/kjlmo2/pkoutoupis/mdt0/test.$(date +"%Y%m%d.%H%M%S") 2>&1 | tee f_mdt0_32k_ost_shared.out
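
The `-n` argument in both commands is the per-process item count, obtained by integer division. A minimal sketch of that arithmetic with the same values as the script (the `FILES_PER_PROC` and `TOTAL_FILES` names are introduced here for illustration and do not appear in the original script):

```shell
#!/bin/sh
# Values copied from the reproducer script.
NODES=21
PPN=16
MDT_COUNT=1

# Total number of MPI ranks launched by srun.
PROCS=$(( NODES * PPN ))

# Items per process as passed to mdtest -n; shell arithmetic truncates.
FILES_PER_PROC=$(( MDT_COUNT * 1048576 / PROCS ))

# Files actually created per iteration across all ranks.
TOTAL_FILES=$(( FILES_PER_PROC * PROCS ))

echo "procs=$PROCS files_per_proc=$FILES_PER_PROC total_files=$TOTAL_FILES"
```

Because the division truncates, each iteration creates slightly fewer than 1048576 files in the shared directory.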
      

          Activity

            [LU-15546] Shared Directory File Creates regression seen in 2.15 when comparing to 2.12.6
            pjones Peter Jones made changes -
            Link New: This issue is related to DDN-3129 [ DDN-3129 ]

            Shuichi tested my patch but didn't find it changed the performance significantly:

            I tested the patch and compared it against lustre-2.15.0-RC3.

            • 1 x MDS(1xMDT, 12 CPU cores, 142GB RAM)
            • 4 x OSS(2xOST/OSS)
            • 40 x client(16 CPU cores, 96GB RAM)
            • IB-HDR100 network

            The workload is many processes (640 processes) creating a huge number of files (19.2M files) in a single shared directory.

            mpirun -np 640 mdtest -n 30000 -F -v -d /exafs/d0/d1/d2/mdtest.out -C -r -p 30 -i 3
            lustre-2.15.0-RC3
            SUMMARY rate: (of 3 iterations)
               Operation                     Max            Min           Mean        Std Dev
               ---------                     ---            ---           ----        -------
               File creation               64713.164      53431.547      60835.215       6414.183
               File stat                       0.000          0.000          0.000          0.000
               File read                       0.000          0.000          0.000          0.000
               File removal                46277.792      44080.164      45406.512       1167.351
               Tree creation                4629.475       3495.253       4131.971        579.784
               Tree removal                    2.302          2.019          2.137          0.147
            
            lustre-2.15.0-RC3 + patch46696
            SUMMARY rate: (of 3 iterations)
               Operation                     Max            Min           Mean        Std Dev
               ---------                     ---            ---           ----        -------
               File creation               67544.538      52056.964      61429.964       8241.920
               File stat                       0.000          0.000          0.000          0.000
               File read                       0.000          0.000          0.000          0.000
               File removal                45532.402      41724.110      43966.753       1992.363
               Tree creation                4132.319       3472.106       3721.652        358.386
               Tree removal                    2.251          1.837          2.030          0.209
            

            In my test environment, I didn't see a large improvement from the patch (possibly limited by the MDS's CPU resources), but I didn't find any regressions either.
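
For reference, the relative differences between the two summaries can be worked out from the mean rates in the tables above. The snippet below is only a convenience calculation: the `pct_change` helper is a name introduced here, and it assumes `awk` is available.

```shell
#!/bin/sh
# pct_change BASELINE PATCHED: print the percent change to two decimals.
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (b - a) / a * 100 }'
}

# 640 processes x 30000 files each (mdtest -n 30000).
echo "total files:        $(( 640 * 30000 ))"

# Mean rates taken from the two SUMMARY tables (RC3 vs RC3 + patch46696).
echo "file creation mean: $(pct_change 60835.215 61429.964)%"
echo "file removal mean:  $(pct_change 45406.512 43966.753)%"
```

The creation-rate difference comes out at roughly +1%, far smaller than the standard deviation of either run, and the removal-rate difference at roughly -3%, which is of the same order as the reported standard deviations. Both are consistent with the conclusion above that the patch made no significant difference in this environment.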

            adilger Andreas Dilger added the above comment.

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/47487
            Subject: LU-15546 mdt: mdt_reint_open lookup before locking
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: cfb5c55e9550a04aa22a5849ac8e86a2dc36eada

            gerrit Gerrit Updater added the above comment.
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-15720 [ LU-15720 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-15692 [ LU-15692 ]

            Petros and Shuichi, the latest version of my patch: https://review.whamcloud.com/46696 "LU-15546 mdt: keep history of mdt_reint_open() lock" looks like it is working properly in my local testing, but needs some benchmarking on real hardware to see whether it provides a performance improvement.

            The patch has been updated to have a per-directory history counter. In my local testing it takes about 128 open+creates (with pre-lookup, like Etienne's just-landed patch) before it gets "into the zone" and speculatively predicts the PW lock mode, skipping the pre-lookup. It takes about 16 "bad" lookups in the same directory before it reverts to doing the pre-lookup again, and 256 open-existing before it swings to the opposite end to predict PR locks and skip the pre-lookup.

            Mixed workloads within a single directory will be essentially the same as the current code, so it will always do a pre-lookup in the directory if the open mode doesn't give enough info.
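
The behaviour described above can be illustrated with a toy state machine. This is not the code from patch 46696: only the three thresholds come from the comment (and those are approximate figures from local testing), while the state names, counters, reset rules and the `record_open` and `repeat` helpers are assumptions made for this sketch.

```shell
#!/bin/sh
# Toy model of a per-directory open history (illustrative assumptions only).
CREATE_THRESH=128   # open+creates seen before predicting a PW lock
EXIST_THRESH=256    # open-existing seen before predicting a PR lock
MISS_THRESH=16      # "bad" predictions before reverting to pre-lookup

state=prelookup     # one of: prelookup, predict_pw, predict_pr
creates=0
exists=0
misses=0

# record_open created|existed: update the history for one open in the directory.
record_open() {
    case "$state:$1" in
        prelookup:created)
            creates=$(( creates + 1 )); exists=0
            if [ "$creates" -ge "$CREATE_THRESH" ]; then
                state=predict_pw; misses=0
            fi ;;
        prelookup:existed)
            exists=$(( exists + 1 )); creates=0
            if [ "$exists" -ge "$EXIST_THRESH" ]; then
                state=predict_pr; misses=0
            fi ;;
        predict_pw:existed|predict_pr:created)
            # The prediction was wrong for this open.
            misses=$(( misses + 1 ))
            if [ "$misses" -ge "$MISS_THRESH" ]; then
                state=prelookup; creates=0; exists=0
            fi ;;
    esac
}

# repeat N CMD...: run CMD N times.
repeat() {
    n=$1; shift
    while [ "$n" -gt 0 ]; do "$@"; n=$(( n - 1 )); done
}

repeat 128 record_open created;  echo "after 128 creates:       $state"
repeat 16  record_open existed;  echo "after 16 bad predictions: $state"
repeat 256 record_open existed;  echo "after 256 open-existing:  $state"
```

In this model a workload that alternates creates and opens of existing files never leaves the `prelookup` state, because each kind of open resets the other's counter, which mirrors the point about mixed workloads always doing the pre-lookup.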

            adilger Andreas Dilger added the above comment.
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            As per the discussion on the LWG call, Etienne's patch (which just landed) is what we will go with for 2.15. I suggest that the other patches in flight be moved to a new JIRA ticket for possible inclusion in a future release.


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46679/
            Subject: LU-15546 mdt: mdt_reint_open lookup before locking
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f14090e56c9d94e3cfaa6f13f357173d6d570547

            gerrit Gerrit Updater added the above comment.

            Andreas, I have not made the fixes. I see the bugs that you are referring to now.

            koutoupis Petros Koutoupis added the above comment.

            People

              Assignee: eaujames Etienne Aujames
              Reporter: koutoupis Petros Koutoupis