Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-15546

Shared Directory File Creates regression seen in 2.15 when comparing to 2.12.6

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.15.0
    • Lustre 2.15.0
    • None
    • 3
    • 9223372036854775807

    Description

      When testing mdtest 2.15 (2.14.57) and comparing to 2.12.6, I see a large 25% regression with Shared Directory File Creates. Perf traces (attached) show a lot of extra ldlm overhead.

      #!/bin/bash
      
      NODES=21
      PPN=16
      PROCS=$(( $NODES * $PPN ))
      MDT_COUNT=1
      PAUSED=120
      
      srun -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/ior-3.3.0-CentOS-8.2/install/bin/mdtest -v -i 5 -p $PAUSED -C -E -T -r -n $(( $MDT_COUNT * 1048576 / $PROCS )) -d /mnt/kjlmo2/pkoutoupis/mdt0/test.`date +"%Y%m%d.%H%M%S"` 2>&1 |& tee f_mdt0_0k_ost_shared.out
      
      srun -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/ior-3.3.0-CentOS-8.2/install/bin/mdtest -v -i 5 -p $PAUSED -C -w 32768 -E -e 32768 -T -r -n $(( $MDT_COUNT * 1048576 / $PROCS )) -d /mnt/kjlmo2/pkoutoupis/mdt0/test.`date +"%Y%m%d.%H%M%S"` 2>&1 |& tee f_mdt0_32k_ost_shared.out
      

      Attachments

        Issue Links

          Activity

            [LU-15546] Shared Directory File Creates regression seen in 2.15 when comparing to 2.12.6

            Petros and Shuichi, the latest version of my patch: https://review.whamcloud.com/46696 "LU-15546 mdt: keep history of mdt_reint_open() lock" looks like it is working properly in my local testing, but needs some benchmarking on real hardware to see whether it provides a performance improvement.

            The patch has been updated to have a per-directory history counter. In my local testing it takes about 128 open+creates (with pre-lookup, like Etienne's just-landed patch) before it gets "into the zone" and speculatively skips the lookup to predict the PW lock mode and skip the pre-lookup. It takes about 16 "bad" lookups in the same directory before it reverts to doing the pre-lookup again, and 256 open-existing before it swings to the opposite end to predict PR locks and skip the pre-lookup.

            Mixed workloads within a single directory will be essentially the same as the current code, so it will always do a pre-lookup in the directory if the open mode doesn't give enough info.

            adilger Andreas Dilger added a comment - Petros and Shuichi, the latest version of my patch: https://review.whamcloud.com/46696 " LU-15546 mdt: keep history of mdt_reint_open() lock " looks like it is working properly in my local testing, but needs some benchmarking on real hardware to see whether it provides a performance improvement. The patch has been updated to have a per-directory history counter. In my local testing it takes about 128 open+creates (with pre-lookup, like Etienne's just-landed patch) before it gets "into the zone" and speculatively skips the lookup to predict the PW lock mode and skip the pre-lookup. It takes about 16 "bad" lookups in the same directory before it reverts to doing the pre-lookup again, and 256 open-existing before it swings to the opposite end to predict PR locks and skip the pre-lookup. Mixed workloads within a single directory will be essentially the same as the current code, so it will always do a pre-lookup in the directory if the open mode doesn't give enough info.
            pjones Peter Jones added a comment -

            As per the discussion on the LWG call, Etienne's patch (which just landed) is what we will address for 2.15. I suggest that the other patches in flight get moved to a new JIRA for possible inclusion in a future release

            pjones Peter Jones added a comment - As per the discussion on the LWG call, Etienne's patch (which just landed) is what we will address for 2.15. I suggest that the other patches in flight get moved to a new JIRA for possible inclusion in a future release

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46679/
            Subject: LU-15546 mdt: mdt_reint_open lookup before locking
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f14090e56c9d94e3cfaa6f13f357173d6d570547

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46679/ Subject: LU-15546 mdt: mdt_reint_open lookup before locking Project: fs/lustre-release Branch: master Current Patch Set: Commit: f14090e56c9d94e3cfaa6f13f357173d6d570547

            Andreas, I have not made the fixes. I see the bugs that you are referring to now.

            koutoupis Petros Koutoupis added a comment - Andreas, I have not made the fixes. I see the bugs that you are referring to now.

            Petros, did you make fixes to 46696 before testing it? There are a couple of bugs in the current version patch (though clearly described there), but they could be fixed relatively easily, but I have no fast system to benchmark it myself. If you have made fixes, please push a new version, otherwise I will take another crack at it.

            adilger Andreas Dilger added a comment - Petros, did you make fixes to 46696 before testing it? There are a couple of bugs in the current version patch (though clearly described there), but they could be fixed relatively easily, but I have no fast system to benchmark it myself. If you have made fixes, please push a new version, otherwise I will take another crack at it.
            koutoupis Petros Koutoupis added a comment - - edited

            I tested change 46696 and it performs a bit better than just 46679 (on shared directory file creates). It is within 5% of my original baseline. Actually, around 3.5-4% less which I consider great.

            koutoupis Petros Koutoupis added a comment - - edited I tested change 46696 and it performs a bit better than just 46679 (on shared directory file creates). It is within 5% of my original baseline. Actually, around 3.5-4% less which I consider great.

            "Dominique Martinet <qhufhnrynczannqp.f@noclue.notk.org>" uploaded a new patch: https://review.whamcloud.com/46738
            Subject: LU-15546 mdt: optimistically trust non-locked lookup
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 55162c4d0fc798b5070d08a670f1e36b575672c6

            gerrit Gerrit Updater added a comment - "Dominique Martinet <qhufhnrynczannqp.f@noclue.notk.org>" uploaded a new patch: https://review.whamcloud.com/46738 Subject: LU-15546 mdt: optimistically trust non-locked lookup Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 55162c4d0fc798b5070d08a670f1e36b575672c6

            koutoupis feel free to play with my patch https://review.whamcloud.com/46696 - it isn't quite ready, just something I whipped together and have not tested, but close to something that would work.

            adilger Andreas Dilger added a comment - koutoupis feel free to play with my patch https://review.whamcloud.com/46696 - it isn't quite ready, just something I whipped together and have not tested, but close to something that would work.

            I tested (mdtest shared directory file creates) change 46679 against my original 2.15 baseline and it is about a 20-22% improvement but still a 5% regression from a revert of LU-10262.

            koutoupis Petros Koutoupis added a comment - I tested (mdtest shared directory file creates) change 46679 against my original 2.15 baseline and it is about a 20-22% improvement but still a 5% regression from a revert of LU-10262 .

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46696
            Subject: LU-15546 mdt: keep history of mdt_reint_open() lock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 47cf66750b44a34be3543e0f3e629cbf5103f8da

            gerrit Gerrit Updater added a comment - "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46696 Subject: LU-15546 mdt: keep history of mdt_reint_open() lock Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 47cf66750b44a34be3543e0f3e629cbf5103f8da

            @Andreas Dilger. Of course. I can do so over the next few days.

            koutoupis Petros Koutoupis added a comment - @Andreas Dilger. Of course. I can do so over the next few days.

            People

              eaujames Etienne Aujames
              koutoupis Petros Koutoupis
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: