[LU-6800] Significant performance regression with patch LU-5264

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.8.0
    • None
    • master
    • 2

    Description

      During our performance testing, we found a significant metadata performance regression with LU-5264 on master.

      # mpirun -np 128 -ppn 4 -hostfile ./hostfile /work/tools/bin/mdtest -n 1000 -p 10 -i 5 -d /scratch1/mdtest.out
      

      master

      SUMMARY: (of 5 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         Directory creation:      39552.671      33039.129      37024.828       2875.617
         Directory stat    :      33462.417      29340.691      31662.586       1384.330
         Directory removal :      40938.777      40238.677      40571.960        283.701
         File creation     :      17696.663      17209.531      17542.185        171.470
         File stat         :      33892.041      33429.312      33680.603        170.577
         File read         :      11284.121      11012.694      11220.417        104.978
         File removal      :      39718.200      39449.348      39556.254         90.590
         Tree creation     :       4583.939        700.335       3652.356       1487.449
         Tree removal      :        170.563        156.738        162.935          5.172
      

      Same client version, but with patch 42fdf8355791cb682c6120f7950bb2ecd50f97aa (LU-5264 obdclass: fix race during key quiescency) reverted on the servers:

      SUMMARY: (of 5 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         Directory creation:      44937.511      42117.095      43780.402       1335.927
         Directory stat    :     135310.427     129560.951     133625.293       2077.128
         Directory removal :      51525.499      46852.534      49965.297       1611.759
         File creation     :      42978.506      41435.145      42413.409        586.294
         File stat         :     135882.699     133344.886     134466.144        977.577
         File read         :     121788.787     111332.613     116374.190       3351.730
         File removal      :      84827.815      78120.995      80378.741       2522.662
         Tree creation     :       4650.004       3788.893       4268.099        336.241
         Tree removal      :        198.059        129.234        179.980         25.563
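
To make the size of the regression easier to read, the mean rates from the two SUMMARY tables can be compared directly. The following is a minimal sketch: the dictionaries hold the Mean column values copied from the tables above, while the names `MASTER`, `REVERTED`, and `speedup_after_revert` are hypothetical helpers introduced only for this illustration.

```python
# Mean operations per second, copied from the "Mean" column of the two
# mdtest SUMMARY tables above (master vs. servers with LU-5264 reverted).
MASTER = {
    "Directory creation": 37024.828,
    "Directory stat": 31662.586,
    "Directory removal": 40571.960,
    "File creation": 17542.185,
    "File stat": 33680.603,
    "File read": 11220.417,
    "File removal": 39556.254,
    "Tree creation": 3652.356,
    "Tree removal": 162.935,
}
REVERTED = {
    "Directory creation": 43780.402,
    "Directory stat": 133625.293,
    "Directory removal": 49965.297,
    "File creation": 42413.409,
    "File stat": 134466.144,
    "File read": 116374.190,
    "File removal": 80378.741,
    "Tree creation": 4268.099,
    "Tree removal": 179.980,
}


def speedup_after_revert(master, reverted):
    """Return the reverted/master ratio of mean rates for each operation."""
    return {op: reverted[op] / master[op] for op in master}


if __name__ == "__main__":
    ratios = speedup_after_revert(MASTER, REVERTED)
    # Print the operations most affected by the patch first.
    for op, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{op:<20} {ratio:6.2f}x")
```

On these numbers, file read is the most affected operation (roughly 10x faster with the patch reverted), followed by directory stat and file stat (roughly 4x each).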
      

          Activity

            adegremont Aurelien Degremont (Inactive) added a comment - Unfortunately, we did not test with 15558. Not sure we will be able to do this on the production system.

            lixi Li Xi (Inactive) added a comment - Hi Aurelien, did you test with or without 15558? Does it help, or do you still see the same problem?

            adegremont Aurelien Degremont (Inactive) added a comment - FYI, at CEA, we faced heavy load on MDT with several codes. This was introducing bad performance and instability on the filesystem, so we decided to revert the patch from LU-5264 for now, until we get something better.

            ihara Shuichi Ihara (Inactive) added a comment - As far as we know, http://review.whamcloud.com/15558/ is not perfect. It restores performance for most metadata operations, but file read is still slower than it was before LU-5264 was applied. I will post benchmark results soon.
            pjones Peter Jones added a comment - Landed for 2.8

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15558/
            Subject: LU-6800 obdclass: change spinlock of key to rwlock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 96d773d993cd48a069da4098b87da7d9ef0dd52e
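
The patch subject describes replacing a spinlock with a rwlock. As a generic illustration of why that can matter for read-mostly access (this is not the Lustre code from change 15558, and `ReadWriteLock` and `demo` are hypothetical names used only here), the sketch below contrasts a minimal readers-writer lock with a plain exclusive lock: several read holds can coexist, while a write hold excludes everyone else.

```python
import threading


class ReadWriteLock:
    """Minimal readers-writer lock (illustrative only; no fairness or writer priority)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of active read holds
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self, blocking=True):
        with self._cond:
            while self._writer:
                if not blocking:
                    return False
                self._cond.wait()
            self._readers += 1
            return True

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self, blocking=True):
        with self._cond:
            while self._writer or self._readers:
                if not blocking:
                    return False
                self._cond.wait()
            self._writer = True
            return True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def demo():
    """Show which non-blocking acquisitions succeed for each lock type."""
    results = {}
    rw = ReadWriteLock()
    # Two read holds can be taken at the same time.
    results["read_1"] = rw.acquire_read(blocking=False)
    results["read_2"] = rw.acquire_read(blocking=False)
    # A writer must wait until all readers have left.
    results["write_while_reading"] = rw.acquire_write(blocking=False)
    rw.release_read()
    rw.release_read()
    results["write_after_readers_left"] = rw.acquire_write(blocking=False)
    # While a writer holds the lock, readers are excluded.
    results["read_while_writing"] = rw.acquire_read(blocking=False)
    rw.release_write()
    # A plain exclusive lock serializes every holder, readers included.
    exclusive = threading.Lock()
    results["excl_1"] = exclusive.acquire(blocking=False)
    results["excl_2"] = exclusive.acquire(blocking=False)
    exclusive.release()
    return results


if __name__ == "__main__":
    for name, ok in demo().items():
        print(f"{name}: {ok}")
```

With an exclusive lock, every lookup on a hot path contends with every other lookup; with a readers-writer lock, only modifications serialize. Whether this fully removes the contention depends on how often the write side is taken, which this sketch does not model.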

            gerrit Gerrit Updater added a comment - Gu Zheng (gzheng@ddn.com) uploaded a new patch: http://review.whamcloud.com/15558
            Subject: LU-6800 obdclass: change spinlock of key to rwlock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 56ee901b05afbe3eab2fda3eb2a055025f3a7779

            simmonsja James A Simmons added a comment - I noticed this performance regression as well. In fact, in my testing so far, a DNE2 directory striped across 2 MDSs performs worse than one using just a single MDS.

            bfaccini Bruno Faccini (Inactive) added a comment - I am trying to set up a test platform to understand the impact of my original patch from LU-5264 on lu_keys_guard lock usage.

            People

              bfaccini Bruno Faccini (Inactive)
              ihara Shuichi Ihara (Inactive)