LU-9972: Performance regressions on unique directory removal

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.4
    • Environment: 2.10 (and 2.11)
    • Severity: 3

    Description

      There is a performance regression in directory removal (mdtest run with a unique working directory per task, -u).

      Server and client : RHEL7.3
      Lustre version : 2.10.52
      Backend filesystem: ldiskfs

      mpirun --allow-run-as-root /work/tools/bin/mdtest -n 5000 -v -d /scratch0/mdtest.out -D -i 3 -p 10 -w 0 -u

      SUMMARY: (of 3 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         Directory creation:      89757.381      65618.928      74607.900      10774.356
         Directory stat    :     320946.433     319888.242     320294.264        465.749
         Directory removal :      19028.569      17837.487      18351.200        499.838
         Tree creation     :        434.446        158.826        318.943        116.860
         Tree removal      :         27.018         25.210         26.281          0.775
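
When comparing runs, it can help to pull the summary rows into a structure instead of reading them by eye. The following is a minimal sketch, assuming the summary layout shown above (operation name, colon, then Max, Min, Mean, Std Dev) and an optional MPI rank prefix such as "000:". The function name parse_mdtest_summary is hypothetical and is not part of mdtest.

```python
import re

# One summary row: "<operation> : <max> <min> <mean> <std dev>",
# optionally prefixed by an MPI rank tag such as "000:".
ROW = re.compile(
    r"^\s*(?:\d+:\s*)?(?P<op>[A-Za-z ]+?)\s*:\s+"
    r"(?P<max>[\d.]+)\s+(?P<min>[\d.]+)\s+(?P<mean>[\d.]+)\s+(?P<std>[\d.]+)\s*$"
)

# Rows copied from the description above.
SAMPLE = """\
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      89757.381      65618.928      74607.900      10774.356
   Directory stat    :     320946.433     319888.242     320294.264        465.749
   Directory removal :      19028.569      17837.487      18351.200        499.838
   Tree creation     :        434.446        158.826        318.943        116.860
   Tree removal      :         27.018         25.210         26.281          0.775
"""


def parse_mdtest_summary(text):
    """Return {operation: {"max", "min", "mean", "std"}} for each summary row."""
    results = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header, separator and blank lines do not match
            results[match.group("op")] = {
                key: float(match.group(key)) for key in ("max", "min", "mean", "std")
            }
    return results


if __name__ == "__main__":
    for op, stats in parse_mdtest_summary(SAMPLE).items():
        print(f"{op}: mean={stats['mean']}")
```

The same function should also accept the rank-prefixed output posted later in this ticket, since the prefix is treated as optional.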
      

          Activity

            Gerrit Updater (gerrit) added a comment:

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31211/
            Subject: LU-9972 osd: cache OI mapping in dt_declare_ref_add
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 025d0412599ed9381be4a0ab84d190b59fc2c451

            Gerrit Updater (gerrit) added a comment:

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31211
            Subject: LU-9972 osd: cache OI mapping in dt_declare_ref_add
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: d334fd8aa117d9a957365bdfb6792bcaaf6533cc

            Peter Jones (pjones) added a comment:

            Landed for 2.11

            Gerrit Updater (gerrit) added a comment:

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29709/
            Subject: LU-9972 osd: cache OI mapping in dt_declare_ref_add
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b6e718def348c53759a12afee9450207fc7ab56f

            Joseph Gmitter (jgmitter, Inactive) added a comment:

            Hi Ihara,

            Have you been able to confirm that Alex's patch resolves the issue?

            Thanks.
            Joe

            Alex Zhuravlev (bzzz) added a comment:

            Please try with the updated patch.

            Alex Zhuravlev (bzzz) added a comment:

            Thanks for the data. I've updated https://review.whamcloud.com/29709; please benchmark master with it. The patch collects additional stats and dumps them at umount in the following form:
            + if (atomic_read(&o->od_idc_remotes) ||
            +     atomic_read(&o->od_idc_oi))
            +         printk("%s: %d checks for remote, %d OI lookups\n",
            +                o->od_svname,
            +                atomic_read(&o->od_idc_remotes),
            +                atomic_read(&o->od_idc_oi));

            Please attach the output here if it is printed. Thanks in advance.

            I'm working on a follow-up patch to optimize calls to osd_remote_fid(). This isn't directly related to LU-7053, but hopefully it can fix the issue.
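
To collect the counters requested above from a saved kernel log, a small filter along these lines might be enough. This is only a sketch: the regular expression mirrors the printk format string quoted in the comment, while the function name parse_idc_stats, the sample target name and the sample numbers are all made up for illustration and are not taken from any real run.

```python
import re

# Mirrors the format string "%s: %d checks for remote, %d OI lookups".
STATS = re.compile(
    r"(?P<svname>\S+): (?P<remotes>-?\d+) checks for remote, (?P<oi>-?\d+) OI lookups"
)


def parse_idc_stats(log_text):
    """Return a list of (svname, remote_checks, oi_lookups) found in log_text."""
    return [
        (m.group("svname"), int(m.group("remotes")), int(m.group("oi")))
        for m in STATS.finditer(log_text)
    ]


if __name__ == "__main__":
    # Hypothetical log line; the target name and the counts are invented.
    sample = "[ 1234.567] scratch0-MDT0000: 12 checks for remote, 3 OI lookups\n"
    for svname, remotes, oi in parse_idc_stats(sample):
        print(svname, remotes, oi)
```

Because the patch only prints the line when at least one counter is non-zero, an empty result is also informative: it would suggest that neither path was taken on that target.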

            Saurabh Tandan (standan, Inactive) added a comment:

            Results of patch https://review.whamcloud.com/#/c/29821/, which reverts just LU-7053:

            MIB RESULTS
            MDTEST RESULTS
            000: SUMMARY: (of 3 iterations)
            000:    Operation                      Max            Min           Mean        Std Dev
            000:    ---------                      ---            ---           ----        -------
            000:    Directory creation:      21177.729      16443.602      18498.262       1982.554
            000:    Directory stat    :     232409.626     229897.859     230874.272       1098.955
            000:    Directory removal :     111897.307      40906.454      75055.964      29044.331
            000:    File creation     :      44061.464      38438.052      41863.997       2454.596
            000:    File stat         :     225941.782     193254.640     210777.086      13448.211
            000:    File read         :     146598.308      86669.775     126562.125      28208.247
            000:    File removal      :     155512.154     106886.927     136397.357      21168.452
            000:    Tree creation     :        120.178         56.129         96.113         28.468
            000:    Tree removal      :          8.906          8.122          8.620          0.354

            The mean directory removal rate for this run was 75055.964, approximately 45% higher than with patch https://review.whamcloud.com/29709 under exactly the same conditions and number of iterations.
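
The mean for the run with patch 29709 is not posted in this ticket, so the 45% figure cannot be rechecked directly. As a rough aid, the sketch below shows the arithmetic under one assumption: that "45% higher" means (revert mean - patch mean) / patch mean. Both helper names are hypothetical, and the value returned by implied_baseline is derived from the comment's wording; it is not a measurement.

```python
def percent_higher(new_mean, baseline_mean):
    """Return how much higher new_mean is than baseline_mean, in percent."""
    if baseline_mean <= 0:
        raise ValueError("baseline mean must be positive")
    return (new_mean - baseline_mean) / baseline_mean * 100.0


def implied_baseline(new_mean, percent):
    """Return the baseline for which new_mean is `percent` percent higher."""
    return new_mean / (1.0 + percent / 100.0)


if __name__ == "__main__":
    revert_mean = 75055.964  # directory removal mean from the table above
    # Under the stated assumption this prints a value of roughly 51,760,
    # an inferred figure and not a number reported in the ticket.
    print(implied_baseline(revert_mean, 45.0))
```

If the actual mean for patch 29709 becomes available, percent_higher(75055.964, that_mean) would give the exact figure to compare against the quoted 45%.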

            Alex Zhuravlev (bzzz) added a comment:

            To verify that, I added printk() to osd_idc_find_or_init() and got zero calls to osd_remote_fid() and osd_oi_lookup() during rmdir.
            There are a few calls to osd_remote_fid() (which I hope to fix in a separate patch), but those weren't introduced with LU-7053.

            Alex Zhuravlev (bzzz) added a comment:

            Usually it's initialized from other preceding methods (like osd_declare_ref_{add|del} in https://review.whamcloud.com/29709) with no extra lookups.
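
The idea discussed in the comments above, an entry that is filled in by an earlier declare step so that a later find-or-init needs no remote check and no OI lookup, can be modelled in a few lines. This is a conceptual toy in Python and not the Lustre implementation: the class IdCache, its methods, and the two callbacks are hypothetical stand-ins, and only the report format is taken from the printk quoted earlier in this ticket.

```python
class IdCache:
    """Toy FID -> inode-id cache with miss counters (hypothetical, not Lustre code)."""

    def __init__(self, oi_lookup, is_remote):
        self._entries = {}
        self._oi_lookup = oi_lookup  # stand-in for an expensive OI lookup
        self._is_remote = is_remote  # stand-in for a remote-FID check
        self.remote_checks = 0
        self.oi_lookups = 0

    def add(self, fid, inode_id):
        """Record a mapping that an earlier declare step already knows."""
        self._entries[fid] = inode_id

    def find_or_init(self, fid):
        """Return the cached id; only on a miss run the check and the lookup."""
        if fid in self._entries:
            return self._entries[fid]
        self.remote_checks += 1
        if self._is_remote(fid):
            self._entries[fid] = None  # remote objects have no local id here
            return None
        self.oi_lookups += 1
        inode_id = self._oi_lookup(fid)
        self._entries[fid] = inode_id
        return inode_id

    def report(self, svname):
        """Format the counters like the printk quoted in this ticket."""
        return "%s: %d checks for remote, %d OI lookups" % (
            svname, self.remote_checks, self.oi_lookups)


if __name__ == "__main__":
    cache = IdCache(oi_lookup=lambda fid: 42, is_remote=lambda fid: False)
    cache.add("fid-a", 7)        # pre-populated by a preceding step
    cache.find_or_init("fid-a")  # hit: counters stay at zero
    cache.find_or_init("fid-b")  # miss: one remote check, one lookup
    print(cache.report("demo"))
```

In this model a workload whose entries are always pre-populated leaves both counters at zero, which corresponds to the "zero calls during rmdir" observation reported above.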

            People

              Assignee: Alex Zhuravlev (bzzz)
              Reporter: Shuichi Ihara (ihara, Inactive)