
LU-4367: unlink performance regression on lustre-2.5.52 client

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.7.0
    • Affects Version/s: Lustre 2.5.0

    Description

      The lustre-2.5.52 client (and possibly other client versions as well) causes a metadata performance regression (unlinking files in a single shared directory).
      Here are test results on lustre-2.5.52 clients and lustre-2.4.1 clients; lustre-2.5.52 is running on all servers.

      1 x MDS, 4 x OSS (32 x OST) and 16 clients (64 processes, 20000 files per process)

      lustre-2.4.1 client
      
      4.1-take2.log
      -- started at 12/09/2013 07:31:29 --
      
      mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
      Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
      Path: /lustre
      FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
      
      64 tasks, 1280000 files
      
      SUMMARY: (of 3 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         File creation     :      58200.265      56783.559      57589.448        594.589
         File stat         :     123351.857     109571.584     114223.612       6455.043
         File read         :     109917.788      83891.903      99965.718      11472.968
         File removal      :      60825.889      59066.121      59782.774        754.599
         Tree creation     :       4048.556       1971.934       3082.293        853.878
         Tree removal      :         21.269         15.069         18.204          2.532
      
      -- finished at 12/09/2013 07:34:53 --
      
      lustre-2.5.52 client
      
      -- started at 12/09/2013 07:13:42 --
      
      mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
      Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
      Path: /lustre
      FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
      
      64 tasks, 1280000 files
      
      SUMMARY: (of 3 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         File creation     :      58286.631      56689.423      57298.286        705.112
         File stat         :     127671.818     116429.261     121610.854       4631.684
         File read         :     173527.817     158205.242     166676.568       6359.445
         File removal      :      46818.194      45638.851      46118.111        506.151
         Tree creation     :       3844.458       2576.354       3393.050        578.560
         Tree removal      :         21.383         18.329         19.844          1.247
      
      -- finished at 12/09/2013 07:17:07 --
      

      File removal drops from about 60K ops/sec on lustre-2.4.1 to about 46K ops/sec on lustre-2.5.52, roughly a 25% performance regression.

      Attachments

        1. debugfile
          512 kB
        2. LU-4367.xlsx
          99 kB
        3. unlinkmany-result.zip
          4 kB

          Activity

            [LU-4367] unlink performance regression on lustre-2.5.52 client

            jlevi Jodi Levi (Inactive) added a comment -

            Patches landed to Master. Please reopen ticket if more work is needed.

            cliffw Cliff White (Inactive) added a comment -

            I ran the patch on Hyperion with 1, 32, 64, and 100 clients, using mdtest dir-per-process and single-shared-dir.
            A spreadsheet with graphs is attached.

            ihara Shuichi Ihara (Inactive) added a comment -

            > All of this is in this patch: http://review.whamcloud.com/11062
            > Ihara-san, please give it a try to see if it helps for your workload?

            Sure, I will test those patches as soon as I can run the benchmark, maybe early next week. Thanks!
            green Oleg Drokin added a comment -

            So, it looks like we can still infer whether the open originated from the VFS or not.

            When we come from do_filp_open (this is the real open path), we go through filename_lookup with LOOKUP_OPEN set; when we go through dentry_open, LOOKUP_OPEN is not set.

            As such, the most brute-force way I see to address this is, in ll_revalidate_dentry, to always return 0 if LOOKUP_OPEN is set and LOOKUP_CONTINUE is NOT set (i.e. we are looking up the last component).
            We already do a similar trick for LOOKUP_OPEN|LOOKUP_CONTINUE.
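
            A minimal sketch of the check described above, assuming a roughly 2.6-era
            ll_revalidate_dentry() prototype (the real change is in
            http://review.whamcloud.com/11062 and may differ in detail):

            #include <linux/dcache.h>   /* struct dentry */
            #include <linux/namei.h>    /* LOOKUP_OPEN, LOOKUP_CONTINUE */

            /* Sketch only: force a full lookup (and therefore an intent open)
             * for the last component of a VFS-originated open. */
            static int ll_revalidate_dentry(struct dentry *dentry,
                                            unsigned int lookup_flags)
            {
                    /* Open coming from do_filp_open(): LOOKUP_OPEN is set and
                     * LOOKUP_CONTINUE is clear on the last component.
                     * Returning 0 sends the VFS back through ->lookup(), so
                     * the open carries an IT_OPEN intent. */
                    if ((lookup_flags & LOOKUP_OPEN) &&
                        !(lookup_flags & LOOKUP_CONTINUE))
                            return 0;

                    /* ... existing checks for intermediate components ... */
                    return 1;
            }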

            BTW, while looking at the ll_revalidate_dentry logic, I think we can improve it quite a bit too in the area of intermediate path component lookup.

            All of this is in this patch: http://review.whamcloud.com/11062
            Ihara-san, please give it a try to see if it helps for your workload?
            This patch passes a medium level of my testing (which does not include any performance testing).

            laisiyao Lai Siyao added a comment -

            Oleg, the cause is the simplified revalidate (see 7475). Originally revalidate would execute IT_OPEN, but that code was a duplicate of lookup, and the opened handle could be lost if another client cancelled the lock. So 7475 simplified revalidate to just return 1 if the dentry is valid and let .open really open the file; but that open cannot be differentiated from an NFS export open, so both the open after revalidate and the NFS export open take the open lock.
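
            As a rough illustration of the simplified revalidate described above
            (not the actual 7475 code; d_lustre_invalid() is used here only as a
            stand-in validity check):

            /* Sketch: revalidate no longer performs IT_OPEN itself. */
            static int ll_revalidate_dentry(struct dentry *dentry,
                                            unsigned int lookup_flags)
            {
                    if (d_lustre_invalid(dentry))
                            return 0;   /* stale: force a fresh lookup */

                    /* Valid: let ->open issue the real IT_OPEN later.  That
                     * later open looks the same as an NFS-export open, so
                     * both end up requesting an open lock from the MDS. */
                    return 1;
            }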


            adilger Andreas Dilger added a comment -

            Sorry about my earlier confusion with 10426 - I thought that was a different patch, but I see now that it is required for 10398 to work.

            It looks like the 10398 patch does improve the unlink performance, but at the expense of almost every other operation. Since unlink is already faster than create, it doesn't make sense to speed it up and slow down create. It looks like there is also some other change(s) that slowed down the create and stat operations on master compared to 2.5.2.

            It doesn't seem reasonable to land 10398 for 2.6.0 at this point.

            green Oleg Drokin added a comment -

            So it looks like we have all of this extra file handle caching that should not really be happening at all.

            Originally, when opencache was implemented, it cached everything, and that resulted in a performance drop specifically due to slow lock cancellation.
            That's when we decided to restrict this caching to NFS-originated requests and some races, by only setting the flag in ll_file_intent_open, which we could only reach via NFS.
            Now it appears that this assumption is broken?

            I am planning to take a deeper look to understand what is happening with the cache now.
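
            For reference, a hedged sketch of that restriction.  The helper name
            below is hypothetical; in the real code the flag is set on the lookup
            intent passed down from the NFS-export path into ll_file_intent_open:

            /* Hypothetical NFS-only helper: request an MDS open lock (i.e.
             * cache the open handle) only for NFS-export opens, so regular
             * VFS opens never populate the opencache. */
            static void ll_nfs_prepare_open_intent(struct lookup_intent *it,
                                                   int open_flags)
            {
                    it->it_op     = IT_OPEN;
                    it->it_flags  = open_flags;
                    it->it_flags |= MDS_OPEN_LOCK;  /* ask the MDS to grant an open lock */
            }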


            ihara Shuichi Ihara (Inactive) added a comment -

            First, I tried only the 10398 patch, but the build fails since OBD_CONNECT_UNLINK_CLOSE is defined in the 10426 patch, so I needed both patches at the same time to compile.

            BTW, here is the same mdtest benchmark on the same hardware, but with Lustre version 2.5.2RC2.

            Unique Directory Operation

            mdtest-1.9.3 was launched with 64 total task(s) on 32 node(s)
            Command line used: ./mdtest -i 3 -n 40000 -u -d /lustre_test/mdtest.out
            Path: /lustre_test
            FS: 39.0 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
            
            64 tasks, 2560000 files/directories
            
            SUMMARY: (of 3 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               Directory creation:      44031.310      41420.993      43125.128       1205.815
               Directory stat    :     346144.788     329854.059     335352.348       7631.863
               Directory removal :      87592.556      86416.906      87118.114        506.033
               File creation     :      82518.567      64962.637      76375.141       8077.749
               File stat         :     215570.997     209551.901     212205.919       2508.198
               File read         :     151377.930     144487.897     147463.085       2890.255
               File removal      :     105964.879      93215.798     101520.782       5877.335
               Tree creation     :        628.925        410.522        542.680         94.889
               Tree removal      :          8.583          8.013          8.284          0.233
            

            Shared Directory Operation

            mdtest-1.9.3 was launched with 64 total task(s) on 32 node(s)
            Command line used: ./mdtest -i 3 -n 40000 -d /lustre_test/mdtest.out
            Path: /lustre_test
            FS: 39.0 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
            
            64 tasks, 2560000 files/directories
            
            SUMMARY: (of 3 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               Directory creation:      39463.778      38496.147      38986.389        395.138
               Directory stat    :     143006.039     134919.914     138809.226       3308.300
               Directory removal :      78711.817      76206.632      77846.563       1160.196
               File creation     :      75154.225      70792.633      72674.025       1830.264
               File stat         :     142431.366     138650.545     140623.793       1547.953
               File read         :     134643.457     132249.733     133383.879        981.251
               File removal      :      94311.826      83231.516      89991.676       4841.388
               Tree creation     :       4048.556       3437.954       3743.808        249.278
               Tree removal      :          9.098          4.048          6.792          2.084
            

            For unique-directory metadata operations, overall, the results of master + 10398 + 10426 are close to the 2.5.2RC2 results except for directory stat (for the stat operation, 2.5 is better than master).
            However, for metadata operations in a shared directory, most of the 2.5.2RC2 numbers are still much higher than the master or master + 10398 + 10426 results. That is the original issue in this ticket, and there is still a big performance gap there. For the read operation, the master branch is much improved compared to the 2.5 branch.


            adilger Andreas Dilger added a comment -

            It appears that the unlink performance has gone up, but the create and stat rates have gone down. Can you please test those two patches separately? If the 10398 patch fixes the unlink performance without hurting the other operations, it could land. It might be that the 10426 patch is changing the other performance and needs to be reworked.


            People

              Assignee: laisiyao Lai Siyao
              Reporter: ihara Shuichi Ihara (Inactive)
              Votes: 0
              Watchers: 12
