
unlink performance regression on lustre-2.5.52 client

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.7.0
    • Affects Version/s: Lustre 2.5.0
    • 2
    • 11951

    Description

      The lustre-2.5.52 client (and possibly other recent clients as well) shows a metadata performance regression when unlinking files in a single shared directory.
      Here are test results with lustre-2.5.52 clients and with lustre-2.4.1 clients; lustre-2.5.52 is running on all servers.

      1 x MDS, 4 x OSS (32 x OST) and 16 clients (64 processes, 20000 files per process)

      lustre-2.4.1 client
      
      4.1-take2.log
      -- started at 12/09/2013 07:31:29 --
      
      mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
      Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
      Path: /lustre
      FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
      
      64 tasks, 1280000 files
      
      SUMMARY: (of 3 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         File creation     :      58200.265      56783.559      57589.448        594.589
         File stat         :     123351.857     109571.584     114223.612       6455.043
         File read         :     109917.788      83891.903      99965.718      11472.968
         File removal      :      60825.889      59066.121      59782.774        754.599
         Tree creation     :       4048.556       1971.934       3082.293        853.878
         Tree removal      :         21.269         15.069         18.204          2.532
      
      -- finished at 12/09/2013 07:34:53 --
      
      lustre-2.5.52 client
      
      -- started at 12/09/2013 07:13:42 --
      
      mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
      Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
      Path: /lustre
      FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
      
      64 tasks, 1280000 files
      
      SUMMARY: (of 3 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         File creation     :      58286.631      56689.423      57298.286        705.112
         File stat         :     127671.818     116429.261     121610.854       4631.684
         File read         :     173527.817     158205.242     166676.568       6359.445
         File removal      :      46818.194      45638.851      46118.111        506.151
         Tree creation     :       3844.458       2576.354       3393.050        578.560
         Tree removal      :         21.383         18.329         19.844          1.247
      
      -- finished at 12/09/2013 07:17:07 --
      

      46K ops/sec (lustre-2.5.52) vs. 60K ops/sec (lustre-2.4.1): roughly a 23% unlink performance drop on lustre-2.5.52 compared to lustre-2.4.1.

      Attachments

        1. debugfile
          512 kB
        2. LU-4367.xlsx
          99 kB
        3. unlinkmany-result.zip
          4 kB


          Activity

            [LU-4367] unlink performance regression on lustre-2.5.52 client

            adilger Andreas Dilger added a comment -

            Lai, it looks like the patches http://review.whamcloud.com/9696 and http://review.whamcloud.com/9697 improve the open performance, but do not address the unlink performance. Is there something that can be done to bring the unlink performance back to the 2.5.0 level so that 2.6.0 does not ship with a performance regression?
            laisiyao Lai Siyao added a comment -

            Thanks Ihara, patches updated. Previously I only tested mdtest and didn't run a full test, because the patches are intended to gather mdtest performance data and may not be the final patches yet. Sorry for the trouble.


            ihara Shuichi Ihara (Inactive) added a comment -

            This is the debug file captured when the problem happens:

            echo "+trace" > /proc/sys/lnet/debug
            lctl debug_daemon start /tmp/debuglog 100
            touch /tmp/a
            cp /tmp/a /lustre
            lctl debug_daemon stop
            echo "-trace" > /proc/sys/lnet/debug


            ihara Shuichi Ihara (Inactive) added a comment -

            Lai, these patches are broken: I can't copy a file from a local filesystem to Lustre.

            [root@r21 tmp]# touch /tmp/a
            [root@r21 tmp]# cp /tmp/a /lustre/
            cp: cannot create regular file `/lustre/a': File exists
            

            This worked.

            [root@r21 tmp]# touch /lustre/a
            

            ihara Shuichi Ihara (Inactive) added a comment -

            Sure, I will test those patches very soon and keep you updated! Thanks a lot, again!
            laisiyao Lai Siyao added a comment - - edited

            During testing I saw other places that could be improved to increase file creation, stat, and possibly read performance, and I composed two patches:
            http://review.whamcloud.com/#/c/9696/
            http://review.whamcloud.com/#/c/9697/

            Would you apply these two patches and get some results?


            ihara Shuichi Ihara (Inactive) added a comment -

            Yes, even if this might not be a bug, we see a performance drop under the mdtest I/O scenario at least. mdtest is one of the major benchmark tools for metadata, and this is one metadata scenario; we would like to keep (at least) the same performance with newer versions of Lustre. If you have an idea for a workaround, please share it with us. I would like to test it.

            laisiyao Lai Siyao added a comment -

            A command like `mdtest -d /lustre/dir.0 -n 20000 -F -i 3` executes the following syscalls on each file (a quick way to confirm this is sketched after the list):
            1. creat
            2. close
            3. stat
            4. open
            5. I/O
            6. close
            7. unlink
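
            For reference, a minimal sketch of how this per-file sequence could be confirmed on a client (the file count, path, and grep pattern are illustrative, not taken from the original report):

            # Hypothetical check: trace a small single-task mdtest run and look for the
            # creat/close/stat/open/close/unlink sequence issued for each test file.
            strace -f mdtest -d /lustre/dir.0 -n 10 -F -i 1 2>&1 \
                | grep -E 'creat|open|stat|close|unlink'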

            In the old code, the open syscall in step 4 called .revalidate(IT_OPEN), which opened the file, and the close in step 6 called .release and actually closed the file.
            In the new code, .revalidate no longer executes the intent but returns 1 directly, and ll_intent_file_open() later opens the file with MDS_INODELOCK_OPEN, so the close in step 6 doesn't really close the file because the open lock is cached. The unlink in step 7 then has to close the file before unlinking it, and this is the cause of the unlink performance drop.

            IMHO this is not a real bug, because no extra RPC is sent; it is only because mdtest opens each file twice that the new code fetches the open lock. A possible fix might be to add a timestamp so that .open knows .revalidate(IT_OPEN) was just called and there is no need to fetch the open lock, but I'm not sure this is necessary.

            laisiyao Lai Siyao added a comment -

            I tested on three test nodes in Toro: one client, one MDS, and two OSS on the same OST.

            I was suspecting that with the LU-3544 patch .revalidate just returns 0 if the dentry is invalid and lets .lookup do the real lookup, instead of looking up in .revalidate as the old code did, and that this may introduce a small overhead. I'll double-check the call trace of unlink to see whether there are extra lookups.


            Lai,
            what kind of system did you test on? I suspect that this slowdown is only visible with a fast MDS and IB network and enough OSTs so that unlinking the OST objects is not the bottleneck. I don't know that changing the parameters of what is being tested is needed, since there is clearly a slowdown in this test which is significantly larger than the standard deviation between tests (-15000 unlinks/sec with stddev 1000).

            Also, I think it is important to note that this is only an issue during unlink, and in fact unlink is +3000 ops/sec faster than 2.5.0/2.5.51 once the LU-3544 patch is reverted. However, it does appear that the open-for-read ("File read" with 0 bytes read) performance is +60000 opens/sec faster with the LU-3544 patch applied, which is also important not to lose.

            I suspect there is some subtle difference in the new ll_revalidate_dentry() code that is only triggering in the unlink case, possibly forcing an extra RPC to the MDS to revalidate the dentry just before it is being unlinked? Rather than spending time trying to reproduce the performance loss, it might make more sense to just get a debug log of unlink with and without the 55989b17c73912 patch applied and see what the difference is in the callpath and RPCs sent. Hopefully, there is just a minor change that can be done to fix the unlink path and not impact the other performance.
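
            One way such a debug log could be captured for comparison (an illustrative sketch; the debug flag, file names, and grep pattern are examples, not from this ticket):

            # On the client, repeated with and without the patch applied:
            lctl set_param debug=+rpctrace     # log RPCs in addition to the current debug mask
            lctl clear                         # drop old debug buffer contents
            rm /lustre/dir.0/testfile          # the operation to compare
            lctl dk /tmp/unlink-debug.log      # dump the debug buffer to a file
            lctl set_param debug=-rpctrace
            # Then compare the callpaths and RPCs (e.g. grep for "Sending RPC") between the two logs.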

            laisiyao Lai Siyao added a comment -

            I tested on a different setup, but I didn't see the unlink performance drop. If possible, could you use oprofile to find which functions consume more time on the 2.5.52 client?

            I noticed that you only tested with a small set of files (20000 files per task) and iterated three times. Could you test with more files and only one iteration (for example, something like the commands sketched below)? And could you also test with a single client to see if unlink gets slow?
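
            Possible forms of those runs, for illustration only (file counts and directory names are examples, not specified in this ticket):

            # larger run, single iteration, all clients
            mdtest -d /lustre/dir.0 -n 100000 -F -i 1
            # single-client run to isolate client-side unlink cost
            mdtest -d /lustre/dir.1 -n 20000 -F -i 1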


            People

              Assignee: laisiyao Lai Siyao
              Reporter: ihara Shuichi Ihara (Inactive)
              Votes: 0
              Watchers: 12
