Lustre / LU-2442

metadata performance degradation on current master

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.4.0, Lustre 2.5.0
    • Lustre 2.4.0
    • None
    • 3
    • 5771

    Description

      During Minh's performance testing on the OpenSFS cluster, we found a significant metadata performance degradation on master compared with b2_3.

      b2_3 test results

      [root@c24 bin]# ./run_mdsrate_create
      0: c01 starting at Thu Dec  6 08:48:39 2012
      Rate: 26902.88 eff 26902.57 aggr 140.12 avg client creates/sec (total: 192 threads 3840000 creates 192 dirs 1 threads/dir 142.74 secs)
      0: c01 finished at Thu Dec  6 08:51:02 2012
      [root@c24 bin]# ./run_mdsrate_stat
      0: c01 starting at Thu Dec  6 08:51:50 2012
      Rate: 169702.53 eff 169703.11 aggr 883.87 avg client stats/sec (total: 192 threads 3840000 stats 192 dirs 1 threads/dir 22.63 secs)
      0: c01 finished at Thu Dec  6 08:52:13 2012
      [root@c24 bin]# ./run_mdsrate_unlink
      0: c01 starting at Thu Dec  6 08:52:28 2012
      Rate: 33486.06 eff 33486.74 aggr 174.41 avg client unlinks/sec (total: 192 threads 3840000 unlinks 192 dirs 1 threads/dir 114.67 secs)
      Warning: only unlinked 3840000 files instead of 20000
      0: c01 finished at Thu Dec  6 08:54:23 2012
      [root@c24 bin]# ./run_mdsrate_mknod
      0: c01 starting at Thu Dec  6 08:54:32 2012
      Rate: 52746.29 eff 52745.00 aggr 274.71 avg client mknods/sec (total: 192 threads 3840000 mknods 192 dirs 1 threads/dir 72.80 secs)
      0: c01 finished at Thu Dec  6 08:55:45 2012
      [root@c24 bin]#
      

      Master test results

      [root@c24 bin]# ./run_mdsrate_create
      0: c01 starting at Tue Dec  4 21:15:40 2012
      Rate: 6031.09 eff 6031.11 aggr 31.41 avg client creates/sec (total: 192 threads 3840000 creates 192 dirs 1 threads/dir 636.70 secs)
      0: c01 finished at Tue Dec  4 21:26:17 2012
      [root@c24 bin]# ./run_mdsrate_stat
      0: c01 starting at Tue Dec  4 21:27:04 2012
      Rate: 177962.00 eff 177964.59 aggr 926.90 avg client stats/sec (total: 192 threads 3840000 stats 192 dirs 1 threads/dir 21.58 secs)
      0: c01 finished at Tue Dec  4 21:27:26 2012
      [root@c24 bin]# ./run_mdsrate_unlink
      0: c01 starting at Tue Dec  4 21:29:47 2012
      Rate: 8076.06 eff 8076.08 aggr 42.06 avg client unlinks/sec (total: 192 threads 3840000 unlinks 192 dirs 1 threads/dir 475.48 secs)
      Warning: only unlinked 3840000 files instead of 20000
      0: c01 finished at Tue Dec  4 21:37:43 2012
      [root@c24 bin]# ./run_mdsrate_mknod
      0: c01 starting at Tue Dec  4 21:48:41 2012
      Rate: 10430.50 eff 10430.61 aggr 54.33 avg client mknods/sec (total: 192 threads 3840000 mknods 192 dirs 1 threads/dir 368.15 secs)
      0: c01 finished at Tue Dec  4 21:54:49 2012
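The size of the regression can be read directly off the two runs. The following is a minimal Python sketch, assuming only the aggregate "Rate" values and the 192-thread count reported above; the script and its names (`compare`, `B2_3`, `MASTER`) are illustrative and not part of mdsrate.

```python
# Aggregate rates (ops/sec) copied from the mdsrate runs above.
# Each run used 192 threads and 3840000 operations.
B2_3 = {"create": 26902.88, "stat": 169702.53, "unlink": 33486.06, "mknod": 52746.29}
MASTER = {"create": 6031.09, "stat": 177962.00, "unlink": 8076.06, "mknod": 10430.50}
THREADS = 192


def compare(baseline, candidate):
    """Return {op: (baseline/candidate slowdown factor, percent change)}."""
    result = {}
    for op, base_rate in baseline.items():
        new_rate = candidate[op]
        slowdown = base_rate / new_rate
        change = (new_rate - base_rate) / base_rate * 100.0
        result[op] = (slowdown, change)
    return result


if __name__ == "__main__":
    for op, (slowdown, change) in compare(B2_3, MASTER).items():
        # Per-thread rate should match the "aggr ... avg client" column.
        per_thread = MASTER[op] / THREADS
        print(f"{op:>6}: b2_3/master = {slowdown:5.2f}x, "
              f"change = {change:+6.1f}%, per-thread master = {per_thread:7.2f}/s")
```

With these numbers, create, unlink, and mknod come out roughly 4x to 5x slower on master, while stat is slightly faster. Dividing a rate by the thread count reproduces the "aggr" column (for example, 6031.09 / 192 ≈ 31.41).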
      


          Activity


            adilger Andreas Dilger added a comment - The performance regression was resolved for 2.5.0 by applying the quota-replace-dqptr-sem and quota-avoid-dqget-calls patches to the kernel. I'm closing this bug and have opened LU-3966 to track landing those patches in the upstream kernel, which is not strictly tied to the 2.5.x release stream.
            pjones Peter Jones added a comment - Good idea!
            simmonsja James A Simmons added a comment - The SLES11 SP1 and SP2 platforms have been fixed. We should keep this ticket open until FC18 is addressed for 2.5. We really should also push this patch upstream and have the distributions pick it up. I recommend linking this ticket to LU-20.
            simmonsja James A Simmons added a comment - My patch is at http://review.whamcloud.com/6168

            simmonsja James A Simmons added a comment - I have a patch for SLES11 SP2 that I'm testing right now. Once it passes, I will push it to Gerrit.

            simmonsja James A Simmons added a comment - Oh, while we are fixing the kernel quota issues, I'd like to add this to the patch as well: http://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/commit/?id=c3ad83d9efdfe6a86efd44945a781f00c879b7b4

            simmonsja James A Simmons added a comment - I started looking at this patch for both FC18 and SLES11 SP2. FC18 server support needs several patches before it will even build, so I doubt it will make the 2.4 release. SLES11 SP2 support, however, already works with master. What I was thinking is to split the LU-1812 patch: keep the FC18 part there and move the SLES11 SP2 code into this patch. I could add this fix as well. Would you be okay with that?

            adilger Andreas Dilger added a comment - My mistake. I thought there were different kernel series for RHEL 6.3 and 6.4, but this is only true for ldiskfs.
            laisiyao Lai Siyao added a comment - Andreas, this patch is against VFS code and supports RHEL 6.3 and 6.4 with the same code. James, it would be great if you could help port it to the SLES11 SP2 and FC18 kernels. IMO you don't need to include it in the LU-1812 patch, because this is a performance improvement that doesn't affect functionality.

            simmonsja James A Simmons added a comment - The question I have is whether to include this fix in my LU-1812 patch for SLES11 SP2 kernel support or submit it as a separate patch. It depends on whether there are plans to land the LU-1812 patch.

            People

              laisiyao Lai Siyao
              di.wang Di Wang
              Votes: 0
              Watchers: 19
