Lustre / LU-4193

increase maximum default ldiskfs journal size to 4GB

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.6.0
    • 11351

    Description

      Testing has shown that a larger MDT journal size can increase performance significantly now that SMP scaling allows the MDT code to perform more operations per second. I'd like to increase the default journal size for newly formatted MDTs.

      Performance test results are shown with a 4GB journal size on the following hardware:

      Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz (Ivy Bridge)
      128 GB RAM
      6x Intel SSD S3700 400GB

      1x Intel TrueScale Card QDR
      2x Intel 82599EB 10-Gigabit SFI/SFP+
      4x Intel I350 Gigabit

      CentOS 6.4
      Lustre 2.4.1 + CLIO simplification patch
      


          Activity


            adilger Andreas Dilger added a comment:

            Seeing as there is no data showing a dramatic performance increase with a larger OST journal, I'm going to close this bug. We can always open a new bug for OST journal size changes if evidence shows it is needed.

            I also expect that the large journal transaction reservations fixed in LU-4611 may mitigate the need for such a large journal, but that needs to be tested separately.

            simmonsja James A Simmons added a comment:

            The testing was done on a DDN 12K, which has 32GB of cache for the SAS hard drives.

            gabriele.paciucci Gabriele Paciucci added a comment:

            Hi James,
            I have done my experiments using very fast devices and controllers.
            For my experiments on MDTs, I saw a huge impact when increasing from 400MB to 4GB in an SSD context. The metadata performance was dominated by disk I/O with the small journal size.

            I have also seen benefits (less pronounced) on OSTs, but only with fast controllers and a large cache (>8GB). What is your hardware context for OSTs?

            Thanks

            simmonsja James A Simmons added a comment:

            Yes, I did test with OSTs and 4GB internal journals. I was surprised that going from 400MB to 4GB journals had very little impact on performance.

            adilger Andreas Dilger added a comment:

            Patch 8111 has landed for 2.6.0, so MDTs will now have 4GB journals by default, if the MDT is large enough.

            I also note that http://review.whamcloud.com/9258 will reduce the credit reservation for create operations, so it might avoid the need for such a large journal and/or improve performance further.

            James, did you get any test results with OSTs and 4GB journals? If not, I'm inclined to close this bug and you can open a separate bug to track such a change.
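            The "4GB journals by default, if the MDT is large enough" behaviour can be sketched roughly as a capped scaling rule. This is only an illustration: the 1/64-of-device ratio and the 32MB floor below are assumptions, not the exact mke2fs/ldiskfs sizing logic.

            ```python
            def default_journal_size_mb(device_size_mb, cap_mb=4096):
                """Hypothetical sketch: the default journal grows with the
                device size but is capped at 4GB. The exact thresholds used
                by mke2fs/ldiskfs differ; this only illustrates the
                capped-scaling behaviour discussed in this ticket."""
                size = device_size_mb // 64        # assumed scaling ratio
                return max(32, min(size, cap_mb))  # assumed 32MB floor, 4GB cap

            # A small MDT gets a proportionally small journal; a very large
            # MDT hits the 4GB cap.
            print(default_journal_size_mb(100 * 1024))       # 100GB MDT -> 1600
            print(default_journal_size_mb(4 * 1024 * 1024))  # 4TB MDT -> 4096
            ```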

            simmonsja James A Simmons added a comment:

            Using your patch I also increased the OST limit to 4GB. Collecting data.

            adilger Andreas Dilger added a comment:

            James, no testing has been done for the OST create rate yet. The patch only changes the journal size limit for the MDT so far. We could assume the OST create rate with a 400MB journal is approximately the same as the MDT with a 400MB journal (about 1/2 the rate of the 4GB journal), so if there are at least twice as many OSTs as MDTs the object create rate should be sufficient.

            It would be great to verify actual OST object create performance, since OST object creation is single threaded, and it may not be able to keep up with hundreds of MDS create threads running in parallel. In theory it is possible to run echo_client on the OST device to create objects, but this would probably not exercise the batched create that the MDS uses. It is probably worthwhile to file a separate bug for implementing any needed infrastructure and testing this.
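            The back-of-the-envelope reasoning above (OSTs creating objects at roughly half the MDT rate, so two OSTs per MDT keep up) can be sketched numerically. The rates below are placeholder values for illustration, not measured results.

            ```python
            import math

            def osts_needed(mdt_create_rate, ost_create_rate, num_mdts=1):
                """How many OSTs are needed so the aggregate OST object-create
                rate keeps up with the MDT file-create rate (both in ops/sec).
                Rates here are hypothetical, not measurements from this ticket."""
                total_mdt_rate = mdt_create_rate * num_mdts
                return math.ceil(total_mdt_rate / ost_create_rate)

            # If an OST with a 400MB journal creates objects at about half the
            # rate of an MDT (the assumption in the comment above), two OSTs
            # per MDT are sufficient to keep up:
            print(osts_needed(mdt_create_rate=50000, ost_create_rate=25000))  # 2
            ```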

            simmonsja James A Simmons added a comment:

            Have numbers for OSTs been done as well?

            gabriele.paciucci Gabriele Paciucci added a comment:

            Hi Andreas,
            on the same hardware, with the same configuration for mds-survey (32 threads, 32 directories, 400K files per directory), I have run other benchmarks.
            I'm using the 2.3.11 Lustre server version included in IEEL 1.0, not 2.4.1 as in the previous experiment.
            adilger Andreas Dilger added a comment:

            http://review.whamcloud.com/8111

            People

              Assignee: Andreas Dilger (adilger)
              Reporter: Andreas Dilger (adilger)
              Votes: 0
              Watchers: 5
