Details

    • Type: New Feature
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.3.0
    • Fix Version/s: Lustre 2.3.0
    • Labels: None
    • 4585

    Description

      Targets of the new IO engine:
      1. support the new grant parameter
      2. generate better IO RPCs
      3. if a page has been set for writeback and the covering lock is being canceled, raise the priority of the writeback page
      4. ...

      Attachments

        1. perf-results.xlsx
          55 kB
        2. new-perf-results.xlsx
          53 kB
        3. new_io_engine.pdf
          158 kB


          Activity

            [LU-1030] new IO engine
            spitzcor Cory Spitz added a comment -

            Bob, unfortunately it doesn't look like we have an apples-to-apples comparison here, but the >15-thread I/O seems a lot worse with b2_3 (around 400 MB/s). Was there an explanation? Granted, we should focus on b2_4 now.

            bogl Bob Glossman (Inactive) added a comment -

            Testing for b2_3; results file attached. The new curve from 2.3 clients/servers looks similar to, but slightly better than, the b2_2+rpc curve in the old results, especially at mid-range thread counts (4-8). Please note that due to lack of resources I had to slightly reduce the number of OSTs in the tests. With no nodes having more than 16 GB, trying to put two 8 GB ramdisk OSTs on a node led to lots of OOMs. When consulted, Jinshan said that numbers from only 15 OSTs were probably good enough for comparison.

            jay Jinshan Xiong (Inactive) added a comment -

            All patches have been landed.

            jay Jinshan Xiong (Inactive) added a comment -

            Hi Andreas, the patch you're referring to is Chris's testing patch. The original patch is at 2009 and it doesn't change any functionality; it just moves some functions into osc_cache.c. I don't think Chris changed the code beyond adding some test stuff.

            adilger Andreas Dilger added a comment -

            It appears this may be causing replay-dual.sh test failures that are NOT LU-482 related:

            https://maloo.whamcloud.com/test_sets/9a9540c6-a964-11e1-ab65-52540035b04c

            Waiting for orphan cleanup...
            Waiting 0 secs for mds-ost sync done.
            Waiting 2 secs for mds-ost sync done.
            Waiting for destroy to be done...
            before 717896, after 717896

            However, it isn't 100% clear that this patch is causing the problem; it may be coincidental that Chris was testing for LU-482 repeatedly using this patch. In any case, I don't see any similar problems on master, though ORI-396 at least looks similar.

            jay Jinshan Xiong (Inactive) added a comment -

            patch list: http://review.whamcloud.com/{2009,2460,2270}

            and this one has been shown to reduce memory usage significantly: http://review.whamcloud.com/2514

            jay Jinshan Xiong (Inactive) added a comment -

            Write performance benchmark result.

            jay Jinshan Xiong (Inactive) added a comment -

            Current status of the new IO engine.

            People

              Assignee: jay Jinshan Xiong (Inactive)
              Reporter: jay Jinshan Xiong (Inactive)
              Votes: 0
              Watchers: 4
