Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0

    Description

      There is a great deal of inefficiency (wasted time) in the DIO/AIO path.  This does not normally show up for DIO because of the waiting model, but it shows up easily in AIO.

      This ticket is to cover a set of improvements to DIO/AIO performance, which will also improve DIO performance once the waiting model is adjusted.  (More on this in LU-13798.)

      There is a grab bag of patches to be submitted here, and some further proposals that will probably end up in other tickets.

      The essence of the improvements is that all pages in a DIO submission are the same, so much of the work done on a per-page basis is unnecessary and can be skipped for DIO.  Note that this remains compatible with unaligned DIO, should it be implemented in the future: in the unaligned case only the first and last pages are different, and that can still be handled.
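      As a rough illustration of the unaligned case (a sketch only, not Lustre code): for any byte range, at most the first and last pages can be partially covered, and every page in between is fully covered and can be treated identically. The function name `partial_pages` and the 4 KiB page size are assumptions made for this example.

```python
PAGE_SIZE = 4096  # assumed page size for this example


def partial_pages(offset, length, page_size=PAGE_SIZE):
    """Return indices of pages only partly covered by [offset, offset + length)."""
    if length <= 0:
        return []
    end = offset + length
    first = offset // page_size
    last = (end - 1) // page_size
    partial = []
    # The first page is partial if the range starts inside it.
    if offset % page_size != 0:
        partial.append(first)
    # The last page is partial if the range ends inside it.
    if end % page_size != 0 and last not in partial:
        partial.append(last)
    return partial
```

      An aligned submission yields an empty list, so no page needs special handling; an unaligned one yields at most two page indices regardless of how large the I/O is.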

      Patches, with benchmarks for each patch, are forthcoming.

      The total effect of the initial set of patches on my testbed is to raise AIO/DIO performance from around 5 GiB/s to around 9 GiB/s.  I'll go into what else can be done shortly.

          Activity

            [LU-13799] DIO/AIO efficiency improvements

            sihara, how many OSTs did you have in your system, and how many stripes did you use?

            lctl set_param osc.*.max_pages_per_rpc=16M osc.*.checksums=0 osc.*.max_rpcs_in_flight=16

            I'm just curious because it's possible some of these were limited by max_rpcs_in_flight.

            Still, that performance is excellent - significantly better than I expected.

            paf0186 Patrick Farrell added a comment
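            To illustrate why max_rpcs_in_flight could be the limiting factor: each OSC can have at most (RPC size x RPCs in flight) bytes outstanding, and by Little's law its throughput ceiling is that amount divided by the RPC round-trip time. The sketch below assumes that 16M means a 16 MiB RPC; the helper names and the 0.125 s round-trip time are hypothetical and chosen only to make the arithmetic concrete.

```python
MIB = 1 << 20
GIB = 1 << 30


def inflight_bytes(rpc_size_bytes, max_rpcs_in_flight):
    """Maximum bytes one OSC can have outstanding at once."""
    return rpc_size_bytes * max_rpcs_in_flight


def bandwidth_ceiling(rpc_size_bytes, max_rpcs_in_flight, rpc_latency_s, num_oscs=1):
    """Little's law upper bound on throughput in bytes per second.

    rpc_latency_s is a hypothetical average RPC round-trip time;
    num_oscs is the number of OSCs (stripes) driven in parallel.
    """
    if rpc_latency_s <= 0:
        raise ValueError("rpc_latency_s must be positive")
    return num_oscs * inflight_bytes(rpc_size_bytes, max_rpcs_in_flight) / rpc_latency_s


# Settings from the lctl command: 16 MiB RPCs (assumed), 16 RPCs in flight.
per_osc = inflight_bytes(16 * MIB, 16)
ceiling = bandwidth_ceiling(16 * MIB, 16, rpc_latency_s=0.125)
```

            With these numbers each OSC can hold 256 MiB in flight, which at the hypothetical 0.125 s round trip caps a single OSC at 2 GiB/s. This is why the OST and stripe counts matter when judging whether a result was limited by max_rpcs_in_flight.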

            sihara:
            https://review.whamcloud.com/#/c/44445/ is one of the EXA6 versions of these patches.  Did you mean to link a different one?

            paf0186 Patrick Farrell added a comment

            https://jira.whamcloud.com/secure/attachment/40203/LU-13799.xlsx
            This is the performance comparison of master vs. master + the LU-13799 patch series (https://review.whamcloud.com/#/c/44445/) on a real storage system.
            Write: 1.6 GB/s vs. 13 GB/s; read: 1.5 GB/s vs. 14 GB/s, at xfersize=64m with Lustre stripesize=1m.

            sihara Shuichi Ihara added a comment
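            The ratios implied by the reported figures can be checked with a few lines of arithmetic. This is only a convenience sketch; the helper name and dictionary layout are illustrative, and the inputs are the rounded GB/s values quoted in the comment.

```python
def speedup(before, after):
    """Ratio of patched throughput to baseline throughput."""
    if before <= 0:
        raise ValueError("baseline throughput must be positive")
    return after / before


# (master, master + patches) in GB/s, as quoted for xfersize=64m, stripesize=1m.
results = {"write": (1.6, 13.0), "read": (1.5, 14.0)}

for op, (before, after) in results.items():
    print(f"{op}: {speedup(before, after):.1f}x")
```

            On the quoted numbers this works out to roughly an 8x improvement for writes and a 9x improvement for reads.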

            A comment on performance.  I have not retested this recently, so definitely take this with a grain of salt.

            The currently landed set of patches should put us at around 7 GiB/s.  The lov caching patch should take that to around 8 GiB/s.

            The remaining set of patches should push the rest of the way to around 10 GiB/s.

            Then that's where this ticket stops and the work is picked up in other tickets.

            paf0186 Patrick Farrell added a comment

            Oleg,

            Thanks for getting those merged in.

            The latest drop to master covers all of the fixes associated with other tickets that I think are required, so this series should be good to go now.

            Here's the current status of the patches on this ticket.  I am still hoping to get everything here into 2.15.  There should not be any more patches added to this ticket at this point.

            This patch needs review again, but hasn't changed much:
            https://review.whamcloud.com/39445/

            After that, the patches get a little more complicated.  I've rebased all the remaining patches onto the tip of master + the test 398b improvement, to get them some more testing.  (398b improvement: https://review.whamcloud.com/44321/)

            Here's the set:
            https://review.whamcloud.com/44154/
            https://review.whamcloud.com/44153/
            https://review.whamcloud.com/44209/
            https://review.whamcloud.com/39443/
            https://review.whamcloud.com/39438/
            https://review.whamcloud.com/44268/
            https://review.whamcloud.com/44293/

            paf0186 Patrick Farrell added a comment (edited)

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39482/
            Subject: LU-13799 osc: Improve osc_queue_sync_pages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 87c4535f7a5d239aad4e936545a72d0199ccd9ba

            gerrit Gerrit Updater added a comment

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39448/
            Subject: LU-13799 clio: Skip prep for transients
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b8553978789ad3dd0776c0543dea4641804d0ac5

            gerrit Gerrit Updater added a comment

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39447/
            Subject: LU-13799 llite: Adjust dio refcounting
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1e4d10af3909452b0eee1f99010d80aeb01d42a7

            gerrit Gerrit Updater added a comment

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39446/
            Subject: LU-13799 lov: Improve DIO submit
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d31647c017a390c9553a38d82c02fe7001a33a05

            gerrit Gerrit Updater added a comment

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39441/
            Subject: LU-13799 llite: Remove transient page counting
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 587e5aa8342980f761930235714add1cca80686b

            gerrit Gerrit Updater added a comment

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39442/
            Subject: LU-13799 llite: Modify AIO/DIO reference counting
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b3de247b76b410101e166b024d65e2bf23d401ba

            gerrit Gerrit Updater added a comment

            People

              paf0186 Patrick Farrell
              paf0186 Patrick Farrell