Details

    • Bug
    • Resolution: Unresolved
    • Major

    Description

      As Andreas noted in LU-13798, the faster DIO path makes it interesting to switch from buffered i/o to direct i/o at larger sizes.

      This is actually pretty easy:
      If the buffered i/o meets the alignment requirements for DIO (the buffer is page aligned and the i/o size is a multiple of the page size), you can simply set the DIO flag internally in Lustre, and the kernel will route the i/o to the direct i/o code.  (In newer kernels, this does not require manipulating the O_DIRECT flag on the file, which is good because that's likely unsafe.)
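
As a rough illustration of the alignment requirement described above, here is a minimal Python sketch.  The function name and the 4 KiB page size are assumptions made for this example; they are not taken from the Lustre code.

```python
PAGE_SIZE = 4096  # assumed page size; the real value depends on the platform


def meets_dio_alignment(buf_addr, count, page_size=PAGE_SIZE):
    """Return True if a buffered i/o could be issued as direct i/o.

    Hypothetical helper: checks only the two conditions named in the
    description (page-aligned buffer, size a multiple of the page size).
    """
    if count <= 0:
        return False
    return buf_addr % page_size == 0 and count % page_size == 0
```

For example, `meets_dio_alignment(0x10000, 1024 * 1024)` passes both checks, while a 5000 byte i/o or a buffer starting at an odd address does not.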

      If the buffered i/o is not valid as direct i/o, the usual "fall back to buffered i/o" mechanism (implemented as part of LU-4198) happens automatically (just return 0 instead of -EINVAL).
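
The fallback behaviour can be pictured with the following sketch.  Every name here is hypothetical and the real logic lives in kernel C code; the only point being illustrated is that the direct path reports 0 bytes done rather than an error, so the caller completes the i/o through the buffered path.

```python
PAGE_SIZE = 4096  # assumed page size


def try_direct_io(buf_addr, count):
    """Hypothetical direct path: bytes completed, or 0 if not DIO-eligible."""
    aligned = buf_addr % PAGE_SIZE == 0 and count % PAGE_SIZE == 0
    return count if aligned else 0  # 0 here, not an error such as -EINVAL


def buffered_io(count):
    """Hypothetical buffered path: accepts any request."""
    return count


def hybrid_io(buf_addr, count):
    """Try the direct path first, then finish the remainder as buffered i/o."""
    done = try_direct_io(buf_addr, count)
    path = "dio" if done == count else "buffered"
    if done < count:
        done += buffered_io(count - done)
    return done, path
```

In this model an unaligned request still completes in full; it simply takes the buffered path.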

       

      The question, then, is how to decide when to switch from buffered i/o to DIO.  I have a proposed solution that I haven't implemented yet*, which I'll describe here.
      *(I have done BIO (buffered i/o) as DIO, but I used a simple "Try all BIO as DIO" patch, not intelligent switching.)

      Essentially, direct i/o performance is a function of how much parallelism we can get by splitting the i/o, and the sync time of the back end storage.

      For example, on my flash back end, I see a benefit from switching 1 MiB BIO to 4x256 KiB DIO (1.9 GiB/s instead of 1.3 GiB/s).  But a spinning disk back end would require a much larger size for this change to make sense.

       

      So the basic question to answer is, what size of i/o do we submit?  How small, and into how many chunks, do we split up the i/o?

      Note that if the i/o size we submit at the higher levels is larger than the stripe size or RPC size, it's automatically split at those boundaries, so if we start submitting at very large sizes, we split on those boundaries instead.
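
To make the boundary splitting concrete, here is a small sketch that cuts an extent wherever it crosses a multiple of some boundary.  The function name is an assumption for illustration, and a single `boundary` parameter stands in for whichever of stripe size or RPC size applies.

```python
KIB = 1024
MIB = 1024 * KIB


def split_on_boundary(offset, length, boundary):
    """Split [offset, offset + length) at every multiple of `boundary`.

    Returns a list of (offset, length) pieces.  Illustrative only.
    """
    pieces = []
    end = offset + length
    while offset < end:
        next_cut = (offset // boundary + 1) * boundary
        piece_end = min(end, next_cut)
        pieces.append((offset, piece_end - offset))
        offset = piece_end
    return pieces
```

With a hypothetical 1 MiB boundary, a 4 MiB i/o at offset 0 becomes four 1 MiB pieces, and a 1 MiB i/o starting at offset 512 KiB becomes two 512 KiB pieces.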

      Here's my thinking.

      We have two basic tunables, one of which has separate values for rotational and non-rotational back ends.

      The tunables are "preferred minimum i/o size" and "desired submission concurrency" (I'm not proud of the name of the second one, open to suggestions...).

       

      So, consider a situation where we have a preferred minimum size of 256 KiB and a desired submission concurrency of 8.

      If we do a 256 KiB BIO, that is done as buffered i/o.  If we do a 400 KiB BIO, still buffered.  But if we do a 512 KiB BIO, we split it into two 256 KiB DIOs.  A 700 KiB BIO becomes 2x256 KiB + 188 KiB DIOs.  (These thresholds may be too small.)

      Now, consider larger sizes.  1 MiB becomes 4x256 KiB.  Then 2 MiB becomes 8x256 KiB submissions.

      But at larger sizes, the desired submission concurrency comes into play.  Consider 4 MiB.  4 MiB/8 = 512 KiB.  So we split 4 MiB into 8x512 KiB.  This model prevents us from submitting many tiny i/os once the i/o size is large enough.
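
The worked examples above can be reproduced with a short sketch.  This is one reading of the proposal, not an implementation of it: the function name is hypothetical, the switch point of twice the preferred minimum size is inferred from the 400 KiB and 512 KiB examples, and rounding the chunk size up to a 4 KiB page is an added assumption to keep chunks DIO-aligned.

```python
KIB = 1024
MIB = 1024 * KIB
PAGE_SIZE = 4096  # assumed page size


def plan_dio_chunks(size, min_io=256 * KIB, concurrency=8):
    """Return the DIO chunk sizes for an i/o, or [] to stay buffered."""
    # Inferred rule: switch only once at least two minimum-size chunks fit.
    if size < 2 * min_io:
        return []
    # Aim for `concurrency` chunks, but never go below the minimum size.
    chunk = max(min_io, -(-size // concurrency))  # ceiling division
    # Assumption: round up to a whole page so each chunk stays aligned.
    chunk = -(-chunk // PAGE_SIZE) * PAGE_SIZE
    chunks = [chunk] * (size // chunk)
    if size % chunk:
        chunks.append(size % chunk)  # trailing partial chunk
    return chunks
```

With the defaults above, `plan_dio_chunks(700 * KIB)` yields 256 KiB, 256 KiB and 188 KiB, and `plan_dio_chunks(4 * MIB)` yields eight 512 KiB chunks, matching the description.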

      Note that I have not tested this much yet.  I think 8 might be low for submission concurrency and 16 might be more desirable.  Basically, this is "try to cut the i/o into this many RPCs", so perhaps concurrency is the wrong word...?

      Also, as I noted earlier, the preferred i/o size will be very different for spinning disk vs non-rotational media.  So we will need two values for this (I am thinking we default the spinning disk value to some multiple of the non-rotational value and let people override it), and we will also need to make this info available on the client.
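
One way to picture the two values is sketched below.  The function name, the parameter names and the multiplier of 16 are all hypothetical placeholders chosen for this example; the description does not specify a multiplier.

```python
KIB = 1024


def preferred_min_io(rotational, nonrot_min=256 * KIB,
                     rot_multiplier=16, rot_override=None):
    """Pick the preferred minimum i/o size for a back end.

    Hypothetical: rotational media default to a multiple of the
    non-rotational value unless an explicit override is given.
    """
    if not rotational:
        return nonrot_min
    if rot_override is not None:
        return rot_override
    return nonrot_min * rot_multiplier
```

The client would still need to learn whether the back end is rotational before it could choose between the two values.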

      I'll ask about that in a comment.  I've also got some benchmark info I can share later, but basically, buffered i/o through this path performs exactly like DIO through this path.

          Activity

            [LU-13802] New i/o path: Buffered i/o as DIO

            "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54099
            Subject: LU-13802 ptlrpc: correctly remove inflight request
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f74b9994bfe48d94ecdadf1f6b9aae0a1df0f8bb

            gerrit Gerrit Updater added a comment

            "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54070
            Subject: LU-13802 tests: hang testing
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a30663978cd0304ffb64570b65b9f3a03119e798

            gerrit Gerrit Updater added a comment

            "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52822
            Subject: LU-13802 llite: add ZFS check for hybrid IO
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c5ada8cbb710019c7fe5671db06bb514d173f3ca

            gerrit Gerrit Updater added a comment

            Add test using the fail loc to force a switch (It's a good fail loc but doesn't have an obvious use right now)

            paf0186 Patrick Farrell added a comment

            Making some notes on what remains to do here.

            Add tests of tweaking the hybrid IO threshold for rotational and non-rotational

            Need to make the existing threshold test more intelligent, using an automatically adjusting IO size for the threshold which is in effect (rotational or non-rotational)

            Need to add two types of racing test: multiple processes on one client, and multiple processes on two clients (sanityn)

            Bring in the contention code.  The contention detection and management on the client side needs to be split out into a number of patches.

            paf0186 Patrick Farrell added a comment

            "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52776
            Subject: LU-13802 llite: add file nonrotational check
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b88060157ab67c90d2724f1de86203b2a4708953

            paf0186 Patrick Farrell added a comment

            "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52777
            Subject: LU-13802 llite: hybrid IO HDD thresholds
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 83f416731d2d5455ce0255202b7aa3c1f872da13

            "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52778
            Subject: LU-13802 tests: hybrid IO consistency test
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1

            paf0186 Patrick Farrell added a comment

            "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52703
            Subject: LU-13802 llite: tag switched hybrid IOs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bb640da69daa8a65bd1c8fa3a986465ac8d327e3

            gerrit Gerrit Updater added a comment

            Thanks, Andreas.  (I had accidentally placed these on LU-13804.)

            paf0186 Patrick Farrell added a comment

            "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52596
            Subject: LU-13802 llite: add hybrid IO switch proc stats
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b80e468a5f98b433554ebd67610639ed70be8cf7

            adilger Andreas Dilger added a comment

            "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52595
            Subject: LU-13802 llite: add read & write switch thresholds
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8808647ec827c7c000f414787093c4c168cc5d30

            adilger Andreas Dilger added a comment

            People

              paf Patrick Farrell (Inactive)
              paf0186 Patrick Farrell
              Votes: 0
              Watchers: 14
