Unaligned DIO interop with different page sizes fails

Details


    Description

      Unaligned DIO interop with different page sizes fails

      Unaligned DIO between systems with 4k and 64k pages fails in the brw/ptlrpc bulk operations, because the two sides can add a different number of pages to the initial MD while still fitting within the LNET_MTU limit.

      One solution is to size the initial unaligned MD for the largest page size among all interoperating machines; here that is 64k, which aarch64 and a few other architectures use.
      Limiting the first MD of the I/O to what fits in (LNET_MTU - 64k) bytes ensures that both sides size their MDs to the same maximum, one that every architecture can support. This is done only when the initial MD is unaligned; the vectors after it are nominally aligned.

      When the client and server page sizes differ, each side prepares its MDs based on its local page size.
      If the system with the larger page size writes at an offset greater than the smaller page size, and the number of bytes in the resulting (first) MD plus the larger-page offset exceeds the LNET MTU (1M), then more bytes fit in that MD on the smaller-page system. In this case the MD (sending or receiving) matches on xid/match_bits but fails the message length check:

      lnet_try_match_md()
      {
      ....
      	} else if ((md->md_options & LNET_MD_TRUNCATE) == 0) {
      		/* this packet _really_ is too big */
      		CERROR("Matching packet from %s, match %llu"
      		       " length %d too big: %d left, %d allowed\n",
      		       libcfs_idstr(&info->mi_id), info->mi_mbits,
      		       info->mi_rlength, md->md_length - offset, mlength);
      
      		return LNET_MATCHMD_DROP;
      ...
      }
      

      This triggers a resend: both sides recompute their MDs and resend, but the computation is deterministic, so the lengths are still mismatched and the I/O never completes.
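      The mismatch can be reproduced in a simplified model. This sketch assumes, without checking the actual __ptlrpc_prep_bulk_page() code, that pages are added to the first MD whole, so the first page's in-page offset counts against LNET_MTU and at most LNET_MTU - (offset % page_size) bytes fit. first_md_bytes() and md_would_drop() are hypothetical helpers; md_would_drop() only mimics the spirit of the non-truncating length check in lnet_try_match_md().

```c
#include <assert.h>
#include <stdint.h>

#define LNET_MTU (1u << 20)	/* 1M, as in the description */

/*
 * Simplified model (hypothetical helper, not the real fitting code):
 * pages are added to the first MD whole, so the first page's in-page
 * offset counts against LNET_MTU. Returns the bytes that fit in the
 * first MD for an I/O of 'nob' bytes starting at 'offset'.
 */
uint32_t first_md_bytes(uint64_t offset, uint32_t page_size, uint64_t nob)
{
	uint64_t fit;

	/* page sizes are powers of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	fit = LNET_MTU - offset % page_size;
	return (uint32_t)(nob < fit ? nob : fit);
}

/*
 * Hypothetical receiver-side check: without truncation allowed, a
 * message longer than the matched MD is dropped.
 */
int md_would_drop(uint32_t rlength, uint32_t md_length)
{
	return rlength > md_length;
}
```

      For an I/O starting at offset 0x9200, the 4k-page side fits 1048064 bytes in the first MD while the 64k-page side fits only 1011200, so a 4k-page sender's first message is too big for the 64k-page receiver's MD. Recomputing on resend yields the same numbers, which is why the retry loop never converges.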

      The fix is to adjust the fitting logic in __ptlrpc_prep_bulk_page() for the first MD when all of the following are true:

      • I/O is direct-io
      • write is not aligned on the largest allowed page_size (64k) boundary
      • offset is > smallest page size (MD_MIN_INTEROP_PAGE_SIZE)
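      A minimal sketch of the adjusted first-MD fit, using the same simplified model (the first page's in-page offset counts against LNET_MTU). interop_first_md_bytes() is a hypothetical helper and the macro values are assumed from the 4k/64k figures above. For unaligned DIO it takes the in-page offset modulo 64k regardless of the local page size. It omits the offset > MD_MIN_INTEROP_PAGE_SIZE condition: in this model, when the 64k-relative offset is below 4k, both page sizes already see the same in-page offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LNET_MTU                 (1u << 20)	/* 1M */
#define MD_MIN_INTEROP_PAGE_SIZE (1u << 12)	/* 4k (value assumed) */
#define MD_MAX_INTEROP_PAGE_SIZE (1u << 16)	/* 64k (value assumed) */

/*
 * Hypothetical helper: bytes that fit in the first MD. For DIO that is
 * not 64k-aligned, pretend the first page is 64k so that 4k-page and
 * 64k-page nodes compute the same length.
 */
uint32_t interop_first_md_bytes(bool is_dio, uint64_t offset,
				uint32_t local_page_size, uint64_t nob)
{
	uint32_t ps = local_page_size;
	uint64_t fit;

	assert(ps >= MD_MIN_INTEROP_PAGE_SIZE &&
	       ps <= MD_MAX_INTEROP_PAGE_SIZE);
	if (is_dio && offset % MD_MAX_INTEROP_PAGE_SIZE != 0)
		ps = MD_MAX_INTEROP_PAGE_SIZE;	/* assume a 64k first page */
	fit = LNET_MTU - offset % ps;
	return (uint32_t)(nob < fit ? nob : fit);
}
```

      In this model a 4k-page node and a 64k-page node both compute 1011200 bytes for a DIO starting at offset 0x9200, while non-DIO I/O keeps the local-page-size result.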

      For interop, the first page is assumed to be 64k, which causes the smaller-page system
      to stop adding pages/bytes to the MD at the same point as the larger-page system, except when:

      • the number of bytes + 64k offset <= LNET_MTU,
        because the bytes in the last page fall short of the MTU limit. In this case the extraneous MD is
        collapsed back, since only a single MD is needed/used for this bulk I/O.
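      Under the same simplified model, the number of MDs for an unaligned DIO can be counted as follows; interop_md_count() is a hypothetical helper. The first MD is capped at LNET_MTU minus the 64k-relative offset and later MDs carry up to LNET_MTU bytes, except in the collapse case described above, where everything fits in one MD.

```c
#include <assert.h>
#include <stdint.h>

#define LNET_MTU                 (1u << 20)	/* 1M */
#define MD_MAX_INTEROP_PAGE_SIZE (1u << 16)	/* 64k (value assumed) */

/*
 * Hypothetical helper: MDs needed for 'nob' bytes of unaligned DIO
 * starting at 'offset', with the first MD capped for 64k interop.
 */
uint32_t interop_md_count(uint64_t offset, uint64_t nob)
{
	uint64_t off64 = offset % MD_MAX_INTEROP_PAGE_SIZE;
	uint64_t first = LNET_MTU - off64;

	if (nob == 0)
		return 0;
	/* Collapse case: bytes + 64k offset fit within one MTU. */
	if (nob + off64 <= LNET_MTU)
		return 1;
	/* One capped first MD, then ceil(remaining / LNET_MTU) full MDs. */
	return 1 + (uint32_t)((nob - first + LNET_MTU - 1) / LNET_MTU);
}
```

      Here a 4M transfer starting at offset 0x9200 needs 5 MDs versus 4 when 64k-aligned, while a 512k transfer at the same offset collapses to a single MD.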

      A quick survey of the page sizes (or configurable PAGE_SIZE values) that Linux supports shows a few uncommon architectures that support page sizes > 64k; however, those systems can also be configured for 64k (or smaller) pages.
      In addition, no currently supported platform appears to allow a page size smaller than 4k. Therefore, restricting Lustre to page sizes from 4k to 64k (with MD_MAX_INTEROP_PAGE_SIZE as the 64k upper bound) should not be controversial.
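      The 4k to 64k restriction amounts to a simple check; page_size_interop_ok() is a hypothetical helper, and the macro values are assumed from the sizes named above. In kernel code a compile-time check such as BUILD_BUG_ON() on PAGE_SIZE would likely be the more natural form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MD_MIN_INTEROP_PAGE_SIZE (1u << 12)	/* 4k (value assumed) */
#define MD_MAX_INTEROP_PAGE_SIZE (1u << 16)	/* 64k (value assumed) */

/* Hypothetical helper: is this a power-of-two page size in [4k, 64k]? */
bool page_size_interop_ok(uint32_t page_size)
{
	bool pow2 = page_size != 0 && (page_size & (page_size - 1)) == 0;

	return pow2 && page_size >= MD_MIN_INTEROP_PAGE_SIZE &&
	       page_size <= MD_MAX_INTEROP_PAGE_SIZE;
}
```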

      To accommodate the additional MD that may be needed for a full bulk I/O whose first MD is restricted due to offset and page alignment, increase the maximum number of MDs to PTLRPC_BULK_OPS_COUNT + 1.
      This requires doubling the theoretical maximum, from 6 bits to 7, so that the mbits/xid logic handles the extra MD correctly.
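      The arithmetic behind the 6-to-7-bit change, assuming the 6-bit limit means a maximum of 64 MDs (1 << 6): one extra MD gives 65, which no longer fits, and the next power of two is 128 (1 << 7). bits_for() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest b such that (1 << b) >= n. */
unsigned int bits_for(uint32_t n)
{
	unsigned int b = 0;

	while ((1ull << b) < n)
		b++;
	return b;
}
```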

      Finally, to indicate to the server that the client has adjusted the MD size(s) for 64k alignment, the unused lower 16 bits of struct obd_ioobj.ioo_max_brw can be used for flags, one of which, OBD_IOOBJ_INTEROP_PAGE_ALIGNMENT, indicates that interop page alignment is needed.
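      A minimal sketch of carrying flags in the lower 16 bits of ioo_max_brw. struct ioobj_sketch, the IOOBJ_FLAG_* macros, the helpers, and the bit value given to OBD_IOOBJ_INTEROP_PAGE_ALIGNMENT are all hypothetical; the description names the flag but not its value. The sketch also assumes ioo_max_brw is a 32-bit field whose upper 16 bits keep their existing meaning.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct obd_ioobj; only the field used here. */
struct ioobj_sketch {
	uint32_t ioo_max_brw;	/* upper 16 bits: existing value (assumed) */
};

#define IOOBJ_FLAG_BITS 16
#define IOOBJ_FLAG_MASK ((1u << IOOBJ_FLAG_BITS) - 1)
/* Hypothetical bit value; not given in the description. */
#define OBD_IOOBJ_INTEROP_PAGE_ALIGNMENT 0x0001u

/* Replace the lower 16 flag bits, preserving the upper 16 bits. */
void ioobj_set_flags(struct ioobj_sketch *ioo, uint16_t flags)
{
	ioo->ioo_max_brw = (ioo->ioo_max_brw & ~IOOBJ_FLAG_MASK) | flags;
}

/* Read back the lower 16 flag bits. */
uint16_t ioobj_get_flags(const struct ioobj_sketch *ioo)
{
	return (uint16_t)(ioo->ioo_max_brw & IOOBJ_FLAG_MASK);
}
```

      In this sketch, setting the flag leaves the upper 16 bits of ioo_max_brw untouched.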

      Without this patch, for a 64k-unaligned I/O where the client and server have different native page sizes, the two sides cannot agree on the MD size: one side drops the MD as 'too big' for the allocated space (see lnet_try_match_md() => LNET_MATCHMD_DROP) and triggers a retry, but the MD math never changes, so the effect is a hang.


          Activity


            adilger Andreas Dilger added a comment:

            stancheff, it looks like these subtests are still failing during interop testing with older servers:
            https://testing.whamcloud.com/test_sets/c2e89b09-33e9-49d2-b82e-61b6bfcbce00
            https://testing.whamcloud.com/test_sets/67fbf2fd-3aa3-4bed-b806-ef5e1542cd5a

            They shouldn't be doing anything fancy ("lfs migrate" doing direct read/write), so it seems like there is still something wrong here...

            pjones Peter Jones added a comment:

            Seems to be all merged for 2.16.

            gerrit Gerrit Updater added a comment:

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55647/
            Subject: LU-17525 test: re-enable sanity tests: 56x 56xa 56xb
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c81f140168964db07a30b7102efffc4dfe5b6c5e

            simmonsja James A Simmons added a comment:

            https://review.whamcloud.com/#/c/fs/lustre-release/+/55647 is left. That patch just enables some sanity tests that failed before.

            adilger Andreas Dilger added a comment:

            stancheff, what is the current status of this issue? Is there still a problem with UDIO and 64KB pages, or has that issue been fixed?

            gerrit Gerrit Updater added a comment:

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55895/
            Subject: LU-17525 llite: unaligned DIO zfs detection
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b07b31d44ffee44c9044cdebedf1d42a35c82929

            gerrit Gerrit Updater added a comment:

            "Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55875
            Subject: LU-17525 test: aarch64 interop
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 62700139df2cb8a614b70b31a61074d53af3871d

            gerrit Gerrit Updater added a comment (edited):

            "Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55870
            Subject: LU-17525 test: aarch64 interop
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a35218181c700605040e57960dfad13e0b756aa8

            pjones Peter Jones added a comment:

            As per discussion on the LWG call today, moving tickets that do not appear to be essential to fix version 2.17. If the fix lands before code freeze we will update the fix version to reflect that, but we want to focus on activities on the critical path. Please speak up if you think that this issue definitely needs to be fixed before we could issue a 2.16 release.

            gerrit Gerrit Updater added a comment:

            "Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55647
            Subject: LU-17525 test: re-enable sanity tests: 56x 56xa 56xb
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d320acd6c75a9c9f7e343537a814f25197bb5ddc

            gerrit Gerrit Updater added a comment (edited):

            "Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55646
            Subject: LU-17525 test: re-enable sanity tests: 56x 56xa 56xb
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f9d3f4c672807582c815d70817d6dcc944fd844f

            People

              stancheff Shaun Tancheff
              stancheff Shaun Tancheff
              Votes: 0
              Watchers: 11
