Lustre: LU-17525

Unaligned DIO interop with different page sizes fails


    Description

      Unaligned DIO interop with different page sizes fails

      Unaligned DIO between systems with 4k and 64k pages fails in the brw/ptlrpc bulk operations, because each side can add a different number of pages to the initial MD while staying within the LNET_MTU limit.

      One solution is to restrict the initial unaligned MD to the maximum page size of all the interoperable machines. In this case, aarch64 and a few other architectures have 64k pages.
      Limiting the first I/O to what can fit in (LNET_MTU - 64k) bytes ensures that MDs are sized to the same maximum that all architectures can support. This is only done when the initial MD is unaligned; the vectors are nominally aligned thereafter.

      When client and server page sizes differ, the client and server prepare MDs differently, each based on its local page size.
      When the system with the larger page size is writing at an offset greater than the smaller page size, and the byte count of the resulting (first) MD plus the offset within the larger page is greater than the LNET MTU (1M), the number of bytes that can fit is greater on the system with the smaller page size. In this case the MD (sending or receiving) will match on xid/match_bits and fail on the message length check:

      lnet_try_match_md()
      {
      ....
      	} else if ((md->md_options & LNET_MD_TRUNCATE) == 0) {
      		/* this packet _really_ is too big */
      		CERROR("Matching packet from %s, match %llu"
      		       " length %d too big: %d left, %d allowed\n",
      		       libcfs_idstr(&info->mi_id), info->mi_mbits,
      		       info->mi_rlength, md->md_length - offset, mlength);
      
      		return LNET_MATCHMD_DROP;
      ...
      }
      

      This then triggers a resend: both sides recompute and resend, but the lengths are still mismatched, so the I/O never completes.
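      To make the mismatch concrete, here is a small arithmetic model in C. It is a sketch under stated assumptions, not the code in __ptlrpc_prep_bulk_page() or lnet_try_match_md(): it assumes LNET_MTU is 1 MiB and that the first MD of an unaligned I/O is filled until the payload plus the first byte's offset within its page reaches LNET_MTU. The helpers first_md_bytes() and md_accepts() are hypothetical; md_accepts() only mimics the "too big" branch quoted above, with and without LNET_MD_TRUNCATE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: LNET_MTU is 1 MiB, as stated in the description. */
#define MODEL_LNET_MTU (1u << 20)

/*
 * Hypothetical model: bytes that fit in the first MD when the I/O starts
 * at file offset 'offset' on a system with 'page_size' pages, capped at
 * the 'total' bytes of the I/O. The first page only carries the bytes
 * after the in-page offset, but it still consumes a whole page of the MTU.
 */
static uint32_t first_md_bytes(uint64_t offset, uint32_t total,
			       uint32_t page_size)
{
	uint32_t in_page = (uint32_t)(offset % page_size);
	uint32_t room = MODEL_LNET_MTU - in_page;

	return total < room ? total : room;
}

/*
 * Hypothetical receive-side check in the spirit of lnet_try_match_md():
 * unless truncation is allowed, an incoming message longer than the
 * space in the MD is dropped.
 */
static bool md_accepts(uint32_t md_length, uint32_t rlength, bool truncate)
{
	return rlength <= md_length || truncate;
}
```

      With the I/O starting 12 KiB into the file, a 4k-page sender computes 1048576 bytes for the first MD, while a 64k-page receiver sizes its MD to 1036288 bytes, so md_accepts() refuses the message. Because both values depend only on the offset, the length, and the local page size, every resend recomputes the same numbers, which is why the retries never converge.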

      So adjust the fitting logic in __ptlrpc_prep_bulk_page() for the first MD when all of the following are true:

      • I/O is direct-io
      • write is not aligned on the largest allowed page_size (64k) boundary
      • offset is > smallest page size (MD_MIN_INTEROP_PAGE_SIZE)
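      The conditions above, combined with treating the first page as 64k, can be sketched as follows. The 4 KiB and 64 KiB constants stand in for MD_MIN_INTEROP_PAGE_SIZE and MD_MAX_INTEROP_PAGE_SIZE as described here; the helper names are hypothetical, and applying the offset test to the offset within a 64k page is an assumption about what "offset" means. The model uses >= rather than > for that test, because under its own arithmetic an in-page offset of exactly 4 KiB already makes the two sides diverge.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_LNET_MTU              (1u << 20) /* 1 MiB, assumed */
#define MODEL_MIN_INTEROP_PAGE_SIZE 4096u      /* stand-in for MD_MIN_INTEROP_PAGE_SIZE */
#define MODEL_MAX_INTEROP_PAGE_SIZE 65536u     /* stand-in for MD_MAX_INTEROP_PAGE_SIZE */

/* Hypothetical predicate for the listed conditions (see the >= note). */
static bool needs_interop_first_md(bool is_dio, uint64_t offset)
{
	uint64_t in_max_page = offset % MODEL_MAX_INTEROP_PAGE_SIZE;

	return is_dio &&
	       in_max_page != 0 &&                          /* not 64k aligned */
	       in_max_page >= MODEL_MIN_INTEROP_PAGE_SIZE;  /* past the smallest page */
}

/*
 * Hypothetical size of the first MD with the interop rule applied: when
 * the predicate holds, the in-page offset is measured against a 64k page
 * on every system, whatever its local page size.
 */
static uint32_t interop_first_md_bytes(bool is_dio, uint64_t offset,
				       uint32_t total, uint32_t local_page)
{
	uint32_t page = needs_interop_first_md(is_dio, offset) ?
			MODEL_MAX_INTEROP_PAGE_SIZE : local_page;
	uint32_t room = MODEL_LNET_MTU - (uint32_t)(offset % page);

	return total < room ? total : room;
}
```

      With this rule, a 4k-page system and a 64k-page system both compute 1036288 bytes for an I/O starting 12 KiB into the file, so their first MDs agree.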

      For interop, the first page is assumed to be 64k, which causes the system with smaller pages
      to stop adding pages/bytes to the MD at the same point as the system with larger pages, except when:

      • number of bytes + 64k offset <= LNET_MTU
        because the byte count of the last page falls short of the MTU limit. In this case the extraneous MD is
        collapsed back, as only a single MD is needed/used for this bulk I/O.
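      The collapse case, and the extra MD that an unaligned I/O can need, both show up when counting MDs. This hypothetical helper assumes the interop rule described above (first MD capped at LNET_MTU minus the offset within a 64k page, later MDs holding up to LNET_MTU each); it is a model, not code from the patch.

```c
#include <stdint.h>

#define MODEL_LNET_MTU     (1u << 20) /* 1 MiB, assumed */
#define MODEL_INTEROP_PAGE 65536u     /* 64k interop page, assumed */

/*
 * Hypothetical: number of MDs needed for a bulk I/O of 'total' bytes
 * starting at 'offset', with the first MD capped by the 64k in-page offset.
 */
static uint32_t bulk_md_count(uint64_t offset, uint64_t total)
{
	uint64_t first = MODEL_LNET_MTU - offset % MODEL_INTEROP_PAGE;

	/* bytes + 64k offset <= LNET_MTU: one MD holds everything. */
	if (total <= first)
		return 1;

	/* Remaining bytes go into full-MTU MDs, rounding up. */
	uint64_t rest = total - first;
	return 1 + (uint32_t)((rest + MODEL_LNET_MTU - 1) / MODEL_LNET_MTU);
}
```

      For example, 4 MiB starting at offset 0 needs 4 MDs, the same 4 MiB starting at offset 12288 needs 5, and 64 KiB starting at offset 12288 still fits in a single MD.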

      A quick survey of the page sizes (or configurable PAGE_SIZE) that Linux supports shows a few uncommon architectures that support page sizes > 64k; however, those systems can also be configured for 64k (or smaller) pages.
      In addition, no currently supported platform appears to allow a page size of less than 4k. Therefore, restricting Lustre to 4k to 64k page sizes (along with the MD_MAX_INTEROP_PAGE_SIZE) should not be controversial.
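      As a userspace illustration of the 4k to 64k restriction, the check below accepts only power-of-two page sizes in that range. The bounds are assumptions taken from the text above (standing in for MD_MIN_INTEROP_PAGE_SIZE and MD_MAX_INTEROP_PAGE_SIZE), the function names are hypothetical, and the local page size is read with the POSIX sysconf(_SC_PAGESIZE) call.

```c
#include <stdbool.h>
#include <unistd.h>

/* Assumed bounds from the description: 4k minimum, 64k maximum. */
#define MODEL_MIN_INTEROP_PAGE_SIZE 4096L
#define MODEL_MAX_INTEROP_PAGE_SIZE 65536L

/* True if 'page_size' is a power of two within the supported range. */
static bool page_size_supported(long page_size)
{
	return page_size >= MODEL_MIN_INTEROP_PAGE_SIZE &&
	       page_size <= MODEL_MAX_INTEROP_PAGE_SIZE &&
	       (page_size & (page_size - 1)) == 0;
}

/* Checks the running system's page size via POSIX sysconf(). */
static bool local_page_size_supported(void)
{
	long ps = sysconf(_SC_PAGESIZE);

	return ps > 0 && page_size_supported(ps);
}
```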

      Additionally, to accommodate the possible extra MD needed for a full bulk I/O that is also restricted due to offset and page alignment, increase the maximum to PTLRPC_BULK_OPS_COUNT + 1.
      To do this, the theoretical maximum has to be doubled by going from 6 bits to 7, so that the mbits/xid logic works correctly.
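      The 6-to-7-bit change follows from simple counting: indexing N MDs needs ceil(log2(N)) bits. The sketch assumes PTLRPC_BULK_OPS_COUNT is 64, which is what the "6 bits" above implies; bits_for_count() is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed: 64 bulk ops, i.e. the 6-bit maximum mentioned above. */
#define MODEL_BULK_OPS_COUNT 64u

/*
 * Smallest number of bits that can index 'count' distinct MDs
 * (0 .. count - 1). Valid for 1 <= count <= 2^31.
 */
static unsigned int bits_for_count(uint32_t count)
{
	unsigned int bits = 0;

	while ((UINT32_C(1) << bits) < count)
		bits++;
	return bits;
}
```

      So 64 MDs fit exactly in 6 bits, but 65 (PTLRPC_BULK_OPS_COUNT + 1) need 7, which doubles the representable maximum to 128.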

      Finally, to indicate to the server that a client has in fact adjusted the MD size(s) for 64k alignment, the unused lower 16 bits of struct obd_ioobj.ioo_max_brw can be used for flags, one of which indicates that OBD_IOOBJ_INTEROP_PAGE_ALIGNMENT is needed.
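      A minimal sketch of carrying such a flag in the low 16 bits follows. It uses a simplified stand-in struct rather than the real struct obd_ioobj, assumes ioo_max_brw is a 32-bit field whose upper bits keep their existing meaning, and picks 0x1 as an illustrative bit value; all model_/MODEL_ names are hypothetical, not Lustre's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct obd_ioobj: only the field discussed here. */
struct model_ioobj {
	uint32_t ioo_max_brw;
};

/* Hypothetical layout: low 16 bits are flags; 0x1 is an illustrative value. */
#define MODEL_IOOBJ_FLAGS_MASK             0xffffu
#define MODEL_IOOBJ_INTEROP_PAGE_ALIGNMENT 0x0001u

/* Client side: mark that the MD sizes were adjusted for 64k alignment. */
static void ioobj_set_interop(struct model_ioobj *ioo)
{
	ioo->ioo_max_brw |= MODEL_IOOBJ_INTEROP_PAGE_ALIGNMENT;
}

/* Server side: test the flag without touching the upper bits. */
static bool ioobj_interop(const struct model_ioobj *ioo)
{
	return (ioo->ioo_max_brw & MODEL_IOOBJ_INTEROP_PAGE_ALIGNMENT) != 0;
}

/* The upper 16 bits, whose existing meaning is left unchanged. */
static uint32_t ioobj_upper_bits(const struct model_ioobj *ioo)
{
	return ioo->ioo_max_brw >> 16;
}
```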

      Without this patch, for a 64k-unaligned I/O where the client and server have different native page sizes, the two sides cannot agree on how big the MD is. One side aborts the MD as 'too big' for the allocated space (see lnet_try_match_md() => LNET_MATCHMD_DROP) and triggers a retry, but the MD math never changes, so the effect is a hang.


          Activity

            pjones Peter Jones added a comment -

            Merged for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56775/
            Subject: LU-17525 llite: soft fail unaligned dio
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a5546155a87a245f40062b68d26e53bed7dca44f
            gerrit Gerrit Updater added a comment - edited

            "Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56789
            Subject: LU-17525 llite: soft fail unaligned dio
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ab0d6222421e9e5dc68cfa4afbc55da129eebdfe

            Sad that such an auspicious patch number 56789 was abandoned. ;-(


            adilger Andreas Dilger added a comment -

            See my comments in LU-18368.
            yujian Jian Yu added a comment -

            After reverting commit ff018bb77a37 (LU-18284 llite: disallow udio exceptions) from master branch, sanity tests passed with 2.15.5 server:
            https://testing.whamcloud.com/test_sets/8a856ba2-26b6-4339-ba14-488bb625ebea


            adilger Andreas Dilger added a comment -

            I'm wondering if this issue is being hit (e.g. for "lfs migrate") when a "normal" DIO read is issued with a proper PAGE_SIZE sized and aligned buffer (so not unaligned), but the file size is not a multiple of this? That would happen when trying to migrate a file that is not an even number of pages in size. It may be enough to determine which check is causing this to be blocked and only allow this one case.

            gerrit Gerrit Updater added a comment -

            "Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56775
            Subject: LU-17525 llite: soft fail unaligned dio
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c49d578c87cb165fa909643d54f3801ae6b6a27b

            adilger Andreas Dilger added a comment -

            I can't say if "lfs migrate" is the only application that depends on this behavior. At least one other test tool is expecting this to work, and I suspect some user applications may be the same. IMHO, no fix should be needed to "lfs migrate", it should "just work" however it was previously working, and would likely serve as a useful test tool to see if this code is working.

            stancheff Shaun Tancheff added a comment -

            Likely a consequence of the unaligned dio flag quashing.

            Should we treat these as regressions and re-enable some unaligned dio exceptions?
            Or should we fix lfs migrate and except these tests when unaligned dio is not available?

            I think we want to continue with the migrate fix and add the unaligned dio exceptions.

            adilger Andreas Dilger added a comment -

            It looks like sanity test_398o has also started failing in interop testing. It is doing a 1-byte DIO read, which (for better or worse) was previously working with older clients/servers, but not with the master client and any old server:
            https://testing.whamcloud.com/search?horizon=2332800&status%5B%5D=FAIL&test_set_script_id=f9516376-32bc-11e0-aaee-52540025f9ae&sub_test_script_id=a18212de-787b-40c6-8b6c-b63a4e43f85e&source=sub_tests#redirect

            There are several other tests doing "lfs migrate" (e.g. test_389s) that are failing in b2_14 interop testing:
            https://testing.whamcloud.com/test_sets/c2e89b09-33e9-49d2-b82e-61b6bfcbce00

            People

              stancheff Shaun Tancheff
              stancheff Shaun Tancheff