Unaligned DIO interop with different page sizes fails


    Description

      Unaligned DIO interop with different page sizes fails

      Unaligned DIO between 4k and 64k page systems fails in the brw/ptlrpc bulk ops, because each side can add a different number of pages to the initial MD while still fitting within the LNET_MTU limit.

      One solution is to restrict the initial unaligned MD to the maximum page size of all the interoperable machines. In this case aarch64 and a few other arches have 64k pages.
      Limiting the first I/O to what can fit in (LNET_MTU - 64k bytes) ensures that MDs are sized to the same maximum that all architectures can support. This is only done when the initial MD is unaligned; the vectors are nominally aligned thereafter.

      When client and server page sizes are different, the client and server prepare MDs differently, each based on its local page size.
      When the system with the larger page size is writing at an offset greater than the smaller page size, and the resulting (first) MD number of bytes + larger page size offset is greater than the LNET MTU (1M), the number of bytes that can fit is greater for the smaller page size system. In this case the MD (sending or receiving) will match on xid/match_bits and then fail the message length check:

      lnet_try_match_md()
      {
      ....
      	} else if ((md->md_options & LNET_MD_TRUNCATE) == 0) {
      		/* this packet _really_ is too big */
      		CERROR("Matching packet from %s, match %llu"
      		       " length %d too big: %d left, %d allowed\n",
      		       libcfs_idstr(&info->mi_id), info->mi_mbits,
      		       info->mi_rlength, md->md_length - offset, mlength);
      
      		return LNET_MATCHMD_DROP;
      ...
      }
      

      This then triggers a resend. Both sides recompute and resend, but the lengths are still wrong, so the I/O never completes.
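The mismatch can be illustrated with a simplified model. This is a sketch, not the Lustre implementation: `first_md_bytes()` and `md_length_ok()` are hypothetical helpers, the model assumes fragments are added to the first MD page by page until the next one would exceed a 1M LNET_MTU, and it reduces the length check quoted above to a single comparison with LNET_MD_TRUNCATE unset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value: the description gives the LNET MTU as 1M. */
#define SKETCH_LNET_MTU ((size_t)1 << 20)

/*
 * Simplified model (not the Lustre code): bytes placed in the first MD
 * when fragments are added page by page until the next one would push
 * the MD past SKETCH_LNET_MTU. Only the first fragment is partial.
 */
static size_t first_md_bytes(size_t page_size, size_t offset, size_t io_len)
{
	size_t used = 0;
	size_t frag;

	assert(page_size > 0);
	frag = page_size - offset % page_size;

	while (io_len > 0) {
		if (frag > io_len)
			frag = io_len;
		if (used + frag > SKETCH_LNET_MTU)
			break;
		used += frag;
		io_len -= frag;
		frag = page_size;
	}
	return used;
}

/*
 * Reduced form of the length check, without LNET_MD_TRUNCATE: a message
 * longer than the space left in the matched MD is dropped.
 */
static bool md_length_ok(size_t rlength, size_t md_length)
{
	return rlength <= md_length;
}
```

Under these assumptions, a 4M I/O starting at offset 8704 gives a first MD of 1039872 bytes with 64k pages and 1048064 bytes with 4k pages, so `md_length_ok(1048064, 1039872)` is false. Recomputing on resend uses the same inputs and produces the same numbers. At offset 512, which is below the smaller page size, both page sizes yield the same length.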

      So adjust the fitting logic in __ptlrpc_prep_bulk_page() for the first MD when all of the following are true:

      • I/O is direct-io
      • write is not aligned on the largest allowed page_size (64k) boundary
      • offset is > smallest page size (MD_MIN_INTEROP_PAGE_SIZE)

      For interop the first page is assumed to be 64k, which causes the smaller page size system
      to stop adding pages/bytes to the MD at the same point as the larger page size system, except when:

      • number of bytes + 64k offset <= LNET_MTU
        because the last page's number of bytes falls short of the MTU limit. In this case the extraneous MD is
        collapsed back, as only a single MD is needed / used for this bulk I/O.
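The adjusted fitting rule can be sketched in the same simplified model. Everything here is an assumption for illustration: `needs_interop_fit()` and `first_md_bytes_interop()` are hypothetical names, the 4k and 64k constants stand in for MD_MIN_INTEROP_PAGE_SIZE and MD_MAX_INTEROP_PAGE_SIZE, and the comparisons are copied from the conditions listed above, so the exact tests in __ptlrpc_prep_bulk_page() may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_LNET_MTU         ((size_t)1 << 20) /* 1M, per the description */
#define SKETCH_MIN_INTEROP_PAGE ((size_t)4096)    /* assumed MD_MIN_INTEROP_PAGE_SIZE */
#define SKETCH_MAX_INTEROP_PAGE ((size_t)65536)   /* assumed MD_MAX_INTEROP_PAGE_SIZE */

/* The three listed conditions: DIO, not 64k aligned, offset > smallest page. */
static bool needs_interop_fit(bool is_dio, size_t offset)
{
	size_t off64 = offset % SKETCH_MAX_INTEROP_PAGE;

	return is_dio && off64 != 0 && off64 > SKETCH_MIN_INTEROP_PAGE;
}

/*
 * Simplified model: bytes placed in the first MD. Normally the MD is
 * charged the offset within the local page; for interop it is charged
 * the offset within an assumed 64k first page, so that
 * "number of bytes + 64k offset <= LNET_MTU" holds on every page size.
 */
static size_t first_md_bytes_interop(size_t page_size, size_t offset,
				     size_t io_len, bool is_dio)
{
	size_t used = 0;
	size_t start;
	size_t frag;

	assert(page_size > 0);
	frag = page_size - offset % page_size;
	start = needs_interop_fit(is_dio, offset) ?
		offset % SKETCH_MAX_INTEROP_PAGE : offset % page_size;

	while (io_len > 0) {
		if (frag > io_len)
			frag = io_len;
		if (start + used + frag > SKETCH_LNET_MTU)
			break;
		used += frag;
		io_len -= frag;
		frag = page_size;
	}
	return used;
}
```

In this model a 4M DIO write at offset 8704 stops at 1039872 bytes for both 4k and 64k pages, while a 64k write at the same offset fits entirely in one MD on both, which corresponds to the single-MD exception above.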

      A quick survey of systems with page sizes (or configurable PAGE_SIZE) that Linux supports shows a few uncommon architectures that support page sizes > 64k; however, those systems are also configurable for 64k (or smaller) page sizes.
      In addition, no currently supported platform appears to allow a page size of less than 4k. Therefore restricting Lustre to 4k to 64k page sizes (along with the MD_MAX_INTEROP_PAGE_SIZE) should not be controversial.

      Finally, to accommodate the possible additional MD needed for a full bulk I/O that is also restricted due to offset and page alignment, increase the maximum to PTLRPC_BULK_OPS_COUNT + 1.
      To do this we have to double the theoretical maximum, from 6 bits to 7, to correctly deal with the mbits/xid logic.
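The step from 6 bits to 7 follows from simple counting, sketched below. The names are hypothetical, and the sketch assumes that a 6-bit field indexes the MDs of one bulk transfer, as the "6 bits" above suggests; it does not reproduce the actual mbits/xid handling.

```c
#include <assert.h>

/* Assumed from the description: 6 bits index the MDs of one bulk I/O. */
#define SKETCH_BULK_OPS_BITS  6
#define SKETCH_BULK_OPS_COUNT (1u << SKETCH_BULK_OPS_BITS)

static_assert(SKETCH_BULK_OPS_COUNT == 64, "6 bits give 64 MD slots");

/* Number of bits needed to represent every value in 0..max_value. */
static unsigned int bits_for_max_value(unsigned int max_value)
{
	unsigned int bits = 0;

	while (max_value > 0) {
		bits++;
		max_value >>= 1;
	}
	return bits;
}
```

With 64 MDs the largest index is 63, which fits in 6 bits. With one extra MD the largest index is 64, which needs a 7th bit, and a 7-bit field in turn doubles the representable range to 128.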

      Finally, to indicate to the server that a client has in fact adjusted the MD size(s) for 64k alignment, the unused lower 16 bits of struct obd_ioobj.ioo_max_brw can be used for flags, one bit of which can indicate that OBD_IOOBJ_INTEROP_PAGE_ALIGNMENT is needed.
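Carrying a flag in the lower 16 bits of a 32-bit field can be sketched as follows. This is an illustration only: `struct sketch_ioobj` is a stand-in for struct obd_ioobj, the helper names are hypothetical, the bit chosen for the interop flag is an assumption, and the upper half is treated as an opaque 16-bit value rather than the real ioo_max_brw encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_UPPER_SHIFT            16
#define SKETCH_FLAGS_MASK             0xffffu
/* Assumed bit position for OBD_IOOBJ_INTEROP_PAGE_ALIGNMENT. */
#define SKETCH_INTEROP_PAGE_ALIGNMENT 0x0001u

/* Stand-in for the one field of struct obd_ioobj used here. */
struct sketch_ioobj {
	uint32_t ioo_max_brw;
};

/* Store a 16-bit value in the upper half, preserving the flag bits. */
static void sketch_set_upper(struct sketch_ioobj *ioo, uint32_t value)
{
	assert(value <= 0xffffu);
	ioo->ioo_max_brw = (value << SKETCH_UPPER_SHIFT) |
			   (ioo->ioo_max_brw & SKETCH_FLAGS_MASK);
}

static uint32_t sketch_get_upper(const struct sketch_ioobj *ioo)
{
	return ioo->ioo_max_brw >> SKETCH_UPPER_SHIFT;
}

/* Set flag bits in the lower half without touching the upper half. */
static void sketch_set_flags(struct sketch_ioobj *ioo, uint32_t flags)
{
	ioo->ioo_max_brw |= flags & SKETCH_FLAGS_MASK;
}

static bool sketch_interop_aligned(const struct sketch_ioobj *ioo)
{
	return (ioo->ioo_max_brw & SKETCH_INTEROP_PAGE_ALIGNMENT) != 0;
}
```

A peer that only reads the upper half would see an unchanged value, while a server that knows about the flag could test it with `sketch_interop_aligned()` before choosing how to size its first MD.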

      Without this patch, a client and server with different native page sizes cannot agree on how big the MD is for a 64k unaligned I/O. One side will abort the MD as 'too big' for the allocated space [see: lnet_try_match_md() => LNET_MATCHMD_DROP] and trigger a retry, but the MD math never changes, so the effect is a hang.

          Activity

            [LU-17525] Unaligned DIO interop with different page sizes fails
            adilger Andreas Dilger made changes -
            Labels Original: always_except arm arm-server ppc64le New: arm arm-server ppc64le
            maloo Maloo made changes -
            Remote Link New: This issue links to "Page (Whamcloud Community Wiki)" [ 39795 ]
            maloo Maloo made changes -
            Remote Link New: This issue links to "Page (Whamcloud Community Wiki)" [ 39454 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to EX-10978 [ EX-10978 ]
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Merged for 2.16

            maloo Maloo made changes -
            Remote Link New: This issue links to "Page (Whamcloud Community Wiki)" [ 39255 ]

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56775/
            Subject: LU-17525 llite: soft fail unaligned dio
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a5546155a87a245f40062b68d26e53bed7dca44f
            gerrit Gerrit Updater added a comment - - edited

            "Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56789
            Subject: LU-17525 llite: soft fail unaligned dio
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ab0d6222421e9e5dc68cfa4afbc55da129eebdfe

            Sad that such an auspicious patch number 56789 was abandoned. ;-(

            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-18366 [ LU-18366 ]

            People

              stancheff Shaun Tancheff
              stancheff Shaun Tancheff
              Votes: 0
              Watchers: 11
