
Fix logic for unaligned transfer with o2iblnd

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.15.0

    Description

      It's possible for there to be an offset for the first page of a
      transfer. However, there are two bugs with this code in o2iblnd.

      The first is that this use-case will require LNET_MAX_IOV + 1 local
      RDMA fragments, but we do not specify the correct corresponding values
      for the max page list to ib_alloc_fast_reg_page_list(),
      ib_alloc_fast_reg_mr(), etc.
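
      The fragment count in the first bug can be checked with a little
      arithmetic. The sketch below is illustrative only and is not taken from
      o2iblnd: the 4 KiB page size, the value 256 for LNET_MAX_IOV (a 1 MiB
      transfer in 4 KiB pages) and the helper name sketch_frags_needed() are
      all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE    4096u /* assumed page size */
#define SKETCH_LNET_MAX_IOV 256u  /* assumed: 1 MiB / 4 KiB pages */

/*
 * Hypothetical helper: number of pages (and so page-granular RDMA
 * fragments) touched by a transfer of len bytes that starts
 * first_page_offset bytes into its first page.
 */
static size_t sketch_frags_needed(size_t first_page_offset, size_t len)
{
	assert(first_page_offset < SKETCH_PAGE_SIZE);
	if (len == 0)
		return 0;
	/* Round the end of the transfer up to a page boundary. */
	return (first_page_offset + len + SKETCH_PAGE_SIZE - 1) /
	       SKETCH_PAGE_SIZE;
}
```

      Under these assumptions a page-aligned 1 MiB transfer touches 256
      pages, while the same transfer starting at any non-zero offset in its
      first page touches 257, which is why the page-list limits have to allow
      for LNET_MAX_IOV + 1 entries.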

      The second issue is that the logic in kiblnd_setup_rd_iov() and
      kiblnd_setup_rd_kiov() attempts to obtain one more scatterlist entry
      than is actually needed. This causes the transfer to fail with -EFAULT.
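
      The second bug is an off-by-one in how the scatterlist is walked. The
      following is a simplified model of that pattern, not the actual
      kiblnd_setup_rd_iov() or kiblnd_setup_rd_kiov() code: struct sketch_sg,
      sketch_sg_next(), sketch_fill() and the eager flag are hypothetical
      names, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatterlist entry. */
struct sketch_sg {
	size_t len;
};

/* Return the entry after cur, or NULL when cur is the last of n entries. */
static struct sketch_sg *sketch_sg_next(struct sketch_sg *sgl, size_t n,
					struct sketch_sg *cur)
{
	size_t idx = (size_t)(cur - sgl);

	return (idx + 1 < n) ? cur + 1 : NULL;
}

/*
 * Split nob bytes, starting offset bytes into the first page, into
 * page-bounded fragments stored in sgl[0..n-1].
 * eager != 0 models the buggy pattern: the next entry is fetched after
 * every fragment, including the last one.
 * eager == 0 fetches the next entry only while bytes remain.
 */
static int sketch_fill(struct sketch_sg *sgl, size_t n, size_t offset,
		       size_t nob, int eager)
{
	const size_t page = 4096; /* assumed page size */
	struct sketch_sg *sg = sgl;

	assert(offset < page);
	if (n == 0)
		return nob ? -EFAULT : 0;

	while (nob > 0) {
		size_t frag = page - offset;

		if (frag > nob)
			frag = nob;
		sg->len = frag;
		nob -= frag;
		offset = 0;

		if (!eager && nob == 0)
			break; /* done: no further entry is needed */

		sg = sketch_sg_next(sgl, n, sg);
		if (sg == NULL)
			return -EFAULT; /* ran out of entries */
	}
	return 0;
}
```

      With exactly as many entries as the transfer needs, the eager variant
      runs off the end of the list and returns -EFAULT even though every byte
      has already been placed, while the other variant succeeds.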

          Activity

            [LU-15092] Fix logic for unaligned transfer with o2iblnd
            cfaber Colin Faber made changes -
            Link New: This issue is related to DDN-3559 [ DDN-3559 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to DDN-2844 [ DDN-2844 ]

            paf0186 Patrick Farrell added a comment - ashehata , just FYI, Andreas' most recent comment is a much clearer statement of my question.

            paf0186 Patrick Farrell added a comment - I actually asked the same question - not stated as clearly - on https://jira.whamcloud.com/browse/LU-13805  
            adilger Andreas Dilger made changes -
            Link New: This issue is related to EX-4347 [ EX-4347 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-13802 [ LU-13802 ]

            adilger Andreas Dilger added a comment - Chris, Sereguei,
            we were having a discussion related to LU-13802 (improving buffered read/write efficiency) about whether it is possible to have LNet do RDMA from "very" unaligned buffers on the client (i.e. page-relative memory offset does not match file-relative offset) into page+block-aligned buffers on the server.

            For example, if an application allocates a 1MB buffer in userspace today with glibc malloc(), it is only guaranteed to be aligned on the word size (i.e. 8 bytes). If the client tries to write this unaligned 1MB buffer to a 1MB file-aligned offset, the kernel has to copy all of the data into aligned kernel page cache and then send those page cache pages to LNet for RDMA.

            It would be ideal for large read/write operations if the client LNet could RDMA the unaligned userspace buffer directly into aligned server pages with O_DIRECT, but I don't know whether this is a capability that LNet and/or IB/RoCE have, or whether they require the source/target page alignment to be the same. If this isn't possible, that is totally fine, and we are looking into other solutions to improve performance here, but when I saw this patch recently I just wanted to make sure that there isn't some easy "of course the data does not need to be page aligned" solution that we are missing.
            pjones Peter Jones made changes -
            Link New: This issue is related to DDN-2683 [ DDN-2683 ]
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment - Landed for 2.15

            People

              hornc Chris Horn
              hornc Chris Horn