[LU-15092] Fix logic for unaligned transfer with o2iblnd Created: 12/Oct/21  Updated: 23/Jan/23  Resolved: 20/Nov/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug
Priority: Minor
Reporter: Chris Horn
Assignee: Chris Horn
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-13802 New i/o path: Buffered i/o as DIO (Open)
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The first page of a transfer can start at a non-zero offset. However,
the o2iblnd code that handles this case has two bugs.

The first is that this use-case requires LNET_MAX_IOV + 1 local RDMA
fragments, but we do not pass a correspondingly larger max page list
length to ib_alloc_fast_reg_page_list(), ib_alloc_fast_reg_mr(), etc.
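
The fragment count above can be checked with a little arithmetic. The following is a minimal user-space sketch, not o2iblnd code: the SKETCH_* constants and sketch_pages_spanned() are hypothetical names, and the values assume 4 KiB pages and a 1 MiB maximum payload (which together give 256 pages per aligned transfer).

```c
#include <stddef.h>

/* Assumed values for illustration only: 4 KiB pages and a 1 MiB
 * maximum payload, i.e. 256 pages for a fully aligned transfer. */
#define SKETCH_PAGE_SIZE    4096u
#define SKETCH_MAX_PAYLOAD  (1u << 20)
#define SKETCH_MAX_IOV      (SKETCH_MAX_PAYLOAD / SKETCH_PAGE_SIZE)

/* Number of pages touched by a buffer of len bytes that starts
 * first_page_offset bytes into its first page.
 * Assumes first_page_offset < SKETCH_PAGE_SIZE. */
static size_t sketch_pages_spanned(size_t first_page_offset, size_t len)
{
	if (len == 0)
		return 0;
	return (first_page_offset + len + SKETCH_PAGE_SIZE - 1) /
	       SKETCH_PAGE_SIZE;
}
```

Under these assumptions, a full-size transfer with no offset spans SKETCH_MAX_IOV pages, while the same transfer starting at any non-zero offset within its first page spans SKETCH_MAX_IOV + 1, which is why the page list limits need room for one extra entry.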

The second issue is that the logic in kiblnd_setup_rd_iov() and
kiblnd_setup_rd_kiov() attempts to obtain one more scatterlist entry
than is actually needed. This causes the transfer to fail with -EFAULT.
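
The second issue can be illustrated with a simplified user-space model. This is a sketch only, not the actual kiblnd_setup_rd_iov()/kiblnd_setup_rd_kiov() code: struct sketch_sg, SKETCH_EFAULT, and sketch_map_fragments() are hypothetical stand-ins, and an array index takes the place of walking a real scatterlist. The point it shows is that an entry should be claimed only while bytes remain to be mapped, so a list with exactly enough entries succeeds.

```c
#include <stddef.h>

#define SKETCH_EFAULT 14	/* stand-in for the kernel's EFAULT */

/* Hypothetical stand-in for one scatterlist entry. */
struct sketch_sg {
	size_t offset;	/* offset of the fragment within its page */
	size_t length;	/* number of bytes in the fragment */
};

/* Split nob bytes, starting page_offset bytes into the first page,
 * into per-page fragments stored in sg[0..nsg-1].
 * Assumes page_offset < page_size.
 * Returns the number of entries used, or -SKETCH_EFAULT if sg[] is
 * too small to hold every fragment. */
static int sketch_map_fragments(struct sketch_sg *sg, size_t nsg,
				size_t page_offset, size_t nob,
				size_t page_size)
{
	size_t used = 0;

	while (nob > 0) {
		size_t fragnob;

		/* Claim an entry only when there is data left for it. */
		if (used == nsg)
			return -SKETCH_EFAULT;

		fragnob = page_size - page_offset;
		if (fragnob > nob)
			fragnob = nob;

		sg[used].offset = page_offset;
		sg[used].length = fragnob;
		used++;

		nob -= fragnob;
		page_offset = 0;	/* only the first page is offset */
	}

	return (int)used;
}
```

A loop that instead advanced to the next entry unconditionally after filling each fragment, and treated a missing next entry as an error, would fail on the final fragment even though every byte had already been mapped. That is the kind of off-by-one the description refers to.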



 Comments   
Comment by Gerrit Updater [ 12/Oct/21 ]

"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45216
Subject: LU-15092 o2iblnd: Fix logic for unaligned transfer
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 40786a123e9d6f6142934c1135ad32015b36368b

Comment by Gerrit Updater [ 20/Nov/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45216/
Subject: LU-15092 o2iblnd: Fix logic for unaligned transfer
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 23a2c92f203ff2f39bcc083e6b6220968c17b475

Comment by Peter Jones [ 20/Nov/21 ]

Landed for 2.15

Comment by Andreas Dilger [ 04/Feb/22 ]

Chris, Sereguei,
we were having a discussion related to LU-13802 (improving buffered read/write efficiency) about whether LNet can do RDMA from "very" unaligned buffers on the client (i.e. the page-relative memory offset does not match the file-relative offset) into page+block-aligned buffers on the server.

For example, if an application allocates a 1MB buffer in userspace today with glibc malloc(), it is only guaranteed to be aligned on the word size (i.e. 8 bytes). If the client tries to write this unaligned 1MB buffer to a 1MB file-aligned offset, the kernel has to copy all of the data into aligned kernel page cache and then send those page cache pages to LNet for RDMA.

It would be ideal for large read/write operations if the client LNet could RDMA the unaligned userspace buffer directly into aligned server pages with O_DIRECT, but I don't know whether LNet and/or IB/RoCE have this capability, or whether they require the source and target page alignment to be the same. If this isn't possible, that is totally fine, and we are looking into other solutions to improve performance here, but when I saw this patch recently I just wanted to make sure that there isn't some easy "of course the data does not need to be page aligned" solution that we are missing.

Comment by Patrick Farrell [ 04/Feb/22 ]

I actually asked the same question - not stated as clearly - on https://jira.whamcloud.com/browse/LU-13805 

Comment by Patrick Farrell [ 04/Feb/22 ]

ashehata, just FYI, Andreas' most recent comment is a much clearer statement of my question.
