Bug
Resolution: Fixed
Blocker
Lustre 2.18.0

The conversion of the LNet layer to struct iov_iter changed the
LNET_MSG_GET case of kiblnd_send() to describe the RDMA GET sink
from the outbound payload (payload_niov/payload_kiov/payload_nob)
instead of the attached memory descriptor (msg_md). For a GET the
initiator sends no payload: lnet_prep_send() is called as
lnet_prep_send(msg, LNET_MSG_GET, target, 0, 0), so msg_len, and
therefore payload_nob, is always 0. The sink length lives in
msg_md->md_length.
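
The mismatch between the two length sources can be illustrated with a small stand-alone model. This is only a sketch: the `model_*` structures and the `sink_nob_*` helpers are hypothetical stand-ins invented for illustration, not the real LNet definitions, and only the fields named in the description above are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the LNet structures. */
struct model_md {
	unsigned int md_length;   /* size of the GET sink buffer */
};

struct model_msg {
	unsigned int     msg_len; /* outbound payload length */
	struct model_md *msg_md;  /* attached memory descriptor */
};

/* Models lnet_prep_send(msg, LNET_MSG_GET, target, 0, 0):
 * a GET carries no outbound payload, so msg_len is set to 0. */
static void model_prep_get(struct model_msg *msg, struct model_md *md)
{
	msg->msg_len = 0;
	msg->msg_md = md;
}

/* Post-conversion behaviour: sink length taken from the payload. */
static unsigned int sink_nob_from_payload(const struct model_msg *msg)
{
	return msg->msg_len;
}

/* Pre-conversion behaviour: sink length taken from the MD. */
static unsigned int sink_nob_from_md(const struct model_msg *msg)
{
	assert(msg->msg_md != NULL);
	return msg->msg_md->md_length;
}
```

In this model, a GET prepared against a 1 MiB descriptor yields a sink length of 0 from the payload and 1048576 from the descriptor, which is the difference the description attributes to the regression.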

As a result, kiblnd_setup_rd_kiov() is now invoked with nob == 0 and
trips its first assertion:

  LNetError: (o2iblnd_cb.c:704:kiblnd_setup_rd_kiov()) ASSERTION( nob > 0 ) failed

The IMMEDIATE-vs-RDMA size check immediately before that call still
correctly uses msg_md->md_length, so small GET replies take the
IMMEDIATE path and never reach the assertion. Only GET sinks large
enough to require RDMA reach kiblnd_setup_rd_kiov() and hit the
assert. This covers every
server-side BULK_GET_SINK transfer over o2ib – client writes and
target_bulk_io() bulk pulls (e.g. the MDS_BATCH buffer pulled by
mdt_batch() during batched statahead) – so a large write or an
mdtest stat phase reliably LBUGs the target.
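
Why only large GETs fail can be sketched as a simplified decision function. Everything here is an assumption made for illustration: `MODEL_IMMEDIATE_MAX` is a placeholder value rather than the real o2iblnd limit, the real check also accounts for message header overhead, and `model_get_path()` is a hypothetical helper, not a function in the LND.

```c
#include <stddef.h>

/* Assumed immediate-message limit; illustrative value only. */
#define MODEL_IMMEDIATE_MAX 4096u

enum model_path {
	MODEL_IMMEDIATE,   /* reply fits in an immediate message */
	MODEL_RDMA_OK,     /* sink described with a non-zero length */
	MODEL_ASSERT_NOB   /* where ASSERTION( nob > 0 ) would fire */
};

/* Models the GET case: the size check uses md_length, while the sink
 * is described with sink_nob (0 when taken from the payload). */
static enum model_path model_get_path(unsigned int md_length,
				      unsigned int sink_nob)
{
	if (md_length <= MODEL_IMMEDIATE_MAX)
		return MODEL_IMMEDIATE;  /* sink is never described */
	if (sink_nob == 0)
		return MODEL_ASSERT_NOB; /* RDMA setup called with nob == 0 */
	return MODEL_RDMA_OK;
}
```

Under these assumptions, a 512-byte sink returns MODEL_IMMEDIATE whatever sink_nob is, while a 1 MiB sink with sink_nob == 0 lands on MODEL_ASSERT_NOB, matching the observation that only bulk-sized GETs trigger the LBUG.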

Restore the GET sink description to come from msg_md, matching the
adjacent size check and the pre-conversion behaviour.