In Lustre, we allow the first fragment of an IOV-based message to be non-page-aligned. When we set up the scatter/gather list, we correctly set the address and page_offset to reflect that misalignment.
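To make the layout concrete, here is a minimal sketch of how a buffer that starts at a non-page-aligned address breaks into per-page fragments. The struct `sketch_frag`, the function `sketch_split`, and the fixed 4 KiB `SKETCH_PAGE_SIZE` are all hypothetical, chosen for illustration only; they are not the Lustre data structures. The point it shows is that only the first fragment can carry a non-zero page offset, and every later fragment starts at offset 0 of its page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page size for illustration; real code would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical fragment descriptor: one entry per page the buffer touches. */
struct sketch_frag {
	uint64_t page_base;   /* page-aligned address of the page */
	uint32_t page_offset; /* where the data starts inside that page */
	uint32_t len;         /* bytes of data in this fragment */
};

/*
 * Split [addr, addr + nob) into per-page fragments. Only the first
 * fragment can have a non-zero page_offset. Returns the number of
 * fragments written, or 0 if more than max_frags would be needed.
 */
static size_t sketch_split(uint64_t addr, uint32_t nob,
			   struct sketch_frag *frags, size_t max_frags)
{
	size_t n = 0;

	assert(frags != NULL);
	while (nob > 0) {
		uint32_t off = (uint32_t)(addr & (SKETCH_PAGE_SIZE - 1));
		uint32_t len = SKETCH_PAGE_SIZE - off;

		if (len > nob)
			len = nob;
		if (n == max_frags)
			return 0;
		frags[n].page_base = addr - off;
		frags[n].page_offset = off;
		frags[n].len = len;
		n++;
		addr += len;
		nob -= len;
	}
	return n;
}
```

For example, an 8192-byte buffer starting at 0x10234 yields three fragments: offset 0x234 with 3532 bytes, then a full page at offset 0, then 0x234 bytes at offset 0.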
When we assign a remote address for RDMA purposes, the current code masks the address so that it is page-aligned. When the page-aligned address does not match the address in the scatter/gather list, the mlx5 driver under MOFED 4 rejects the IB_RDMA_WRITE operation and reports a "dump_cqe" error message.
That is the main problem to fix. Beyond that, the code that does the page-alignment masking is itself wrong. Here is the line in the routine kiblnd_fmr_map_tx() that does the masking incorrectly:
The "~" should not be there: with it, we were setting rf_addr to the page offset instead of to the page-aligned address. When the buffer is page-aligned, rf_addr becomes zero, and that is the remote_addr value we send to the other node. The fact that this works without breaking anything suggests that the MOFED code does not use the remote_addr field of an IB_RDMA_WRITE work request.
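The arithmetic behind the stray "~" can be worked through directly. This is a sketch under one stated assumption: that the mask follows the Linux PAGE_MASK convention (page-number bits set, offset bits clear), which is what the observed behavior of rf_addr becoming the page offset implies. The helper names `sketch_page_mask`, `masked_with_tilde`, and `masked_without_tilde` are hypothetical and do not appear in Lustre.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical mask in the PAGE_MASK convention: high (page-number) bits
 * set, low (offset) bits clear. page_size must be a power of two.
 */
static uint64_t sketch_page_mask(uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return ~(page_size - 1);
}

/* "addr &= ~mask": only the offset within the page survives. */
static uint64_t masked_with_tilde(uint64_t addr, uint64_t page_size)
{
	return addr & ~sketch_page_mask(page_size);
}

/* "addr &= mask": the page-aligned start of the page. */
static uint64_t masked_without_tilde(uint64_t addr, uint64_t page_size)
{
	return addr & sketch_page_mask(page_size);
}
```

With 4 KiB pages, an address of 0x7f0012345234 becomes 0x234 with the "~" and 0x7f0012345000 without it; an already aligned address such as 0x7f0012345000 becomes 0 with the "~", which matches the zero remote_addr described above.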
Either way, we need to fix this in case some code eventually does pay attention to this field.
The question to answer here is whether the remote address we generate should be page-aligned at all. When I stopped page-aligning it, the dump_cqe error went away and everything worked fine.
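The mismatch can be modeled as a simple comparison between the advertised remote address and the address the first scatter/gather entry actually describes (page base plus page_offset). This is only a hypothetical model of the disagreement described above, not a reproduction of what the mlx5 driver checks internally; `sketch_sg_entry`, `sg_data_addr`, `remote_addr_aligned`, and `remote_addr_matches` are illustrative names, not Lustre or MOFED APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the first scatter/gather entry. */
struct sketch_sg_entry {
	uint64_t page_base;   /* page-aligned address of the first page */
	uint32_t page_offset; /* where the data starts inside that page */
};

/* Address of the first byte of data the entry describes. */
static uint64_t sg_data_addr(const struct sketch_sg_entry *sg)
{
	return sg->page_base + sg->page_offset;
}

/* Remote address if it is page-aligned before being sent. */
static uint64_t remote_addr_aligned(uint64_t addr, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr & ~(page_size - 1);
}

/* True when the advertised remote address agrees with the SG entry. */
static bool remote_addr_matches(uint64_t remote_addr,
				const struct sketch_sg_entry *sg)
{
	return remote_addr == sg_data_addr(sg);
}
```

For a first fragment at 0x10234 (page base 0x10000, offset 0x234), the unmasked address matches the entry while the page-aligned 0x10000 does not; for a fragment that already starts on a page boundary, both choices agree, which is consistent with the error appearing only for non-aligned first fragments.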