[LU-9500] MOFED 4/mlx5: Aligning non-aligned page addresses triggers dump_cqe Created: 13/May/17 Updated: 01/Sep/20 Resolved: 22/Jul/17
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.10.1, Lustre 2.11.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Doug Oucharek (Inactive) | Assignee: | Sonia Sharma (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
In Lustre, we allow the first fragment in an IOV-based message to be non-page-aligned. When we set up the scatter/gather list, we correctly set the address and page_offset to reflect that misalignment. However, when we assign a remote address for RDMA purposes, the current code masks the address with the intent of page aligning it. When this masked address does not match the address in the scatter/gather list, the mlx5 driver under MOFED 4 rejects the RDMA write (IB_WR_RDMA_WRITE) operation and emits a "dump_cqe" error message. That is the main problem to be fixed.

In addition, the code that does the masking is itself wrong. This is the line in the routine kiblnd_fmr_map_tx() that masks incorrectly:

    rd->rd_frags[0].rf_addr &= ~hdev->ibh_page_mask;

The "~" should not be there. Because ibh_page_mask keeps the page-number bits of an address, complementing it keeps only the offset bits, so we were setting rf_addr to the page offset. When pages are aligned, rf_addr becomes zero, and that is the remote_addr value we send to the other node. The fact that this works without breaking anything suggests that the MOFED code is not using the remote_addr field of an RDMA write work request. In any case, we need to fix this in case some future code does pay attention to that field.

The open question is whether the remote address we generate should be page aligned at all. When I stopped page aligning it, the dump_cqe error went away and everything worked fine.
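To make the mask arithmetic concrete, here is a minimal C sketch. It assumes ibh_page_mask is built like the kernel's PAGE_MASK, i.e. ~(page_size - 1), and uses a 4 KiB page; the helper names make_page_mask(), masked_with_tilde() and page_aligned() are hypothetical and are not the o2iblnd code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a mask that keeps the page-number bits of an
 * address, in the style of the kernel's PAGE_MASK. Assumes ibh_page_mask
 * is defined the same way, i.e. ~(page_size - 1).
 */
static uint64_t make_page_mask(uint64_t page_size)
{
	/* The mask is only meaningful for a non-zero power-of-two page size. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return ~(page_size - 1);
}

/*
 * Models the expression in kiblnd_fmr_map_tx(): addr &= ~mask.
 * Complementing the mask keeps only the low offset bits.
 */
static uint64_t masked_with_tilde(uint64_t addr, uint64_t page_mask)
{
	return addr & ~page_mask;
}

/* Models addr &= mask: rounds the address down to the start of its page. */
static uint64_t page_aligned(uint64_t addr, uint64_t page_mask)
{
	return addr & page_mask;
}
```

For a first fragment at 0x10000200, masked_with_tilde() yields 0x200 (the page offset), and for an aligned 0x10000000 it yields 0, which matches the zero remote_addr described in the ticket. page_aligned() yields 0x10000000 instead, which is still not the 0x10000200 address recorded in the scatter/gather list; only leaving the address unmasked keeps the two in agreement.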
| Comments |
| Comment by Gerrit Updater [ 16/May/17 ] |
Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: https://review.whamcloud.com/27149 |
| Comment by Alexey Lyashkov [ 24/May/17 ] |
Doug, the patch looks fine to me. However, it looks like we need the same fix for the other memory registration modes. I would also like to ask Jay to review the CLIO code so that it avoids using unaligned addresses.
| Comment by James A Simmons [ 05/Jun/17 ] |
Hi Doug. I tested the latest patch on our RHEL7 system with the default OFED stack and the mlx4 driver, and it worked. I still need to test it on a few more configurations:
1) SLES11 SP3 with the OFED 311 stack on mlx4 hardware (possibly mlx5; I have to ask)
2) Power8 RHEL7.3 with MOFED 3.3 on mlx5 hardware
3) Power8 RHEL7.3 with MOFED 4.X on mlx5 hardware (needs to be set up)
I will let you know the results.
| Comment by Gerrit Updater [ 22/Jul/17 ] |
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27149/ |
| Comment by Minh Diep [ 22/Jul/17 ] |
Landed in Lustre 2.11.0.
| Comment by Gerrit Updater [ 26/Jul/17 ] |
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28237 |
| Comment by Gerrit Updater [ 07/Aug/17 ] |
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28237/ |
| Comment by Doug Oucharek (Inactive) [ 17/Aug/17 ] |
Has this been pushed upstream yet? |
| Comment by James A Simmons [ 17/Aug/17 ] |
Not yet. |