[LU-13181] kiblnd_fmr_pool_map error on the AARCH64 with 64k pages Created: 31/Jan/20 Updated: 16/Feb/21 Resolved: 16/Jun/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.13.0, Lustre 2.14.0 |
| Fix Version/s: | Lustre 2.14.0, Lustre 2.12.7 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Alexey Lyashkov | Assignee: | Alexey Lyashkov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
An AARCH64 client is unable to do any bulk transfers; they fail with the following errors:
[ 339.806240] LNetError: 9537:0:(o2iblnd.c:1926:kiblnd_fmr_pool_map()) Failed to map mr 1/16 elements
[ 339.806243] LNetError: 9535:0:(o2iblnd.c:1926:kiblnd_fmr_pool_map()) Failed to map mr 1/16 elements
[ 339.806249] LNetError: 9538:0:(o2iblnd_cb.c:613:kiblnd_fmr_map_tx()) Can't map 1048576 pages: -22
[ 339.806251] LNetError: 9535:0:(o2iblnd.c:1926:kiblnd_fmr_pool_map()) Skipped 1 previous similar message
[ 339.806255] LNetError: 9536:0:(o2iblnd_cb.c:1841:kiblnd_reply()) Can't setup GET src for 10.149.4.6@o2ib: -22
Tracing shows some interesting information:
kiblnd_sd_03_00-9535 [044] d... 488.776602: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x65260000
kiblnd_sd_03_00-9535 [044] d... 488.776604: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776605: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x65270000
kiblnd_sd_03_00-9535 [044] d... 488.776607: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776608: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x65280000
kiblnd_sd_03_00-9535 [044] d... 488.776609: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776610: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x65290000
kiblnd_sd_03_00-9535 [044] d... 488.776612: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776613: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652a0000
kiblnd_sd_03_00-9535 [044] d... 488.776614: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776615: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652b0000
kiblnd_sd_03_00-9535 [044] d... 488.776617: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776618: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652c0000
kiblnd_sd_03_00-9535 [044] d... 488.776620: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776621: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652d0000
kiblnd_sd_03_00-9535 [044] d... 488.776622: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776623: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652e0000
kiblnd_sd_03_00-9535 [044] d... 488.776625: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776626: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652f0000
kiblnd_sd_03_00-9535 [044] d... 488.776627: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776628: r_ib_sg_to_pages_0: (mlx5_ib_map_mr_sg+0x8c/0x240 [mlx5_ib] <- ib_sg_to_pages) arg1=0x1 arg2=0xffff00001906f7f0
Dumping the scatterlist gives:
struct scatterlist {
page_link = 0xffff7fe0245c3600,
offset = 0x0,
length = 0x10000,
dma_address = 0x80500000,
dma_length = 0x100000
}
So the DMA length covers the whole 1 MiB transfer as a single entry. This means ib_dma_map_sg has merged all pages into a single region. Its return value is stored in rd->rd_nfrags, but the result of the MR mapping is checked against tx->tx_nfrags, which holds the number of fragments before mapping. This incorrect check generates a false error and the transfer is stopped. |
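The mismatch described above can be reproduced in a small userspace model. This is only a sketch under stated assumptions: `struct frag`, `build_frags`, `merge_contiguous` and `simulate_transfer` are hypothetical names invented for illustration, not o2iblnd or kernel code, and the "fixed" comparison is one plausible reading of the description, not necessarily what the patch in https://review.whamcloud.com/37388 does. The sizes (sixteen 0x10000-byte pages starting at DMA address 0x80500000) are taken from the scatterlist dump and the "1/16 elements" message.

```c
#include <assert.h>
#include <stdint.h>

#define NFRAGS_MAX 16
#define EINVAL_RC  (-22)   /* the -22 seen in the log */

/* Hypothetical stand-in for a scatterlist entry; not the kernel struct. */
struct frag {
	uint64_t dma_address;
	uint32_t dma_length;
};

/* Fill n page-sized fragments that are contiguous in DMA space. */
static void build_frags(struct frag *f, int n, uint64_t base, uint32_t page)
{
	assert(n > 0 && n <= NFRAGS_MAX);
	for (int i = 0; i < n; i++) {
		f[i].dma_address = base + (uint64_t)i * page;
		f[i].dma_length = page;
	}
}

/* Coalesce adjacent fragments in place; returns the entry count afterwards. */
static int merge_contiguous(struct frag *f, int n)
{
	int last = 0;

	for (int i = 1; i < n; i++) {
		if (f[last].dma_address + f[last].dma_length == f[i].dma_address)
			f[last].dma_length += f[i].dma_length;
		else
			f[++last] = f[i];
	}
	return last + 1;
}

/*
 * Model one 1 MiB transfer on a 64 KiB-page host.
 * check_against_merged == 0: compare with the pre-mapping count (the
 * behaviour described in the ticket); != 0: compare with the merged count.
 */
int simulate_transfer(int check_against_merged)
{
	struct frag f[NFRAGS_MAX];
	int tx_nfrags = NFRAGS_MAX;   /* fragments before mapping */
	int rd_nfrags;                /* entries after the DMA mapping merge */
	int mapped;                   /* entries the MR mapping step consumed */

	build_frags(f, tx_nfrags, 0x80500000ULL, 0x10000);
	rd_nfrags = merge_contiguous(f, tx_nfrags);
	mapped = rd_nfrags;           /* assumption: MR maps every merged entry */

	int expected = check_against_merged ? rd_nfrags : tx_nfrags;
	return mapped == expected ? 0 : EINVAL_RC;
}
```

In this model the sixteen pages collapse into one entry, so `simulate_transfer(0)` compares 1 against 16 and returns -22, while `simulate_transfer(1)` compares 1 against 1 and returns 0.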
| Comments |
| Comment by Gerrit Updater [ 31/Jan/20 ] |
|
Alexey Lyashkov (c17817@cray.com) uploaded a new patch: https://review.whamcloud.com/37388 |
| Comment by Gerrit Updater [ 16/Jun/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37388/ |
| Comment by Peter Jones [ 16/Jun/20 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 23/Jan/21 ] |
|
Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41303 |
| Comment by Gerrit Updater [ 16/Feb/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41303/ |