Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.12.0, Lustre 2.13.0, Lustre 2.14.0
-
None
-
3
-
Description
The AARCH64 client is unable to perform any bulk transfers; they fail with the following errors:
[ 339.806240] LNetError: 9537:0:(o2iblnd.c:1926:kiblnd_fmr_pool_map()) Failed to map mr 1/16 elements
[ 339.806243] LNetError: 9535:0:(o2iblnd.c:1926:kiblnd_fmr_pool_map()) Failed to map mr 1/16 elements
[ 339.806249] LNetError: 9538:0:(o2iblnd_cb.c:613:kiblnd_fmr_map_tx()) Can't map 1048576 pages: -22
[ 339.806251] LNetError: 9535:0:(o2iblnd.c:1926:kiblnd_fmr_pool_map()) Skipped 1 previous similar message
[ 339.806255] LNetError: 9536:0:(o2iblnd_cb.c:1841:kiblnd_reply()) Can't setup GET src for 10.149.4.6@o2ib: -22
Tracing shows some interesting information:
kiblnd_sd_03_00-9535 [044] d... 488.776602: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x65260000
kiblnd_sd_03_00-9535 [044] d... 488.776604: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776605: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x65270000
kiblnd_sd_03_00-9535 [044] d... 488.776607: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776608: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x65280000
kiblnd_sd_03_00-9535 [044] d... 488.776609: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776610: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x65290000
kiblnd_sd_03_00-9535 [044] d... 488.776612: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776613: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652a0000
kiblnd_sd_03_00-9535 [044] d... 488.776614: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776615: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652b0000
kiblnd_sd_03_00-9535 [044] d... 488.776617: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776618: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652c0000
kiblnd_sd_03_00-9535 [044] d... 488.776620: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776621: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652d0000
kiblnd_sd_03_00-9535 [044] d... 488.776622: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776623: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652e0000
kiblnd_sd_03_00-9535 [044] d... 488.776625: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776626: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652f0000
kiblnd_sd_03_00-9535 [044] d... 488.776627: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
kiblnd_sd_03_00-9535 [044] d... 488.776628: r_ib_sg_to_pages_0: (mlx5_ib_map_mr_sg+0x8c/0x240 [mlx5_ib] <- ib_sg_to_pages) arg1=0x1 arg2=0xffff00001906f7f0
Dumping the scatterlist entry gives:
struct scatterlist { page_link = 0xffff7fe0245c3600, offset = 0x0, length = 0x10000, dma_address = 0x80500000, dma_length = 0x100000 }
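So the DMA length covers the whole 1 MiB transfer as a single entry: 16 pages of 64 KiB each (length = 0x10000) ended up in one DMA segment of 0x100000 bytes.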
In other words, ib_dma_map_sg() has merged all pages into a single DMA region. Its return value (the post-mapping entry count, here 1) is stored in rd->rd_nfrags, but the mapped count is then checked against tx->tx_nfrags, which holds the number of fragments before mapping (here 16). This incorrect check generates a false error (-22, i.e. -EINVAL) and the transfer is aborted.
Issue Links
- is related to: LU-10157 LNET_MAX_IOV hard coded to 256 (Resolved)