  1. Lustre
  2. LU-13181

kiblnd_fmr_pool_map error on the AARCH64 with 64k pages

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.7
    • Affects Version/s: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.14.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      An AARCH64 client is unable to do any bulk transfers; every attempt fails with these errors:

      [  339.806240] LNetError: 9537:0:(o2iblnd.c:1926:kiblnd_fmr_pool_map()) Failed to map mr 1/16 elements
      [  339.806243] LNetError: 9535:0:(o2iblnd.c:1926:kiblnd_fmr_pool_map()) Failed to map mr 1/16 elements
      [  339.806249] LNetError: 9538:0:(o2iblnd_cb.c:613:kiblnd_fmr_map_tx()) Can't map 1048576 pages: -22
      [  339.806251] LNetError: 9535:0:(o2iblnd.c:1926:kiblnd_fmr_pool_map()) Skipped 1 previous similar message
      [  339.806255] LNetError: 9536:0:(o2iblnd_cb.c:1841:kiblnd_reply()) Can't setup GET src for 10.149.4.6@o2ib: -22
      

      Tracing shows some interesting information:

      kiblnd_sd_03_00-9535  [044] d...   488.776602: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x65260000
       kiblnd_sd_03_00-9535  [044] d...   488.776604: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
       kiblnd_sd_03_00-9535  [044] d...   488.776605: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x65270000
       kiblnd_sd_03_00-9535  [044] d...   488.776607: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
       kiblnd_sd_03_00-9535  [044] d...   488.776608: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x65280000
       kiblnd_sd_03_00-9535  [044] d...   488.776609: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
       kiblnd_sd_03_00-9535  [044] d...   488.776610: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x65290000
       kiblnd_sd_03_00-9535  [044] d...   488.776612: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
       kiblnd_sd_03_00-9535  [044] d...   488.776613: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652a0000
       kiblnd_sd_03_00-9535  [044] d...   488.776614: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
       kiblnd_sd_03_00-9535  [044] d...   488.776615: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652b0000
       kiblnd_sd_03_00-9535  [044] d...   488.776617: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
       kiblnd_sd_03_00-9535  [044] d...   488.776618: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652c0000
       kiblnd_sd_03_00-9535  [044] d...   488.776620: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
       kiblnd_sd_03_00-9535  [044] d...   488.776621: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652d0000
       kiblnd_sd_03_00-9535  [044] d...   488.776622: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
       kiblnd_sd_03_00-9535  [044] d...   488.776623: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652e0000
       kiblnd_sd_03_00-9535  [044] d...   488.776625: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
       kiblnd_sd_03_00-9535  [044] d...   488.776626: p_mlx5_set_page_0: (mlx5_set_page+0x0/0x60 [mlx5_ib]) arg1=0xffff8089529a2c00 arg2=0x652f0000
       kiblnd_sd_03_00-9535  [044] d...   488.776627: r_mlx5_set_page_0: (ib_sg_to_pages+0xc4/0x1b8 [ib_core] <- mlx5_set_page) arg1=0x0
       kiblnd_sd_03_00-9535  [044] d...   488.776628: r_ib_sg_to_pages_0: (mlx5_ib_map_mr_sg+0x8c/0x240 [mlx5_ib] <- ib_sg_to_pages) arg1=0x1 arg2=0xffff00001906f7f0
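The trace shows many mlx5_set_page calls, each returning 0, followed by a single ib_sg_to_pages return of 1. A simplified user-space model of that walk is sketched below. It is not the kernel code: `dma_seg`, `sg_to_pages_model`, `count_page` and `demo_walk` are hypothetical names, and the segment only replays the ten page addresses visible in the excerpt (0x65260000 to 0x652f0000, 64k apart). The point it illustrates is that the return value counts scatterlist elements consumed, not pages set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist element. */
struct dma_seg {
    uint64_t addr;
    uint64_t len;
};

typedef int (*set_page_fn)(void *ctx, uint64_t addr);

/*
 * Walk every segment in page_size steps and hand each page address to
 * set_page(). Returns the number of segments fully consumed, which is
 * how the value 1 in the trace is read here (assumption of this model).
 */
static int sg_to_pages_model(const struct dma_seg *segs, int nsegs,
                             uint64_t page_size, set_page_fn set_page,
                             void *ctx)
{
    int i;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    for (i = 0; i < nsegs; i++) {
        uint64_t end = segs[i].addr + segs[i].len;
        uint64_t a;

        for (a = segs[i].addr & ~(page_size - 1); a < end; a += page_size)
            if (set_page(ctx, a) != 0)
                return i;   /* stop at the first page that cannot be set */
    }
    return i;
}

/* Callback that only counts how many pages were handed over. */
static int count_page(void *ctx, uint64_t addr)
{
    (void)addr;
    ++*(int *)ctx;
    return 0;
}

/* Replay the ten 64k pages visible in the trace as one merged segment. */
static int demo_walk(int *pages_seen)
{
    const struct dma_seg seg = { 0x65260000ULL, 10 * 0x10000ULL };

    *pages_seen = 0;
    return sg_to_pages_model(&seg, 1, 0x10000ULL, count_page, pages_seen);
}
```

In this model `demo_walk()` sets ten pages but reports one element, which is the same shape as the trace: many page callbacks, one mapped scatterlist entry.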

      Dumping the scatterlist entry gives:

      struct scatterlist {
        page_link = 0xffff7fe0245c3600,
        offset = 0x0,
        length = 0x10000,
        dma_address = 0x80500000,
        dma_length = 0x100000
      }
      

      So the DMA length covers the whole 1 MiB transfer as a single entry.
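As a rough illustration of how 16 pages of 0x10000 bytes can end up as one entry with dma_length = 0x100000, here is a minimal user-space sketch of coalescing DMA-contiguous segments. It only models the merging that a DMA mapping layer is allowed to do; `seg`, `coalesce_segs` and `demo_merged_count` are hypothetical names, and the count of 16 pages is taken from the "1/16 elements" message in the error log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the DMA fields of struct scatterlist. */
struct seg {
    uint64_t dma_address;
    uint32_t dma_length;
};

/*
 * Merge neighbouring segments that are contiguous in DMA address space,
 * in place. Returns the number of segments left after merging.
 */
static size_t coalesce_segs(struct seg *segs, size_t n)
{
    size_t out = 0;
    size_t i;

    assert(segs != NULL);
    if (n == 0)
        return 0;
    for (i = 1; i < n; i++) {
        if (segs[out].dma_address + segs[out].dma_length ==
            segs[i].dma_address)
            segs[out].dma_length += segs[i].dma_length; /* extend current */
        else
            segs[++out] = segs[i];                      /* start a new one */
    }
    return out + 1;
}

/* 16 contiguous 64k pages starting at the dma_address from the dump. */
static size_t demo_merged_count(uint32_t *merged_len)
{
    struct seg segs[16];
    size_t i, n;

    for (i = 0; i < 16; i++) {
        segs[i].dma_address = 0x80500000ULL + i * 0x10000ULL;
        segs[i].dma_length = 0x10000;
    }
    n = coalesce_segs(segs, 16);
    *merged_len = segs[0].dma_length;
    return n;
}
```

With these inputs the sketch leaves one segment of 0x100000 bytes, so any later code has to work with the post-mapping count (1) rather than the pre-mapping count (16).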

      This means ib_dma_map_sg has merged all pages into a single region. Its return value is stored in rd->rd_nfrags, but the mapping result is checked against tx->tx_nfrags, which holds the number of fragments before mapping. This incorrect check generates a false error and the transfer is stopped.
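The mismatch can be reproduced in a few lines of user-space C. This is only a sketch under assumptions: the two structs are trimmed, hypothetical stand-ins that keep just the counters named above, the helper names are made up, and the second check is one possible comparison, not necessarily the change that landed for this ticket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed stand-ins for the descriptors named in the report. */
struct rdma_desc {
    int rd_nfrags;      /* fragment count after DMA mapping */
};

struct tx_desc {
    int tx_nfrags;      /* fragment count before DMA mapping */
};

/* The check as described: compare against the pre-mapping count. */
static int check_against_tx_nfrags(const struct tx_desc *tx, int mapped)
{
    assert(tx != NULL);
    return mapped == tx->tx_nfrags ? 0 : -EINVAL;
}

/* Illustrative alternative: compare against the post-mapping count. */
static int check_against_rd_nfrags(const struct rdma_desc *rd, int mapped)
{
    assert(rd != NULL);
    return mapped == rd->rd_nfrags ? 0 : -EINVAL;
}
```

With tx_nfrags = 16, rd_nfrags = 1 and one mapped element, the first helper returns -EINVAL (the -22 seen in the log on Linux) while the second returns 0.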

            People

              Assignee: shadow Alexey Lyashkov
              Reporter: shadow Alexey Lyashkov