
kiblnd_fmr_pool_map() Failed to map mr 10/11 elements


    Description

      The following group of messages appears in the console logs of the MDTs.

      2017-10-04 08:11:00 [407096.858161] LNetError: 174158:0:(o2iblnd.c:1893:kiblnd_fmr_pool_map()) Failed to map mr 10/11 elements
      2017-10-04 08:11:00 [407096.869697] LNetError: 174158:0:(o2iblnd_cb.c:590:kiblnd_fmr_map_tx()) Can't map 41033 pages: -22
      2017-10-04 08:11:00 [407096.880686] LNetError: 174158:0:(o2iblnd_cb.c:1582:kiblnd_send()) Can't setup GET sink for 172.19.1.112@o2ib100: -22
      2017-10-04 08:11:00 [407096.893504] LustreError: 174158:0:(events.c:449:server_bulk_callback()) event type 5, status -5, desc ffff883ebaa9bb00
      2017-10-04 08:12:40 [407196.901157] LustreError: 174158:0:(ldlm_lib.c:3186:target_bulk_io()) @@@ timeout on bulk WRITE after 100+0s  req@ffff883f27232850 x1579913603429696/t0(0) o1000->lquake-MDT0001-mdtlov_UUID@172.19.1.112@o2ib100:-1/-1 lens 352/0 e 0 to 0 dl 1507130003 ref 1 fl Interpret:/0/ffffffff rc 0/-1
      

      The nodes have Mellanox ConnectX-4 IB adapters.
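
The numeric codes in the log above are negative Linux errno values: -22 is -EINVAL (invalid argument) and -5 is -EIO (I/O error). A minimal sketch of decoding them follows; `rc_name` and `rc_text` are hypothetical helpers written for illustration and are not part of Lustre or LNet.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Lustre): translate a negative
 * kernel-style return code, as printed in the log, to its errno name.
 * Only the two codes seen in this ticket are covered.
 */
const char *rc_name(int rc)
{
	switch (-rc) {
	case EIO:
		return "EIO";     /* 5: I/O error */
	case EINVAL:
		return "EINVAL";  /* 22: invalid argument */
	default:
		return "unknown";
	}
}

/* Human-readable description of the same code, via the C library. */
const char *rc_text(int rc)
{
	return strerror(-rc);
}
```

Read this way, the sequence is an invalid-argument failure while mapping the memory region in o2iblnd, which the bulk callback then reports as an I/O error before the bulk WRITE times out.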

          Activity

            [LU-10089] kiblnd_fmr_pool_map() Failed to map mr 10/11 elements
            mdiep Minh Diep added a comment -

            we don't need this in 2.10


            gerrit Gerrit Updater added a comment -

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/29771
            Subject: LU-10089 o2iblnd: use IB_MR_TYPE_SG_GAPS
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: aacf8a650f495f50faf0135c6201ad0e446faf74
            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29551/
            Subject: LU-10089 o2iblnd: use IB_MR_TYPE_SG_GAPS
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1b609396e468949f2420f14fed5ebfc999366b62

            ashehata Amir Shehata (Inactive) added a comment -

            I updated https://review.whamcloud.com/#/c/29551/ to address the comments.

            One thing to note: if you're using OPA, you should set map-on-demand to 256. I'm still analyzing this issue and hopefully will have a patch soon. This issue is tracked under LU-10129.

            simmonsja James A Simmons added a comment -

            There is a question about querying the IB device to see whether it supports IB_MR_TYPE_SG_GAPS, instead of assuming that IB_MR_TYPE_SG_GAPS is always supported.
            ofaaland Olaf Faaland added a comment -

            Hi Amir,

            I see https://review.whamcloud.com/#/c/29551/ has status "Ready to land", but hasn't been landed. Is there further work needed, or is it just waiting for the next time a set of patches gets merged? Thanks.

            ofaaland Olaf Faaland added a comment -

            This appears to be working well in my tests.

            ashehata Amir Shehata (Inactive) added a comment - - edited

            Some notes:
            1. Fastreg with MLX4 + OFED + ib_alloc_mr(.., IB_MR_TYPE_MEM_REG): gets BIND_ERR
            2. Fastreg with MLX4 + MOFED 4.1 + ib_alloc_mr(.., IB_MR_TYPE_MEM_REG): works
            3. IB_MR_TYPE_SG_GAPS is not supported for MLX4 on either OFED or MOFED
            4. MLX4 FMR mapping works differently on MOFED vs OFED.
               https://review.whamcloud.com/29290 is needed for OFED but not needed for MOFED (although it doesn't seem to hurt).

            So the best solution is to:
            1. Always use FMR for MLX4
            2. Always use FMR for OPA
            3. FMR is not available for MLX5 so fastreg will be used

            The three patches I described earlier seem like the ideal solution for now.
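
The selection rule in the notes above can be sketched as a small decision function. This is an illustration only, not the o2iblnd implementation: the enum names and `choose_reg_mode` are hypothetical, and the fallback for an unlisted device family is an assumption.

```c
/* Hypothetical device families; these names are illustrative only. */
enum hca_family { HCA_MLX4, HCA_MLX5, HCA_OPA };

/* The two memory registration strategies discussed in the notes. */
enum reg_mode { REG_FMR, REG_FASTREG };

/*
 * Pick a registration mode following the rule of thumb in the notes:
 * FMR for MLX4 and OPA, fastreg for MLX5 (where FMR is not available).
 */
enum reg_mode choose_reg_mode(enum hca_family hca)
{
	switch (hca) {
	case HCA_MLX4:
	case HCA_OPA:
		return REG_FMR;
	case HCA_MLX5:
		return REG_FASTREG;
	}
	/* Assumption: treat any unlisted family like MLX5. */
	return REG_FASTREG;
}
```

For example, `choose_reg_mode(HCA_OPA)` yields `REG_FMR`, matching item 2 of the list.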
            ofaaland Olaf Faaland added a comment -

            Note that we cannot land these patches to our production tree until they are through your review and testing process, and are merged to master at a minimum.  Let me know if there's anything I can do to help that along.


            People

              ashehata Amir Shehata (Inactive)
              ofaaland Olaf Faaland