FastReg (MLX5) support breaks when map_on_demand > 0

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.10.0, Upstream
    • Lustre 2.10.0, Upstream
    • 3

    Description

      When building against MOFED 4, the default for map_on_demand switches from 0 to 256.  This breaks MLX5-based cards, which use the FastReg support in ko2iblnd.  There are three problems with FastReg that need to be fixed:

      1. In kiblnd_fmr_pool_map(), when taking elements from the fpo_pool_list, if the list runs out, the current code sets rc to -EBUSY when it should be -EAGAIN.  -EAGAIN triggers the pool to be made bigger.  -EBUSY just fails the transfer and the connection (not what we want).
      2. Even after I fix the setting of rc in number 1, bringing down the network via "lctl network down" trips this assert: 
        [ 1172.255552] LNetError: 10176:0:(o2iblnd.c:1421:kiblnd_destroy_fmr_pool()) ASSERTION( fpo->fpo_map_count == 0 ) failed: 
      3. Every time the pool size is increased, I keep seeing this annoying log (with neterror on): 
        May  9 00:22:26 trevis-407 kernel: LNet: Using FastReg for registration

      The first 2 items are blockers and must be fixed ASAP.  The 3rd might as well be addressed at the same time.
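
The return-code distinction in item 1 can be illustrated with a small userspace sketch. This is not the ko2iblnd code: the `demo_pool` structure and every `demo_*` function are hypothetical names invented for illustration, and the grow-and-retry policy is an assumption based only on the statement above that -EAGAIN triggers the pool to be made bigger. The counter check at the end loosely mirrors the kind of bookkeeping the assertion in item 2 guards.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a FastReg pool; not the ko2iblnd structures. */
struct demo_pool {
	int free_count;	/* descriptors still on the free list */
	int map_count;	/* descriptors currently mapped */
};

/* Take one descriptor. -EAGAIN tells the caller the pool must grow. */
static int demo_pool_map(struct demo_pool *pool)
{
	if (pool->free_count == 0)
		return -EAGAIN;	/* -EBUSY here would fail the transfer instead */
	pool->free_count--;
	pool->map_count++;
	return 0;
}

/* Return one descriptor to the free list. */
static void demo_pool_unmap(struct demo_pool *pool)
{
	assert(pool->map_count > 0);
	pool->map_count--;
	pool->free_count++;
}

/* Add 'extra' descriptors to the free list. */
static void demo_pool_grow(struct demo_pool *pool, int extra)
{
	pool->free_count += extra;
}

/* Caller side: grow and retry once on -EAGAIN; other errors are final. */
static int demo_map_with_growth(struct demo_pool *pool, int extra)
{
	int rc = demo_pool_map(pool);

	if (rc == -EAGAIN) {
		demo_pool_grow(pool, extra);
		rc = demo_pool_map(pool);
	}
	return rc;
}

/* Teardown is only safe once every mapped descriptor has been returned. */
static int demo_pool_can_destroy(const struct demo_pool *pool)
{
	return pool->map_count == 0;
}
```

In this sketch, an exhausted pool is a recoverable condition only because the error code says so: a caller that sees -EAGAIN grows the pool and retries, while any other negative value would be passed up as a failure.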

          Activity

            [LU-9472] FastReg (MLX5) support breaks when map_on_demand > 0

            simmonsja James A Simmons added a comment - Not yet.

            dougo Doug Oucharek (Inactive) added a comment - Has this been pushed upstream yet?

            pjones Peter Jones added a comment - Landed for 2.10

            gerrit Gerrit Updater added a comment -
            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27015/
            Subject: LU-9472 lnd: Fix FastReg map/unmap for MLX5
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b436c75d9488222190de8b30f56d720f8ec63d6f

            shadow Alexey Lyashkov added a comment - Tested the fix and it works for me.

            gerrit Gerrit Updater added a comment -
            Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: https://review.whamcloud.com/27015
            Subject: LU-9472 lnd: Fix FastReg map/unmap for MLX5
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a6a1d45a72360b5cc7e9e3a65c7456fa62c19192

            simmonsja James A Simmons added a comment - Thanks for finding this.

            People

              doug Doug Oucharek (Inactive)
              doug Doug Oucharek (Inactive)