Lustre / LU-9472

FastReg (MLX5) support breaks when map_on_demand > 0

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.10.0, Upstream
    • Fix Version/s: Lustre 2.10.0, Upstream
    • Severity: 3

      Description

      When building against MOFED 4, the default for map_on_demand changes from 0 to 256. This breaks MLX5-based cards, which use the FastReg support in ko2iblnd. Three problems with FastReg need to be fixed:

      1. In kiblnd_fmr_pool_map(), when elements are taken from fpo_pool_list and the list runs out, the current code sets rc to -EBUSY when it should set -EAGAIN. -EAGAIN triggers the pool to be made bigger; -EBUSY just fails the transfer and the connection (not what we want).
      2. Even with the fix for item 1 applied, bringing the network down via "lctl network down" trips this assertion:
        [ 1172.255552] LNetError: 10176:0:(o2iblnd.c:1421:kiblnd_destroy_fmr_pool()) ASSERTION( fpo->fpo_map_count == 0 ) failed: 
      3. Every time the pool size is increased, I keep seeing this annoying log message (with neterror on):
        May  9 00:22:26 trevis-407 kernel: LNet: Using FastReg for registration

      The first two items are blockers and must be fixed ASAP. The third might as well be addressed at the same time.


              People

              • Assignee: Doug Oucharek (Inactive)
              • Reporter: Doug Oucharek (Inactive)
              • Votes: 0
              • Watchers: 13