  1. Lustre
  2. LU-9472

FastReg (MLX5) support breaks when map_on_demand > 0


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.10.0, Upstream
    • Lustre 2.10.0, Upstream
    • 3

    Description

      When building against MOFED 4, the default for map_on_demand switches from 0 to 256.  This breaks MLX5-based cards, which make use of the FastReg support in ko2iblnd.  There are three problems with FastReg which need to be fixed:

      1. In kiblnd_fmr_pool_map(), when taking elements from the fpo_pool_list, if the list runs out, the current code sets rc to -EBUSY when it should be -EAGAIN.  -EAGAIN triggers the pool to be made bigger.  -EBUSY just fails the transfer and the connection (not what we want).
      2. Even after I fix the setting of rc in number 1, bringing down the network via "lctl network down" trips this assert: 
        [ 1172.255552] LNetError: 10176:0:(o2iblnd.c:1421:kiblnd_destroy_fmr_pool()) ASSERTION( fpo->fpo_map_count == 0 ) failed: 
      3. Every time the pool size is increased, I keep seeing this annoying log (with neterror on): 
        May  9 00:22:26 trevis-407 kernel: LNet: Using FastReg for registration

      The first 2 items are blockers and must be fixed ASAP.  The 3rd might as well be addressed at the same time.
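
      To make the intended behaviour in items 1 and 2 concrete, here is a minimal user-space sketch.  It is not the ko2iblnd code: struct toy_pool and every toy_* function are hypothetical names made up for illustration, and the real pool logic is more involved.  It only shows the contract described above: an empty free list reports -EAGAIN so the caller can grow the pool and retry, and a pool may only be destroyed once its map count is back to 0.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a pool of FastReg descriptors. */
struct toy_pool {
    int size;       /* total descriptors owned by the pool */
    int free_count; /* descriptors currently on the free list */
    int map_count;  /* descriptors handed out and not yet returned */
};

/* Take one descriptor from the free list.
 * Returns 0 on success, or -EAGAIN when the list is empty so that the
 * caller knows it should grow the pool and retry (rather than fail). */
static int toy_pool_map(struct toy_pool *p)
{
    if (p->free_count == 0)
        return -EAGAIN;
    p->free_count--;
    p->map_count++;
    return 0;
}

/* Return one descriptor to the free list. */
static void toy_pool_unmap(struct toy_pool *p)
{
    p->free_count++;
    p->map_count--;
}

/* Add 'extra' fresh descriptors to the pool. */
static void toy_pool_grow(struct toy_pool *p, int extra)
{
    p->size += extra;
    p->free_count += extra;
}

/* Caller-side policy: -EAGAIN means "grow, then retry once".
 * Any other non-zero rc is passed up as a hard failure. */
static int toy_map_with_grow(struct toy_pool *p, int grow_by)
{
    int rc = toy_pool_map(p);

    if (rc == -EAGAIN) {
        toy_pool_grow(p, grow_by);
        rc = toy_pool_map(p);
    }
    return rc;
}

/* Tear-down is only valid when nothing is still mapped; this mirrors
 * the spirit of the fpo_map_count == 0 assertion quoted above. */
static void toy_pool_destroy(struct toy_pool *p)
{
    assert(p->map_count == 0);
    p->size = 0;
    p->free_count = 0;
}

/* Small driver: map three descriptors from a pool of two, which forces
 * one grow, then release everything and destroy the pool.
 * Returns the pool size observed just before destruction. */
int toy_demo(void)
{
    struct toy_pool pool = { .size = 2, .free_count = 2, .map_count = 0 };
    int final_size;

    for (int i = 0; i < 3; i++) {
        int rc = toy_map_with_grow(&pool, 2);
        assert(rc == 0);
    }
    for (int i = 0; i < 3; i++)
        toy_pool_unmap(&pool);

    final_size = pool.size;
    toy_pool_destroy(&pool);
    return final_size;
}
```

      In this sketch, returning -EBUSY from toy_pool_map() instead of -EAGAIN would skip the grow path in toy_map_with_grow() and fail the request outright, which is the failure mode item 1 describes.  Likewise, any path that maps without a matching unmap would leave map_count non-zero and trip the assert in toy_pool_destroy().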

       

            People

              doug Doug Oucharek (Inactive)
              doug Doug Oucharek (Inactive)