Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9472

FastReg (MLX5) support breaks when map_on_demand > 0

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: Critical Critical
    • Lustre 2.10.0, Upstream
    • Lustre 2.10.0, Upstream
    • 3
    • 9223372036854775807

      When building against MODED 4, the default for map_on_demand switches from 0 to 256.  This is breaking MLX5-based cards which make use of the FastReg support in ko2iblnd.  There are three problems with FastReg which need to be fixed:

      1. In kiblnd_fmr_pool_map() when using elements from the fpo_pool_list, if the list runs out, the current code is setting rc to -EBUSY when it should be -EAGAIN.  EAGAIN triggers the pool to be made bigger.  EBUSY just fails the transfer and connection (not what we want).
      2. Even after I fix the setting of rc in number 1, bringing down the network via "lctl network down" trips this assert: 
        [ 1172.255552] LNetError: 10176:0:(o2iblnd.c:1421:kiblnd_destroy_fmr_pool()) ASSERTION( fpo->fpo_map_count == 0 ) failed: 
      1. Every time the pool size is increased, I keep seeing this annoying log (with neterror on): 
        May  9 00:22:26 trevis-407 kernel: LNet: Using FastReg for registration

      The first 2 items are blockers and must be fixed ASAP.  The 3rd might as well be addressed at the same time.

       

            doug Doug Oucharek (Inactive)
            doug Doug Oucharek (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            13 Start watching this issue

              Created:
              Updated:
              Resolved: