Details
Bug
Resolution: Fixed
Critical
Lustre 2.10.0, Upstream
3
Description
When building against MOFED 4, the default for map_on_demand switches from 0 to 256. This breaks MLX5-based cards, which use the FastReg support in ko2iblnd. There are three problems with FastReg that need to be fixed:
1. In kiblnd_fmr_pool_map(), when taking elements from the fpo_pool_list, if the list runs out the current code sets rc to -EBUSY when it should be -EAGAIN. EAGAIN triggers the pool to be made bigger; EBUSY just fails the transfer and the connection, which is not what we want.
2. Even after I fix the setting of rc in item 1, bringing down the network via "lctl network down" trips this assert:
[ 1172.255552] LNetError: 10176:0:(o2iblnd.c:1421:kiblnd_destroy_fmr_pool()) ASSERTION( fpo->fpo_map_count == 0 ) failed:
3. Every time the pool size is increased, I keep seeing this annoying log (with neterror on):
May 9 00:22:26 trevis-407 kernel: LNet: Using FastReg for registration
The first 2 items are blockers and must be fixed ASAP. The 3rd might as well be addressed at the same time.
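
To illustrate the first item, here is a minimal user-space sketch of the intended return-code behaviour. It is not the ko2iblnd code: struct sketch_pool and the sketch_* functions are hypothetical stand-ins, and the grow step is reduced to bumping a counter. It only shows why an exhausted free list should report -EAGAIN (recoverable, the caller grows the pool and retries) rather than -EBUSY (the transfer fails).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a FastReg pool; not the real ko2iblnd structures. */
struct sketch_pool {
	int free_count;	/* descriptors still available on the free list */
	int map_count;	/* descriptors currently handed out */
};

/*
 * Take one descriptor from the pool.
 * Returns 0 on success, or -EAGAIN when the free list is empty so that
 * the caller can enlarge the pool and retry. Returning -EBUSY here would
 * make the caller give up on the transfer instead.
 */
static int sketch_pool_map(struct sketch_pool *pool)
{
	if (pool->free_count == 0)
		return -EAGAIN;

	pool->free_count--;
	pool->map_count++;
	return 0;
}

/* Give one descriptor back to the pool. */
static void sketch_pool_unmap(struct sketch_pool *pool)
{
	assert(pool->map_count > 0);
	pool->map_count--;
	pool->free_count++;
}

/*
 * Caller-side handling: on -EAGAIN, grow the pool by grow_by descriptors
 * and retry once. Any other error is passed straight back.
 */
static int sketch_map_with_grow(struct sketch_pool *pool, int grow_by)
{
	int rc = sketch_pool_map(pool);

	if (rc == -EAGAIN) {
		pool->free_count += grow_by;
		rc = sketch_pool_map(pool);
	}
	return rc;
}
```

In this sketch, a pool that starts with one free descriptor serves the first request, returns -EAGAIN on the second, and succeeds again once sketch_map_with_grow() has enlarged it. The map_count bookkeeping mirrors the counter checked by the assertion in item 2: it should be back to zero once every mapped descriptor has been returned.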