Details
Improvement
Resolution: Fixed
Major
Lustre 2.9.0
None
Description
To avoid thrashing the MGS during a bulk configuration update, nodemap config clients should wait before retrying a config get.
When a nodemap is too large to fit in a single RPC, clients must use multiple RPCs to fetch the nodemap config. If the config changes between RPCs, the client must discard the config received in the previous RPCs and restart the transfer. If many configuration changes are occurring, a single config get could be restarted many times, causing unnecessary load on the MGS. Clients should therefore wait some time before restarting the transfer, to give the server a chance to finish updating its config.
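A rough C sketch of the restart behaviour described above, using a toy in-memory model rather than real RPCs. `struct config_server`, `rpc_get_page()`, `backoff()` and `config_get()` are hypothetical names, not Lustre functions. Each page reports the config version it was read from; a mismatch with page 0 discards the partial transfer and restarts. The sketch also assumes that any nonzero wait is long enough for the server to finish its bulk update.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model of the server side of a nodemap config transfer. */
struct config_server {
    uint64_t version;     /* bumped each time the config changes */
    int pending_changes;  /* changes the bulk update has not applied yet */
};

/* Hypothetical single-page RPC: returns the config version the page was
 * read from. Models a busy server by applying one pending change after
 * each RPC, so consecutive pages can come from different versions. */
static uint64_t rpc_get_page(struct config_server *srv, int page_idx)
{
    uint64_t served = srv->version;

    (void)page_idx; /* page contents are not modelled */
    if (srv->pending_changes > 0) {
        srv->version++;
        srv->pending_changes--;
    }
    return served;
}

/* Wait before restarting. Assumption: any nonzero delay is long enough
 * for the server to apply all of its pending changes. */
static void backoff(struct config_server *srv, unsigned int delay_s,
                    unsigned int *total_wait_s)
{
    *total_wait_s += delay_s;
    if (delay_s > 0) {
        srv->version += (uint64_t)srv->pending_changes;
        srv->pending_changes = 0;
    }
}

/* Fetch all npages pages; restart (after delay_s) whenever a page comes
 * from a different version than page 0. Returns the number of attempts. */
static int config_get(struct config_server *srv, int npages,
                      unsigned int delay_s, unsigned int *total_wait_s)
{
    int attempts = 0;

    assert(npages > 0);
    for (;;) {
        uint64_t first;
        int consistent = 1;

        attempts++;
        first = rpc_get_page(srv, 0);
        for (int i = 1; i < npages; i++) {
            if (rpc_get_page(srv, i) != first) {
                consistent = 0; /* config changed mid-transfer */
                break;
            }
        }
        if (consistent)
            return attempts;
        /* Discard the pages read so far and wait before restarting. */
        backoff(srv, delay_s, total_wait_s);
    }
}
```

In this model, a server starting at version 1 with 5 pending changes and a 4-page config needs 4 attempts when the client restarts immediately (`delay_s = 0`), but only 2 attempts when the client waits before restarting.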
It may be possible to re-enqueue the config lock so that the main MGC lock thread restarts the transfer, which would add a random delay of between 5 and 10 seconds.
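A minimal sketch of a randomized 5 to 10 second retry delay, assuming `rand()` as the random source. `config_retry_delay_ms()`, `RETRY_MIN_MS` and `RETRY_JITTER_MS` are hypothetical names for illustration; this is not the MGC lock thread's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical constants: a 5 s floor plus up to 5 s of random jitter. */
#define RETRY_MIN_MS    5000
#define RETRY_JITTER_MS 5000

/* Returns a delay in [RETRY_MIN_MS, RETRY_MIN_MS + RETRY_JITTER_MS] ms. */
static unsigned int config_retry_delay_ms(void)
{
    /* rand() must be able to cover the whole jitter range. */
    assert(RAND_MAX >= RETRY_JITTER_MS);
    /* rand() % (N + 1) yields 0..N; the slight modulo bias is
     * acceptable when the goal is only to spread retries out. */
    return (unsigned int)(RETRY_MIN_MS + rand() % (RETRY_JITTER_MS + 1));
}
```

Randomizing the delay, rather than using a fixed one, spreads the retries of many clients over a window instead of having them all hit the MGS at the same moment.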