[LU-8271] nodemap: retrying a large configuration transfer should have a delay Created: 14/Jun/16 Updated: 04/Jan/18 Resolved: 04/Jan/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Kit Westneat | Assignee: | Emoly Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
In order to avoid thrashing the MGS during a bulk configuration update, nodemap config clients should delay before retrying a config get. When a nodemap is larger than a single RPC, clients need multiple RPCs to get the nodemap config. If the config changes between RPCs, the client must drop the config received in the previous RPCs and restart the transfer. If many configuration changes are occurring, a config get could be restarted multiple times, causing unnecessary load. Config get clients should wait some time before restarting the transfer, to allow the server to finish updating its config. It may be possible to re-enqueue the config lock so that the main MGC lock thread restarts the transfer, which would add a random delay of 5 to 10 seconds. |
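A minimal sketch of the retry-with-delay behavior described above. It assumes the server reports a config version with each chunk so the client can detect a change mid-transfer. `struct fake_server`, `fetch_chunk`, `fetch_config`, `config_retry_delay` and the `RETRY_DELAY_*` constants are hypothetical names for illustration, not the Lustre MGC API. The ticket suggests re-enqueueing the config lock to get the delay; this sketch simply waits inline before restarting.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bounds, taken from the 5 to 10 s delay mentioned above. */
#define RETRY_DELAY_MIN_SEC 5u
#define RETRY_DELAY_MAX_SEC 10u

/* Pick a random delay in [MIN, MAX] seconds (slight modulo bias ignored). */
static unsigned int config_retry_delay(void)
{
	return RETRY_DELAY_MIN_SEC +
	       (unsigned int)rand() % (RETRY_DELAY_MAX_SEC - RETRY_DELAY_MIN_SEC + 1u);
}

/* Hypothetical stand-in for the MGS: a versioned config split into chunks. */
struct fake_server {
	unsigned int version;    /* bumped on every config change */
	unsigned int nchunks;    /* RPCs needed to transfer the whole config */
	unsigned int bumps_left; /* config changes still to happen mid-transfer */
};

/* Serve chunk idx and report the config version it came from. To model a
 * concurrent update, the config changes right after chunk 0 is served. */
static unsigned int fetch_chunk(struct fake_server *srv, unsigned int idx)
{
	unsigned int v = srv->version;

	if (idx == 0 && srv->bumps_left > 0) {
		srv->version++;
		srv->bumps_left--;
	}
	return v;
}

typedef void (*sleep_fn)(unsigned int seconds);

/* Fetch the whole config; if the version changes mid-transfer, drop the
 * partial copy, wait a random delay, and start over. Returns the number of
 * restarts and stores the total requested delay in *slept. A NULL do_sleep
 * skips the actual wait (useful in tests). */
static unsigned int fetch_config(struct fake_server *srv, sleep_fn do_sleep,
				 unsigned int *slept)
{
	unsigned int restarts = 0;

	assert(srv != NULL && slept != NULL);
	*slept = 0;
	for (;;) {
		unsigned int first = 0;
		unsigned int i;
		int changed = 0;

		for (i = 0; i < srv->nchunks; i++) {
			unsigned int v = fetch_chunk(srv, i);

			if (i == 0) {
				first = v;
			} else if (v != first) {
				changed = 1; /* config changed between RPCs */
				break;
			}
		}
		if (!changed)
			return restarts;

		/* Back off before restarting so the server can finish updating. */
		unsigned int delay = config_retry_delay();

		*slept += delay;
		if (do_sleep != NULL)
			do_sleep(delay);
		restarts++;
	}
}
```

With a three-chunk config that changes twice during the transfer, the client restarts twice and waits between 10 and 20 seconds in total before the third attempt succeeds; a single-chunk config never needs a restart. A real client would pass a callback wrapping a sleep or timer primitive instead of NULL.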
| Comments |
| Comment by Andreas Dilger [ 19/Apr/17 ] |
|
Kit, do you have any cycles to look into this? |
| Comment by Kit Westneat [ 21/Apr/17 ] |
|
Sure, I'll get a patch together. |
| Comment by Gerrit Updater [ 21/Apr/17 ] |
|
Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: https://review.whamcloud.com/26781 |
| Comment by Peter Jones [ 15/Dec/17 ] |
|
Emoly, can you please follow up to get this patch landed? Thanks, Peter |
| Comment by Gerrit Updater [ 04/Jan/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26781/ |
| Comment by Peter Jones [ 04/Jan/18 ] |
|
Landed for 2.11 |