  1. Lustre
  2. LU-8271

nodemap: retrying a large configuration transfer should have a delay

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.9.0

    Description

      To avoid thrashing the MGS during a bulk configuration update, nodemap config clients should wait before retrying a config get.

      When a nodemap config is larger than a single RPC can carry, clients must use multiple RPCs to fetch it. If the config changes between RPCs, the client has to discard the data received from the earlier RPCs and restart the transfer. When many configuration changes occur in quick succession, a config get can be restarted multiple times, causing unnecessary load. Clients should wait some time before restarting the transfer, to give the server time to finish updating its config.
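
The restart-with-delay behaviour described above can be sketched as a small simulation. This is not Lustre code: `FakeServer`, `fetch_config`, the generation counter, and the `retry_delay` and `max_restarts` parameters are all hypothetical names chosen for illustration, and the real MGC/MGS change detection may work differently.

```python
import time


class FakeServer:
    """Hypothetical stand-in for the MGS: serves a config in pages and
    bumps a generation counter whenever the config changes."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.generation = 1

    def update(self, pages):
        """Replace the config, invalidating any transfer in progress."""
        self.pages = list(pages)
        self.generation += 1

    def get_page(self, index):
        """One 'RPC': return (generation, page, total page count)."""
        page = self.pages[index] if index < len(self.pages) else None
        return self.generation, page, len(self.pages)


def fetch_config(server, retry_delay=5.0, max_restarts=10, sleep=time.sleep):
    """Fetch every page; if the generation changes mid-transfer, discard
    what was received, wait retry_delay seconds, and start over."""
    restarts = 0
    while True:
        generation, page, total = server.get_page(0)
        received = [page]
        changed = False
        for index in range(1, total):
            page_generation, page, _ = server.get_page(index)
            if page_generation != generation:
                changed = True  # config changed between RPCs
                break
            received.append(page)
        if not changed:
            return received, restarts
        restarts += 1
        if restarts > max_restarts:
            raise RuntimeError("config kept changing; giving up")
        # Wait so the server can finish its update before we retry.
        sleep(retry_delay)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller that wants the actual delay can leave the default `time.sleep` in place.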

      It may be possible to re-enqueue the config lock so that the main MGC lock thread restarts the transfer, which would add a random delay of 5-10s.
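
As a rough illustration of the random 5-10s delay mentioned above, the helper below draws a restart delay from that range. The function name `restart_delay` and the uniform distribution are assumptions made for this sketch; the issue only states the range, not how the MGC lock thread picks a value within it.

```python
import random


def restart_delay(low=5.0, high=10.0, rng=random):
    """Return a delay in seconds drawn uniformly from [low, high].

    The 5-10s bounds come from the issue text; the uniform draw is an
    assumption for illustration only.
    """
    return rng.uniform(low, high)


if __name__ == "__main__":
    # Each client picks its own delay, so restarts are spread out
    # instead of all hitting the server at the same moment.
    for client in range(3):
        print(f"client {client} waits {restart_delay():.1f}s")
```

A randomized delay has the side benefit that clients which saw the same config change do not all restart their transfers simultaneously.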

          People

            emoly.liu Emoly Liu
            kit.westneat Kit Westneat (Inactive)