  1. Lustre
  2. LU-13651

Conditionally skip finding compatible HSM requests

Details

    • Improvement
    • Resolution: Fixed
    • Major
    • Lustre 2.14.0

    Description

      The HSM action queue is scanned linearly in hsm_find_compatible_cb() for existing requests on the same file, so that duplicate or conflicting requests are not added and cancel requests are assigned the correct cookie. When the action queue is very large, this can cause a long delay in adding new requests, because access to the queue is locked for the duration of the search.

      Scanning the queue does not guarantee that duplicate or conflicting requests are not added. Scanning (in hsm_find_compatible_cb()) and adding requests (in mdt_agent_record_add()) are distinct operations that are not serialized by a lock, so a race window exists between the two calls within which duplicate or conflicting requests can be added.

      This is hopefully not a big problem, though. The CDT thread will not send duplicate archive requests to a copytool serving a different HSM backend (and we could probably prevent it from sending duplicate archive requests to a copytool serving the same backend with a small change in mdt_hsm_is_action_compat()). Duplicate restore requests are, afaik, effectively serialized because the layout lock on the file is taken before they are added to the action queue (although this blocks the caller, e.g. lfs, so I am not sure it's ideal).

      Since calling hsm_find_compatible_cb() does not protect completely against this issue and can cause large delays in adding new requests, we skip calling it for all requests apart from cancel requests that don't specify a cookie (which should be all cancel requests in current code), hopefully safely.
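
      The conditional skip described above can be sketched roughly as follows. This is a simplified standalone model, not the actual Lustre patch: the types and the functions needs_compat_scan(), find_cookie_for_fid() and prepare_request() are hypothetical stand-ins, the queue is a plain array rather than the llog-backed action queue, and locking is omitted. It only illustrates the decision of when the linear scan is still needed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real HSM types. */
enum hsm_action_sketch { ACT_ARCHIVE, ACT_RESTORE, ACT_REMOVE, ACT_CANCEL };

struct hsm_req_sketch {
	enum hsm_action_sketch action;
	uint64_t fid;    /* simplified file identifier */
	uint64_t cookie; /* 0 means "no cookie specified" (assumption) */
};

/* Only a cancel request without a cookie still needs the linear scan,
 * because it must learn the cookie of the request it cancels. */
static bool needs_compat_scan(const struct hsm_req_sketch *req)
{
	return req->action == ACT_CANCEL && req->cookie == 0;
}

/* Linear scan standing in for what hsm_find_compatible_cb() does:
 * return the cookie of a queued non-cancel request on the same file,
 * or 0 if there is none. */
static uint64_t find_cookie_for_fid(const struct hsm_req_sketch *queue,
				    size_t n, uint64_t fid)
{
	for (size_t i = 0; i < n; i++) {
		if (queue[i].fid == fid && queue[i].action != ACT_CANCEL)
			return queue[i].cookie;
	}
	return 0;
}

/* Prepare a request before it is appended to the queue.
 * Returns true if the (expensive) scan was performed. */
static bool prepare_request(struct hsm_req_sketch *req,
			    const struct hsm_req_sketch *queue, size_t n)
{
	if (!needs_compat_scan(req))
		return false; /* skip the scan; add the request directly */

	req->cookie = find_cookie_for_fid(queue, n, req->fid);
	return true;
}
```

      In this model, archive, restore and remove requests, as well as cancel requests that already carry a cookie, bypass the scan entirely, so the cost of adding them no longer grows with the length of the queue.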

            People

              nangelinas Nikitas Angelinas
              nangelinas Nikitas Angelinas