Details

    • Technical task
    • Resolution: Fixed
    • Blocker
    • Lustre 2.5.0
    • Lustre 2.5.0
    • 10714

    Description

      In 23c197908902183d5f88d3f431da6cde9c290e07 (LU-3811 hsm: handle file ownership and timestamps), I added a stat() of the file being restored to the CT's restore path. This ensures that the volatile file is given the correct ownership and timestamps before the restore, and it is required for the layout swap to succeed. However, it also introduces a potential deadlock against unlink() and other operations. Consider the following sequence of operations on a single file:

      1. Client sends restore, CDT takes and holds EX LAYOUT lock.
      2. Client sends unlink, handler sleeps on EX FULL lock.
      3. CDT sends restore action to CT.
      4. CT begins restore, sends getattr (from stat()), handler sleeps on PR LOOKUP,UPDATE,PERM lock.

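      At this point nothing can make progress: the unlink waits for the EX LAYOUT lock held by the CDT, the getattr's PR request waits behind the unlink's EX FULL request, and the CDT does not release the layout lock until the restore (which needs the getattr) completes. We have a similar deadlock with rename-onto.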
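
      The sequence can be modeled as a wait-for graph. The following is a minimal sketch in Python, not Lustre code: the node names are hypothetical labels for the four actors in the steps, and the edge from the getattr to the unlink handler assumes that a queued EX FULL request blocks a later PR request on overlapping bits.

```python
# Toy wait-for graph for the restore/unlink sequence.
# Node names are illustrative labels, not Lustre identifiers.
WAITS_FOR = {
    "cdt_restore": ["ct_restore"],      # CDT holds EX LAYOUT until the CT finishes
    "ct_restore": ["ct_getattr"],       # the CT's stat() must return first
    "ct_getattr": ["unlink_handler"],   # assumed: PR request queues behind EX FULL
    "unlink_handler": ["cdt_restore"],  # EX FULL conflicts with the held EX LAYOUT
}


def find_cycle(graph, start):
    """Return one wait-for cycle reachable from start, or None if there is none."""
    path, on_path, done = [], set(), set()

    def visit(node):
        if node in on_path:
            # Found a back edge: slice out the cycle and close it.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        path.append(node)
        on_path.add(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    return visit(start)


if __name__ == "__main__":
    print(find_cycle(WAITS_FOR, "cdt_restore"))
```

      Removing any one edge breaks the cycle. In this model, the edge from the unlink handler to the CDT is the one that a narrower unlink lock would remove.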

      I think the simplest way out of this mess would be to lock fewer bits in the unlink handler. Can anyone say why unlink should invalidate the cached layout? An open unlinked file is still valid for IO.
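
      To illustrate why fewer bits would help, here is a small Python sketch of an inodebits-style conflict check. It is a toy model under stated assumptions: the bit values are arbitrary and are not Lustre's constants, FULL is taken to be just the four bits named in this ticket, and only the PR and EX modes are modeled. It says nothing about whether dropping the layout bit from unlink is actually safe, which is the open question above.

```python
# Toy inodebits model. Values are arbitrary, NOT Lustre's real constants.
LOOKUP = 1 << 0
UPDATE = 1 << 1
PERM = 1 << 2
LAYOUT = 1 << 3
FULL = LOOKUP | UPDATE | PERM | LAYOUT  # assumption: only these four bits exist


def modes_compatible(mode_a, mode_b):
    """In this toy model only two PR (protected read) locks can coexist."""
    return mode_a == "PR" and mode_b == "PR"


def conflicts(mode_a, bits_a, mode_b, bits_b):
    """Two locks conflict if they share a bit and their modes are incompatible."""
    return bool(bits_a & bits_b) and not modes_compatible(mode_a, mode_b)


if __name__ == "__main__":
    # Current behavior: unlink's EX FULL collides with the CDT's EX LAYOUT.
    print(conflicts("EX", FULL, "EX", LAYOUT))
    # Proposed: unlink takes every bit except LAYOUT, so it no longer waits on the CDT.
    print(conflicts("EX", FULL & ~LAYOUT, "EX", LAYOUT))
    # The CT's getattr never overlapped with the layout lock in the first place.
    print(conflicts("PR", LOOKUP | UPDATE | PERM, "EX", LAYOUT))
```

      With the narrower lock the unlink can be granted while the restore is in flight, so in this model the getattr waits only for the unlink to finish rather than for the restore itself.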

            People

              jhammond John Hammond