Details

    • Technical task
    • Resolution: Fixed
    • Blocker
    • Lustre 2.5.0
    • Lustre 2.5.0
    • 10714

    Description

      In commit 23c197908902183d5f88d3f431da6cde9c290e07 (LU-3811 hsm: handle file ownership and timestamps), I added a stat() of the file being restored to the CT's restore path. This ensures that the volatile file is given the correct ownership and timestamps before the restore, and it is required for the layout swap to succeed. However, it introduces a potential deadlock against unlink() and other operations. Consider the following sequence of operations on a single file:

      1. Client sends restore, CDT takes and holds EX LAYOUT lock.
      2. Client sends unlink, handler sleeps on EX FULL lock.
      3. CDT sends restore action to CT.
      4. CT begins restore, sends getattr (from stat()), handler sleeps on PR LOOKUP,UPDATE,PERM lock.

      We have a similar deadlock with rename-onto.

      I think the simplest way out of this mess would be to lock fewer bits in the unlink handler. Can anyone say why unlink should invalidate the cached layout? An open, unlinked file is still valid for I/O.
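
      The sequence above, and the effect of dropping the LAYOUT bit from the unlink lock, can be sketched with a toy model. This is not Lustre code. It assumes that two inodebits locks conflict when their bits overlap and they are not both PR, and that a new request also queues behind earlier waiters that conflict with it. The names `Bits`, `Lock`, `Resource`, `simulate`, and `has_cycle` are hypothetical, and the bit values are arbitrary rather than the real MDS_INODELOCK_* constants.

```python
from enum import Flag, auto
from typing import Dict, List, NamedTuple, Optional


class Bits(Flag):
    """Symbolic inode lock bits; values are arbitrary, not Lustre's."""
    LOOKUP = auto()
    UPDATE = auto()
    OPEN = auto()
    LAYOUT = auto()
    PERM = auto()
    XATTR = auto()


FULL = (Bits.LOOKUP | Bits.UPDATE | Bits.OPEN
        | Bits.LAYOUT | Bits.PERM | Bits.XATTR)


class Lock(NamedTuple):
    owner: str
    mode: str   # only "EX" and "PR" are modelled
    bits: Bits


def conflicts(a: Lock, b: Lock) -> bool:
    # Assumption: locks conflict if bits overlap, unless both are PR.
    both_pr = a.mode == "PR" and b.mode == "PR"
    return bool(a.bits & b.bits) and not both_pr


class Resource:
    def __init__(self) -> None:
        self.granted: List[Lock] = []
        self.waiting: List[Lock] = []

    def enqueue(self, req: Lock) -> Optional[str]:
        """Grant req, or queue it and return the owner that blocks it."""
        for other in self.granted + self.waiting:
            if conflicts(req, other):
                self.waiting.append(req)
                return other.owner
        self.granted.append(req)
        return None


def simulate(unlink_bits: Bits) -> Dict[str, str]:
    """Replay steps 1-4 and return a wait-for graph (waiter -> blocker)."""
    res = Resource()
    waits_for: Dict[str, str] = {}
    # 1. Restore: the CDT takes and holds EX LAYOUT until the restore ends.
    res.enqueue(Lock("restore", "EX", Bits.LAYOUT))
    # 2. Unlink requests an EX lock on unlink_bits.
    blocker = res.enqueue(Lock("unlink", "EX", unlink_bits))
    if blocker:
        waits_for["unlink"] = blocker
    # 3-4. The CT's stat() sends a getattr needing PR LOOKUP,UPDATE,PERM.
    getattr_bits = Bits.LOOKUP | Bits.UPDATE | Bits.PERM
    blocker = res.enqueue(Lock("getattr", "PR", getattr_bits))
    if blocker:
        waits_for["getattr"] = blocker
        # The restore cannot finish until the getattr returns.
        waits_for["restore"] = "getattr"
    return waits_for


def has_cycle(waits_for: Dict[str, str], start: str) -> bool:
    """Follow waiter -> blocker edges from start; True if a node repeats."""
    seen = set()
    node = start
    while node in waits_for:
        if node in seen:
            return True
        seen.add(node)
        node = waits_for[node]
    return False
```

      In this model, `simulate(FULL)` yields the cycle restore -> getattr -> unlink -> restore, so `has_cycle(..., "restore")` is true. With `simulate(FULL ^ Bits.LAYOUT)`, the unlink lock is granted immediately, the getattr merely waits for the unlink to finish, and no cycle forms. Whether the real unlink handler can safely omit the LAYOUT bit is the open question raised above.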

          Activity

            [LU-4002] HSM restore vs unlink deadlock
            jhammond John Hammond made changes -
            Link New: This issue is related to LU-4727 [ LU-4727 ]
            adilger Andreas Dilger made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-4053 [ LU-4053 ]
            adilger Andreas Dilger made changes -
            Resolution Original: Fixed [ 1 ]
            Status Original: Resolved [ 5 ] New: Reopened [ 4 ]
            jlevi Jodi Levi (Inactive) made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            jlevi Jodi Levi (Inactive) made changes -
            Priority Original: Critical [ 2 ] New: Blocker [ 1 ]
            pjones Peter Jones made changes -
            Assignee Original: WC Triage [ wc-triage ] New: John Hammond [ jhammond ]
            jlevi Jodi Levi (Inactive) made changes -
            Fix Version/s New: Lustre 2.5.0 [ 10295 ]
            Priority Original: Minor [ 4 ] New: Critical [ 2 ]
            jay Jinshan Xiong (Inactive) made changes -
            Parent New: LU-3647 [ 20020 ]
            Severity Original: 3 [ 10022 ]
            Issue Type Original: Bug [ 1 ] New: Technical task [ 7 ]
            jhammond John Hammond created issue -

            People

              jhammond John Hammond
              jhammond John Hammond
              Votes: 0
              Watchers: 6
