Details

    • Technical task
    • Resolution: Fixed
    • Blocker
    • Lustre 2.5.0
    • Lustre 2.5.0
    • 10714

    Description

      In 23c197908902183d5f88d3f431da6cde9c290e07 LU-3811 hsm: handle file ownership and timestamps, I added a stat() of the file being restored to the CT's restore path. This is to ensure that the volatile file is given the correct ownership and timestamps before the restore, and is required for the layout swap to succeed. However this introduces a potential for deadlock vs unlink() and other operations. Consider the following sequence of operations on a single file:

      1. Client sends restore, CDT takes and holds EX LAYOUT lock.
      2. Client sends unlink, handler sleeps on EX FULL lock.
      3. CDT sends restore action to CT.
      4. CT begins restore, sends getattr (from stat()), handler sleeps on PR LOOKUP,UPDATE,PERM lock.

      We have a similar deadlock with rename-onto.

      I think the simplest way out of this mess would be to lock fewer bits in the unlink handler. Can anyone say why unlink should invalidate cached layout? An open unlinked file is still valid for IO.

      Attachments

        Issue Links

          Activity

            [LU-4002] HSM restore vs unlink deadlock
            jhammond John Hammond added a comment -

            Note to self. After http://review.whamcloud.com/13750 (LU-4727 hsm: use IOC_MDC_GETFILEINFO in restore) the copytool uses IOC_MDC_GETFILEINFO/MDS_GETATTR_NAME to get the attributes of the released file which bypasses LLDM and does not have the issue described above.

            jhammond John Hammond added a comment - Note to self. After http://review.whamcloud.com/13750 ( LU-4727 hsm: use IOC_MDC_GETFILEINFO in restore) the copytool uses IOC_MDC_GETFILEINFO/MDS_GETATTR_NAME to get the attributes of the released file which bypasses LLDM and does not have the issue described above.
            jhammond John Hammond added a comment -

            Please see http://review.whamcloud.com/7927 for a sketch of this approach.

            jhammond John Hammond added a comment - Please see http://review.whamcloud.com/7927 for a sketch of this approach.
            jhammond John Hammond added a comment -

            I believe that we could revert this patch if the HSM coordinator would send the ownership and timestamps to use on the volatile file along with the restore action. Then the copytool will not need to stat() the original file. The restoring layout swap will check that the ownerships agree protecting us from a TOCTTOU issue.

            jhammond John Hammond added a comment - I believe that we could revert this patch if the HSM coordinator would send the ownership and timestamps to use on the volatile file along with the restore action. Then the copytool will not need to stat() the original file. The restoring layout swap will check that the ownerships agree protecting us from a TOCTTOU issue.

            Hi Andreas, this is a known issue but we still decided to land the patch because the deadlock issue is more severe.

            The problem is that when the LOOKUP lock is revoked, we don't know if this is because the file is being unlinked or renamed. However, leaving some locks in cache may not be a problem because if the system is active, the locks will be discarded by LRU soon or later.

            jay Jinshan Xiong (Inactive) added a comment - Hi Andreas, this is a known issue but we still decided to land the patch because the deadlock issue is more severe. The problem is that when the LOOKUP lock is revoked, we don't know if this is because the file is being unlinked or renamed. However, leaving some locks in cache may not be a problem because if the system is active, the locks will be discarded by LRU soon or later.

            Might have jumped the gun on this. Closing it again until we know it is the culprit.

            adilger Andreas Dilger added a comment - Might have jumped the gun on this. Closing it again until we know it is the culprit.

            Per recent comments in LU-4053, this is causing a lot of layout locks to be left on the client after an inode is unlinked. Ideally, the layout lock would be revoked if this is the last reference to the inode (last link and file is not opened). The main question is whether it is possible to know this in advance? Unlike the open-unlinked handling, at worst this will revoke an extra lock if there is a race and another thread opens the file just before it is unlinked, so I think it is better to handle the common case more efficiently.

            Is there any way to know in advance if HSM is processing this file and not try to revoke the layout lock in this case?

            adilger Andreas Dilger added a comment - Per recent comments in LU-4053 , this is causing a lot of layout locks to be left on the client after an inode is unlinked. Ideally, the layout lock would be revoked if this is the last reference to the inode (last link and file is not opened). The main question is whether it is possible to know this in advance? Unlike the open-unlinked handling, at worst this will revoke an extra lock if there is a race and another thread opens the file just before it is unlinked, so I think it is better to handle the common case more efficiently. Is there any way to know in advance if HSM is processing this file and not try to revoke the layout lock in this case?

            Patch landed to Master. If more work is needed in this ticket, please let me know and I will reopen this ticket.

            jlevi Jodi Levi (Inactive) added a comment - Patch landed to Master. If more work is needed in this ticket, please let me know and I will reopen this ticket.
            jhammond John Hammond added a comment -

            I suggest that this be considered a blocker for 2.5.0. It is easy to imagine situations where users will trigger this deadlock. Faced with a long running restore on a file (which to the user may just seem like an unresponsive console or FS) the user may logout, login, and unlink the released file (since perhaps it is easy to regenerate anyway).

            jhammond John Hammond added a comment - I suggest that this be considered a blocker for 2.5.0. It is easy to imagine situations where users will trigger this deadlock. Faced with a long running restore on a file (which to the user may just seem like an unresponsive console or FS) the user may logout, login, and unlink the released file (since perhaps it is easy to regenerate anyway).
            jhammond John Hammond added a comment - Please see http://review.whamcloud.com/7792 .
            jhammond John Hammond added a comment -

            In general it won't be enough just to remove LAYOUT from unlink.Assume restore takes EX LAYOUT, some other operation tries to take PR LAYOUT | LOOKUP | ... and waits. Then unlink tries to take EX LOOKUP | .... Then stat (from the CT) tries to take LOOKUP | UPDATE | ... and deadlocks.

            As I keep saying, any operation that requires two lock is dangerous.

            Note that unlink (and rename onto) should take more than just LOOKUP, since it modifies link count and timestamps.

            Could we do something sane seeming like having the MDT send the attributes that the CT is getting from stat?

            jhammond John Hammond added a comment - In general it won't be enough just to remove LAYOUT from unlink.Assume restore takes EX LAYOUT, some other operation tries to take PR LAYOUT | LOOKUP | ... and waits. Then unlink tries to take EX LOOKUP | .... Then stat (from the CT) tries to take LOOKUP | UPDATE | ... and deadlocks. As I keep saying, any operation that requires two lock is dangerous. Note that unlink (and rename onto) should take more than just LOOKUP, since it modifies link count and timestamps. Could we do something sane seeming like having the MDT send the attributes that the CT is getting from stat?

            People

              jhammond John Hammond
              jhammond John Hammond
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: