Details
Technical task
Resolution: Fixed
Blocker
Lustre 2.5.0
10714
Description
In commit 23c197908902183d5f88d3f431da6cde9c290e07 (LU-3811 hsm: handle file ownership and timestamps), I added a stat() of the file being restored to the CT's restore path. This ensures that the volatile file is given the correct ownership and timestamps before the restore, and is required for the layout swap to succeed. However, this introduces a potential deadlock against unlink() and other operations. Consider the following sequence of operations on a single file:
- Client sends restore, CDT takes and holds EX LAYOUT lock.
- Client sends unlink, handler sleeps on EX FULL lock.
- CDT sends restore action to CT.
- CT begins restore, sends getattr (from stat()), handler sleeps on PR LOOKUP,UPDATE,PERM lock.
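At this point nothing can make progress: the getattr waits behind the unlink's EX FULL request, the unlink waits on the EX LAYOUT lock, and the CDT holds that lock until the restore completes, which in turn needs the getattr. We have a similar deadlock with rename-onto.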
I think the simplest way out of this mess would be to lock fewer bits in the unlink handler. Can anyone say why unlink should invalidate cached layout? An open unlinked file is still valid for IO.
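To make the lock interaction concrete, here is a minimal sketch in C of how the requested bits decide who waits on whom. It is a toy model, not Lustre code: the bit names echo the ones mentioned in this ticket, but the numeric values, the two-mode compatibility rule, and every function name below are assumptions made for illustration.

```c
#include <stdbool.h>

/* Illustrative inodebits. Names follow the ticket; values are assumed. */
enum {
	BIT_LOOKUP = 1 << 0,
	BIT_UPDATE = 1 << 1,
	BIT_OPEN   = 1 << 2,
	BIT_LAYOUT = 1 << 3,
	BIT_PERM   = 1 << 4,
	BITS_FULL  = (1 << 5) - 1,	/* every bit above */
};

/* Only the two modes that appear in the ticket are modelled. */
enum lock_mode { MODE_PR, MODE_EX };

struct lock_req {
	enum lock_mode mode;
	unsigned int   bits;
};

/*
 * Two requests conflict when they share at least one bit and their modes
 * are incompatible. In this simplified model PR is compatible only with
 * PR, and EX is compatible with nothing.
 */
bool conflicts(struct lock_req a, struct lock_req b)
{
	bool modes_compatible = (a.mode == MODE_PR && b.mode == MODE_PR);

	return (a.bits & b.bits) != 0 && !modes_compatible;
}

/* Does an EX unlink lock on unlink_bits wait on the CDT's EX LAYOUT lock? */
bool unlink_waits_on_restore(unsigned int unlink_bits)
{
	struct lock_req cdt = { MODE_EX, BIT_LAYOUT };
	struct lock_req unl = { MODE_EX, unlink_bits };

	return conflicts(unl, cdt);
}

/* Does the CT's PR getattr lock wait on an EX unlink lock on unlink_bits? */
bool getattr_waits_on_unlink(unsigned int unlink_bits)
{
	struct lock_req getattr = { MODE_PR,
				    BIT_LOOKUP | BIT_UPDATE | BIT_PERM };
	struct lock_req unl = { MODE_EX, unlink_bits };

	return conflicts(getattr, unl);
}
```

In this model, `unlink_waits_on_restore(BITS_FULL)` is true, which is the second step of the sequence above, while `unlink_waits_on_restore(BITS_FULL & ~BIT_LAYOUT)` is false: an unlink that does not request the LAYOUT bit never queues behind the restore. The getattr still waits on the unlink in both cases, but only until the unlink completes, so the cycle is broken.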
Per recent comments in LU-4053, this is causing a lot of layout locks to be left on the client after an inode is unlinked. Ideally, the layout lock would be revoked if this is the last reference to the inode (last link and file is not opened). The main question is whether it is possible to know this in advance? Unlike the open-unlinked handling, at worst this will revoke an extra lock if there is a race and another thread opens the file just before it is unlinked, so I think it is better to handle the common case more efficiently. Is there any way to know in advance if HSM is processing this file and not try to revoke the layout lock in this case?