Details
Technical task
Resolution: Fixed
Blocker
Lustre 2.5.0
10714
Description
In commit 23c197908902183d5f88d3f431da6cde9c290e07 ("LU-3811 hsm: handle file ownership and timestamps"), I added a stat() of the file being restored to the CT's restore path. This ensures that the volatile file is given the correct ownership and timestamps before the restore, and it is required for the layout swap to succeed. However, it introduces a potential deadlock against unlink() and other operations. Consider the following sequence of operations on a single file:
- Client sends restore, CDT takes and holds EX LAYOUT lock.
- Client sends unlink, handler sleeps on EX FULL lock.
- CDT sends restore action to CT.
- CT begins restore, sends getattr (from stat()), handler sleeps on PR LOOKUP,UPDATE,PERM lock.
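The getattr queues behind the unlink's pending EX request, the unlink waits for the LAYOUT lock, and the CDT does not release that lock until the restore completes, so none of the three can make progress. We have a similar deadlock with rename-onto.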
I think the simplest way out of this mess would be to lock fewer bits in the unlink handler. Can anyone say why unlink should invalidate the cached layout? An open unlinked file is still valid for I/O.
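
To make the lock ordering concrete, here is a small toy model of the sequence above, written in Python rather than taken from the Lustre source. It assumes a simplified inodebits lock queue in which a request is granted only if it conflicts with neither a granted lock nor a request queued ahead of it, and it assumes the CDT drops its LAYOUT lock only after the CT's getattr has completed. The names `Lock`, `conflicts`, and `stuck_operations` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

LOOKUP, UPDATE, PERM, LAYOUT = "LOOKUP", "UPDATE", "PERM", "LAYOUT"
FULL = frozenset({LOOKUP, UPDATE, PERM, LAYOUT})


@dataclass(frozen=True)
class Lock:
    owner: str        # which operation asked for the lock
    mode: str         # "EX" or "PR"
    bits: frozenset   # inode bits covered by the lock


def conflicts(a, b):
    """Two locks conflict if they share a bit and are not both PR."""
    both_read = a.mode == "PR" and b.mode == "PR"
    return bool(a.bits & b.bits) and not both_read


def stuck_operations(unlink_bits):
    """Return the operations that can never complete (toy model only)."""
    waiting = [
        Lock("CDT restore", "EX", frozenset({LAYOUT})),
        Lock("unlink", "EX", frozenset(unlink_bits)),
        Lock("CT getattr", "PR", frozenset({LOOKUP, UPDATE, PERM})),
    ]
    # Assumption: the CDT holds its lock until the CT's getattr has finished.
    depends_on = {"CDT restore": "CT getattr"}
    granted, done = [], set()
    progress = True
    while progress:
        progress = False
        # Grant in FIFO order: no conflict with granted locks or with
        # requests that are still queued ahead.
        still_waiting = []
        for lock in waiting:
            if any(conflicts(lock, other) for other in granted + still_waiting):
                still_waiting.append(lock)
            else:
                granted.append(lock)
                progress = True
        waiting = still_waiting
        # A granted operation finishes (and unlocks) once its dependency is done.
        for lock in list(granted):
            dep = depends_on.get(lock.owner)
            if dep is None or dep in done:
                granted.remove(lock)
                done.add(lock.owner)
                progress = True
    return sorted(lock.owner for lock in granted + waiting)


if __name__ == "__main__":
    print("unlink takes EX FULL:      ", stuck_operations(FULL))
    print("unlink skips the LAYOUT bit:", stuck_operations(FULL - {LAYOUT}))
```

In this model, an unlink that takes EX on all bits leaves all three operations stuck, while an unlink that leaves out the LAYOUT bit lets every operation finish. The model says nothing about what the getattr returns once the unlink has gone through; it only illustrates the wait cycle.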