I agree we need to prioritize these operations, but I don’t believe adding prioritization to the coordinator is the right answer here. We are currently on a path that will turn the coordinator into a general purpose request queue, and that is not something that belongs in Lustre code, and certainly not in the kernel.
Instead, we should move HSM request processing out of the kernel and into user space. Lustre will still need to keep track of the implicit restore requests triggered by file access, but all other operations could be done without using a coordinator. Lustre should provide the mechanisms needed for a correct HSM system, and allow the user space tools to manage all of the policies around what is copied, when it is copied, and at what priority.
I’m still thinking about exactly what this should look like, but at a minimum an Archive operation begins with setting the EXISTS flag and completes with setting the ARCHIVE flag. If the file is modified after EXISTS is set, the MDT will set the DIRTY flag and reject the ARCHIVE flag when the mover later attempts to set it.
A Restore operation is primarily a layout swap, though it may need to be a special case to ensure the RELEASED flag is cleared atomically with the swap.
A Remove operation is done by clearing the EXISTS and ARCHIVE flags.
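To make the flag transitions above concrete, here is a rough in-memory model in Python. It is only a sketch of the idea, not Lustre code: the ToyFile class and its method names are hypothetical, the release() step is an assumption added so there is something to restore, and clearing a stale DIRTY flag when a new archive begins is also an assumption that the description above does not settle.

```python
from enum import Flag, auto


class HsmFlag(Flag):
    """Flag names follow the comment above; the values are arbitrary."""
    NONE = 0
    EXISTS = auto()
    ARCHIVE = auto()
    DIRTY = auto()
    RELEASED = auto()


class ToyFile:
    """Hypothetical stand-in for the per-file HSM state kept by the MDT."""

    def __init__(self):
        self.flags = HsmFlag.NONE

    def begin_archive(self):
        # Archive starts by setting EXISTS. Dropping a stale DIRTY flag
        # here is an assumption, made so that a retry can succeed.
        self.flags &= ~HsmFlag.DIRTY
        self.flags |= HsmFlag.EXISTS

    def modify(self):
        # A write after EXISTS is set marks the file DIRTY.
        if HsmFlag.EXISTS in self.flags:
            self.flags |= HsmFlag.DIRTY

    def complete_archive(self):
        # The mover tries to set ARCHIVE; it is rejected if the file
        # changed while the copy was in flight.
        if HsmFlag.EXISTS not in self.flags or HsmFlag.DIRTY in self.flags:
            return False
        self.flags |= HsmFlag.ARCHIVE
        return True

    def release(self):
        # Assumption (not described above): only a clean, archived file
        # may be released.
        if HsmFlag.ARCHIVE not in self.flags or HsmFlag.DIRTY in self.flags:
            return False
        self.flags |= HsmFlag.RELEASED
        return True

    def restore(self):
        # Stands in for the layout swap; RELEASED is cleared in the same
        # step, modeling the atomicity requirement.
        self.flags &= ~HsmFlag.RELEASED

    def remove(self):
        # Remove clears EXISTS and ARCHIVE.
        self.flags &= ~(HsmFlag.EXISTS | HsmFlag.ARCHIVE)


if __name__ == "__main__":
    f = ToyFile()
    f.begin_archive()
    f.modify()                       # file changes during the copy
    print(f.complete_archive())      # the mover's ARCHIVE request is rejected
    f.begin_archive()                # user space policy decides to retry
    print(f.complete_archive())      # accepted this time
```

The point of the sketch is that the retry decision lives entirely in the caller: the model only enforces correctness (rejecting ARCHIVE on a DIRTY file), while ordering and priority are left to whatever user space tool drives it.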
The existing coordinator should remain in place for some time to continue supporting the current set of tools, but I would like to discourage adding further complexity to it, and to solve issues like this in a different way.
Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: https://review.whamcloud.com/27800
Subject: LU-8324 hsm: ease the development of a different coordinator
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0ee6e5d71c6f549b48be59607d5f55f28e950f47