Details
-
New Feature
-
Resolution: Fixed
-
Minor
Description
Summary
For DMF we would like to have the ability to copy files to a Lustre filesystem and set the file state to archived without going through the HSM coordinator and having to impersonate a copytool.
Background
DMF has its own concept of HSM file state. A file is regular (REG) if no archive copy exists, dual state (DUL) if a copy exists and the file data is identical, modified (MOD) if a copy exists and the file data has changed, and offline (OFL) if the data on the filesystem has been released.
On DMAPI-enabled filesystems (EXFS and GPFS) this is implemented by setting a "managed region" on a file, which tells the filesystem that a call to read, write, or truncate needs to go to a user daemon to be approved or denied. For DUL files we trap write and truncate, for OFL we trap all three, and for REG and MOD no managed region needs to be set.
This makes it easy to copy a file on a DMAPI filesystem and then set the DMF state. But Lustre is very different.
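As a rough illustration of the mapping just described, the sketch below encodes which operations a managed region would trap for each DMF state. The enum and function names are hypothetical and chosen for this example only; they are not DMF or DMAPI identifiers.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not DMF or DMAPI identifiers. */
enum dmf_state { DMF_REG, DMF_DUL, DMF_MOD, DMF_OFL };

enum trap_event {
    TRAP_READ     = 1 << 0,
    TRAP_WRITE    = 1 << 1,
    TRAP_TRUNCATE = 1 << 2,
};

/* Return the set of operations a managed region would trap for a state. */
unsigned trapped_events(enum dmf_state state)
{
    switch (state) {
    case DMF_DUL:   /* archive copy is current: trap modifications only */
        return TRAP_WRITE | TRAP_TRUNCATE;
    case DMF_OFL:   /* data released: trap every access */
        return TRAP_READ | TRAP_WRITE | TRAP_TRUNCATE;
    case DMF_REG:   /* no archive copy */
    case DMF_MOD:   /* archive copy already stale */
    default:
        return 0;
    }
}

/* REG and MOD files need no managed region at all. */
bool needs_managed_region(enum dmf_state state)
{
    return trapped_events(state) != 0;
}
```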
Lustre HSM State
Lustre keeps the HSM state of a file in the trusted.hsm extended attribute. The attribute contains 4 fields: a set of compatibility flags (all zero), a set of HSM state flags, the archive id, and the data version.
The data version is a semi-random integer that changes whenever file contents are modified. When archiving a file completes, the current data version is stored in the hsm attribute.
In addition, there is a special file layout flag to mark the layout of a released file. When an attempt is made to read or write a file, this flag is checked, while the flags in the extended attribute are ignored.
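To make the four fields concrete, here is a minimal sketch that decodes a raw trusted.hsm value. The field widths (32, 32, 64, 64 bits), the little-endian byte order, and the 24-byte total size are assumptions made for this example, as are all the names; check them against the Lustre source before relying on them.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 32-bit compat flags, 32-bit state flags, 64-bit archive id,
 * 64-bit data version, little-endian, 24 bytes in total. */
#define HSM_XATTR_SIZE 24

struct hsm_fields {
    uint32_t compat;
    uint32_t flags;
    uint64_t archive_id;
    uint64_t data_version;
};

/* Load an n-byte little-endian unsigned integer. */
static uint64_t load_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Decode a raw attribute value; returns 0 on success, -1 if it is too short. */
int hsm_decode(const uint8_t *buf, size_t len, struct hsm_fields *out)
{
    if (len < HSM_XATTR_SIZE)
        return -1;
    out->compat       = (uint32_t)load_le(buf, 4);
    out->flags        = (uint32_t)load_le(buf + 4, 4);
    out->archive_id   = load_le(buf + 8, 8);
    out->data_version = load_le(buf + 16, 8);
    return 0;
}
```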
Impersonating a Copytool
The canonical way to copy a file to Lustre and set HSM state would be the following.
- Register the file copy app with a custom archive id.
- Create the file with llapi_hsm_import().
- Force a restore of the file data.
- The Lustre HSM coordinator instructs the file copy app.
- Call llapi_hsm_action_begin() to start the copy.
- Copy the data from another filesystem.
- Call llapi_hsm_action_end() to complete the copy.
This makes the program flow for Lustre very different from EXFS and GPFS.
Directly Setting HSM Flags
An alternative would be to do the file copy and then set the ARCHIVED and possibly RELEASED flags, in addition to the archive id used by DMF. This is where the problem comes in: while Lustre does many sanity checks surrounding attempts to set or clear these flags with the LL_IOC_HSM_STATE_SET interface, setting them does not correctly set the HSM filesystem state.
Setting the ARCHIVED flag on a file does not update the data version in trusted.hsm. A request to release the file does compare the file's current data version against the version stored in trusted.hsm, and refuses the request if they differ. As a result, a file can be marked ARCHIVED, yet be impossible to release.
Similarly, setting or clearing the RELEASED flag on a file does not change the file layout, and as such has no bearing on whether attempts to read or write the file will find it in released state.
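The two mismatches can be shown with a small user space model. This is not Lustre code: the structure, the function names, and the flag bit values are all invented for this sketch, and it only mirrors the behaviour described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; the names are local to this sketch. */
#define F_RELEASED 0x4u
#define F_ARCHIVED 0x8u

struct file_model {
    uint32_t hsm_flags;        /* state flags kept in trusted.hsm */
    uint64_t hsm_data_version; /* data version recorded in trusted.hsm */
    uint64_t cur_data_version; /* data version of the current contents */
    bool     layout_released;  /* layout flag consulted on read/write */
};

/* Setting or clearing flags touches only the flag word, nothing else. */
void model_set_flags(struct file_model *f, uint32_t set, uint32_t clear)
{
    f->hsm_flags = (f->hsm_flags | set) & ~clear;
}

/* Release is refused when the recorded and current versions differ.
 * Requiring ARCHIVED as well is an assumption of this model. */
bool model_can_release(const struct file_model *f)
{
    return (f->hsm_flags & F_ARCHIVED) &&
           f->hsm_data_version == f->cur_data_version;
}

/* Read and write look at the layout, not at the flags in the attribute. */
bool model_io_sees_released(const struct file_model *f)
{
    return f->layout_released;
}
```

In this model, a freshly copied file whose ARCHIVED flag is set by hand still fails model_can_release(), and setting RELEASED by hand leaves model_io_sees_released() false.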
RFE: Ability to Set the HSM Data Version of a File
With the above background, what we would like for DMF is the ability to set the data version in the trusted.hsm attribute. For this I see several options.
1) Allow User Space Writes to trusted.hsm
Currently, user space writes to trusted.hsm are stopped on the server in mdt_reint_setxattr() and, in some Lustre versions, also on the client in ll_setxattr(). No error is returned. Removing these checks would be sufficient: a simple user space application could then read the attribute and write it back with the current data version in place. Since the attribute is in the trusted namespace, only someone with root-level capabilities would be able to do this. However, such a read-modify-write cycle is subject to races, and the trusted.hsm extended attribute could be corrupted.
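A minimal sketch of what such an application might look like, using the standard Linux getxattr() and setxattr() calls. The 24-byte size and the offset of the data version within the attribute are assumptions, as are the function names, and the caller is assumed to have obtained the current data version by some other means. On current Lustre the write would be silently dropped, as noted above.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/xattr.h>

#define HSM_XATTR_NAME "trusted.hsm"
#define HSM_XATTR_SIZE 24   /* assumed size of the attribute value */
#define HSM_DV_OFFSET  16   /* assumed offset of the data version */

/* Overwrite the data version field in a raw attribute buffer
 * (little-endian byte order assumed). */
void hsm_patch_data_version(uint8_t *buf, uint64_t dv)
{
    for (int i = 0; i < 8; i++)
        buf[HSM_DV_OFFSET + i] = (uint8_t)(dv >> (8 * i));
}

/* Read-modify-write of trusted.hsm. Nothing makes the two calls atomic,
 * which is the race mentioned above. Returns 0 or a negative errno. */
int hsm_rewrite_data_version(const char *path, uint64_t dv)
{
    uint8_t buf[HSM_XATTR_SIZE];
    ssize_t n = getxattr(path, HSM_XATTR_NAME, buf, sizeof(buf));

    if (n < 0)
        return -errno;
    if (n != HSM_XATTR_SIZE)
        return -EINVAL;

    hsm_patch_data_version(buf, dv);

    if (setxattr(path, HSM_XATTR_NAME, buf, sizeof(buf), XATTR_REPLACE) < 0)
        return -errno;
    return 0;
}
```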
2) Add an ioctl to Set the HSM Data Version
Adding a new ioctl specifically to set the HSM data version to a provided value would work around concerns for the integrity of the trusted.hsm attribute. But the implementation might be complex, as it seems likely that this would require adding a new RPC to the over-the-wire protocol.
3) Update HSM Data Version when Setting the ARCHIVED Flag
A simplifying assumption is that DMF user space does not need to know the HSM data version; all it needs is a way to make it equal to the current value. Updating the HSM data version when the HSM ARCHIVED flag is set would work. However, this does introduce a new side effect to an existing operation, which might be considered a disadvantage.
The exact sanity checks that should apply need to be discussed:
- Only sync if the HSM data version is 0?
- Only sync if DIRTY is not also set?
- Other?
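For discussion purposes, one possible combination of the checks listed above could look like the following user space model. The policy shown (sync only when ARCHIVED is being set, DIRTY is absent, and the stored version is 0) is just one candidate, and every name and flag value here is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; the names are local to this sketch. */
#define F_DIRTY    0x2u
#define F_ARCHIVED 0x8u

struct hsm_model {
    uint32_t flags;   /* HSM state flags */
    uint64_t hsm_dv;  /* data version stored in trusted.hsm */
    uint64_t cur_dv;  /* data version of the current contents */
};

/* Candidate policy combining both proposed checks. */
bool should_sync_version(const struct hsm_model *f, uint32_t set_mask)
{
    if (!(set_mask & F_ARCHIVED))
        return false;               /* only when ARCHIVED is being set */
    if ((f->flags | set_mask) & F_DIRTY)
        return false;               /* never for a dirty file */
    return f->hsm_dv == 0;          /* only if no version was recorded yet */
}

/* Model of a state-set request with the proposed side effect. */
void model_state_set(struct hsm_model *f, uint32_t set_mask)
{
    if (should_sync_version(f, set_mask))
        f->hsm_dv = f->cur_dv;
    f->flags |= set_mask;
}
```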
4) Update HSM Data Version with a New hsm_user_action
Instead of using a side effect of setting the ARCHIVED flag, requesting a sync of the HSM data version could be its own hsm_user_action. This looks like it has the fewest drawbacks for kernel-side code, while making the user side slightly more complex.
5) Other Options
There may be other plausible approaches that I have not thought of.
Conclusion
The primary goal here is to obtain agreement on a preferred approach before we (HPE) invest resources in implementing something. Our preference at this point is (3), then (4). Neither (1) nor (2) seems a very attractive prospect.