Details
Bug
Resolution: Unresolved
Minor
-
None
-
None
-
None
-
3
-
9223372036854775807
Description
HSM tools currently store an external identifier (such as a UUID) in an extended attribute (EA) when a file is archived. The identifier is used to locate the file in the backend archive, and there may be more than one identifier if the file has been archived to multiple backends. Currently, different tools do this independently and do not coordinate their EA names or formats.
When a file is deleted, the EA is no longer available, so it would be helpful to include the identifier(s) in the Delete changelog record. I suggest we define a standard name and format for the HSM archive EA, and that this data be included as-is in the delete changelog record.
One possible format would be to use JSON to encode a list of endpoint and archive ID pairs. Here is a strawman example to begin the discussion:
{ "replicas": [ { "endpoint" : "s3://my-bucket/archive", "id": "UUID" }, { "endpoint" : "wos://address", "id": "OID" } ] }
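As a rough illustration of how a tool might produce and consume this strawman format, here is a minimal Python sketch. The function names `encode_archive_ea` and `decode_archive_ea` are hypothetical and not part of any existing HSM tool; the sketch assumes the EA value is simply the UTF-8 encoded JSON document and says nothing about the EA name itself.

```python
import json

def encode_archive_ea(replicas):
    """Serialize (endpoint, archive_id) pairs into compact JSON bytes.

    Hypothetical helper: the layout mirrors the strawman example,
    with one object per archived replica.
    """
    doc = {"replicas": [{"endpoint": ep, "id": aid} for ep, aid in replicas]}
    # Compact separators keep the EA value as small as possible.
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")

def decode_archive_ea(raw):
    """Parse the EA bytes back into a list of (endpoint, archive_id) pairs."""
    doc = json.loads(raw.decode("utf-8"))
    return [(r["endpoint"], r["id"]) for r in doc["replicas"]]

# Example: a file archived to two backends.
replicas = [("s3://my-bucket/archive", "UUID"), ("wos://address", "OID")]
ea_value = encode_archive_ea(replicas)
```

Because the value is opaque bytes, the same blob could be copied unchanged into the delete changelog record and decoded later by whichever tool owns the backend.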
Alternatively, to save space the endpoint could just be an index that refers to a specific endpoint in the local configuration.
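The index variant could look something like the following Python sketch. `LOCAL_ENDPOINTS`, `compact_replicas`, and `expand_replicas` are hypothetical names, and the sketch assumes the local configuration is an ordered list whose positions serve as endpoint indexes.

```python
import json

# Hypothetical local configuration: the list position is the endpoint index.
LOCAL_ENDPOINTS = ["s3://my-bucket/archive", "wos://address"]

def compact_replicas(replicas, endpoints=LOCAL_ENDPOINTS):
    """Encode (endpoint, archive_id) pairs, replacing each endpoint URL
    with its index in the local configuration."""
    doc = {"replicas": [{"endpoint": endpoints.index(ep), "id": aid}
                        for ep, aid in replicas]}
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")

def expand_replicas(raw, endpoints=LOCAL_ENDPOINTS):
    """Decode the compact form, resolving each index back to its URL."""
    doc = json.loads(raw.decode("utf-8"))
    return [(endpoints[r["endpoint"]], r["id"]) for r in doc["replicas"]]

replicas = [("s3://my-bucket/archive", "UUID"), ("wos://address", "OID")]
compact = compact_replicas(replicas)
```

One trade-off with this approach is that an index is only meaningful while the configuration it refers to is unchanged: reordering or removing endpoints would make older EAs and changelog records resolve to the wrong backend.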
Yes, the "archive UUID" that the archive stores in the Lustre file is archive-specific, and in the Lustre-on-Lustre case it would likely be the remote FID. That said, rather than using an arbitrary xattr or changing XATTR_NAME_HSM, there are benefits to storing the archive UUID as part of a composite layout (PFL/FLR) in the file.
There are several benefits to storing the HSM identifier in a composite layout:
One candidate for this is patch https://review.whamcloud.com/33755 "LU-11376 lov: new foreign LOV format", which is just a generic Lustre layout, but even with this it would need some infrastructure changes for the code to understand this component type in the context of HSM.