Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Description
HSM tools currently store an external identifier (such as a UUID) in an extended attribute (EA) when a file is archived. The identifier locates the file in the backend archive, and a file may have more than one identifier if it has been archived to multiple backends. Today, different tools do this independently and do not coordinate their EA names or formats.
When a file is deleted, its EAs are no longer available, so it would be helpful to include the identifier(s) in the Delete changelog record. I suggest we define a standard name and format for the HSM archive EA, and that this data be included as is in the Delete changelog record.
One possible format would be JSON encoding a list of (endpoint, archive ID) pairs. Here is a strawman example to begin the discussion:
{ "replicas": [
    { "endpoint": "s3://my-bucket/archive", "id": "UUID" },
    { "endpoint": "wos://address", "id": "OID" }
] }
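A minimal Python sketch of the strawman format, assuming the EA value is UTF-8 JSON with a top-level "replicas" list whose entries carry string "endpoint" and "id" fields. The helper names `encode_hsm_ea` and `decode_hsm_ea` and the validation rule are hypothetical, not an existing Lustre or HSM-tool API.

```python
import json
from typing import Dict, List

# Hypothetical helpers for the strawman HSM archive EA format.
# Not an existing Lustre or HSM-tool API.

def encode_hsm_ea(replicas: List[Dict[str, str]]) -> bytes:
    """Serialize replica entries to compact UTF-8 JSON suitable for an EA value."""
    for r in replicas:
        # Assumption: every entry must have string "endpoint" and "id" fields.
        if not isinstance(r.get("endpoint"), str) or not isinstance(r.get("id"), str):
            raise ValueError(f"malformed replica entry: {r!r}")
    # separators=(",", ":") drops optional whitespace to keep the EA small.
    return json.dumps({"replicas": replicas}, separators=(",", ":")).encode("utf-8")

def decode_hsm_ea(value: bytes) -> List[Dict[str, str]]:
    """Parse an EA value back into its list of replica entries."""
    doc = json.loads(value.decode("utf-8"))
    replicas = doc.get("replicas")
    if not isinstance(replicas, list):
        raise ValueError("EA value has no 'replicas' list")
    return replicas

if __name__ == "__main__":
    ea = encode_hsm_ea([
        {"endpoint": "s3://my-bucket/archive", "id": "UUID"},
        {"endpoint": "wos://address", "id": "OID"},
    ])
    print(ea.decode())
    print(decode_hsm_ea(ea))
```

Keeping the EA value as opaque bytes means whatever writes the Delete record can copy it verbatim without having to parse the JSON.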
Alternatively, to save space, the endpoint could be an index that refers to a specific endpoint in the local configuration.
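A sketch of the index-based variant, assuming each tool shares a local table of endpoint URLs and the EA stores only the position in that table. `LOCAL_ENDPOINTS`, `to_indexed`, and `from_indexed` are hypothetical names used for illustration.

```python
import json
from typing import Dict, List

# Hypothetical local configuration: endpoint URLs, referenced by list position.
LOCAL_ENDPOINTS: List[str] = ["s3://my-bucket/archive", "wos://address"]

def to_indexed(replicas: List[Dict[str, str]], table: List[str]) -> bytes:
    """Replace each endpoint URL with its index in the local endpoint table.

    list.index raises ValueError if an endpoint is not in the table.
    """
    compact = [{"endpoint": table.index(r["endpoint"]), "id": r["id"]} for r in replicas]
    return json.dumps({"replicas": compact}, separators=(",", ":")).encode("utf-8")

def from_indexed(value: bytes, table: List[str]) -> List[Dict[str, str]]:
    """Resolve endpoint indices back to URLs using the same local table."""
    doc = json.loads(value.decode("utf-8"))
    return [{"endpoint": table[r["endpoint"]], "id": r["id"]} for r in doc["replicas"]]

if __name__ == "__main__":
    replicas = [
        {"endpoint": "s3://my-bucket/archive", "id": "UUID"},
        {"endpoint": "wos://address", "id": "OID"},
    ]
    full = json.dumps({"replicas": replicas}, separators=(",", ":")).encode("utf-8")
    indexed = to_indexed(replicas, LOCAL_ENDPOINTS)
    print(len(full), len(indexed))  # size with URLs vs. size with indices
    assert from_indexed(indexed, LOCAL_ENDPOINTS) == replicas
```

The trade-off is that an index is only meaningful against the configuration it was written with: if the table is reordered, or differs between tools, older EAs and changelog records would resolve to the wrong endpoint, so such a table would likely need to be append-only.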
If the timeout for these records is a week, I don't think it is practical to keep this data in unlinked inodes in the PENDING directory: far too many inodes may be created and deleted in that period, and PENDING may grow too large. In that case, I think it is more practical to store the UUID in the ChangeLog record.
In newer releases, extensible fields can be added to ChangeLog records as needed, and those fields live exactly as long as the record that carries them. They consume only a few bytes in a log block, rather than an inode and further growth of the PENDING directory.
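To make the cost argument concrete, here is a minimal Python sketch of a delete record that carries the HSM archive EA as an optional, length-prefixed extension copied verbatim. The layout, the record type code, the flag value, and the `DeleteRecord` class are all hypothetical and do not describe the real Lustre ChangeLog record format; the sketch only shows that a record without the EA costs nothing extra beyond a flag bit, and a record with it costs a 2-byte length plus the EA bytes.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL layout for illustration only; NOT the Lustre ChangeLog on-disk format.
# Header: record type (u16), flags (u16), 16-byte file identifier stand-in.
# If FLAG_HSM_EA is set, a u16 length and the raw EA bytes follow the header.
HEADER = struct.Struct("<HH16s")
EXT_LEN = struct.Struct("<H")
REC_DELETE = 1      # hypothetical record type code
FLAG_HSM_EA = 0x1   # hypothetical "HSM archive EA present" flag

@dataclass
class DeleteRecord:
    fid: bytes                      # 16-byte file identifier stand-in
    hsm_ea: Optional[bytes] = None  # EA value copied as is, if the file had one

    def pack(self) -> bytes:
        flags = FLAG_HSM_EA if self.hsm_ea is not None else 0
        out = HEADER.pack(REC_DELETE, flags, self.fid)
        if self.hsm_ea is not None:
            if len(self.hsm_ea) > 0xFFFF:
                raise ValueError("EA value too large for a u16 length field")
            out += EXT_LEN.pack(len(self.hsm_ea)) + self.hsm_ea
        return out

    @classmethod
    def unpack(cls, data: bytes) -> "DeleteRecord":
        rtype, flags, fid = HEADER.unpack_from(data, 0)
        if rtype != REC_DELETE:
            raise ValueError(f"unexpected record type {rtype}")
        ea = None
        if flags & FLAG_HSM_EA:
            (n,) = EXT_LEN.unpack_from(data, HEADER.size)
            start = HEADER.size + EXT_LEN.size
            ea = data[start:start + n]
        return cls(fid=fid, hsm_ea=ea)

if __name__ == "__main__":
    ea = b'{"replicas":[{"endpoint":"wos://address","id":"OID"}]}'
    rec = DeleteRecord(fid=bytes(16), hsm_ea=ea)
    raw = rec.pack()
    print(len(raw))  # header + length field + EA bytes
    assert DeleteRecord.unpack(raw) == rec
```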