[LU-5433] Man page for llapi_hsm_state_get(3) needs some clarification Created: 30/Jul/14  Updated: 22/Dec/15  Resolved: 22/Jun/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Minor
Reporter: Robert Read (Inactive) Assignee: Robert Read (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File llapi_hsm_state_get.txt    
Severity: 3
Rank (Obsolete): 15134

 Description   

I noticed this man page appeared to be a bit incorrect and could use some editing in general.

One thing that caught my eye as not quite right:

       HS_EXISTS           A file copy exists in HSM backend.

That flag only indicates the file has been assigned an HSM archive id, and does not mean it's been copied yet.

Patch in progress.



 Comments   
Comment by Robert Read (Inactive) [ 31/Jul/14 ]

http://review.whamcloud.com/#/c/11283/

Comment by Aurelien Degremont (Inactive) [ 21/Sep/14 ]

> That flag only indicates the file has been assigned an HSM archive id, and does not mean it's been copied yet.

That's true, but you're losing an important point: it does mean that a copy could have been started but failed for some reason.
When this flag is set, administrators should consider that a copy possibly exists in the backend pointed to by archive_id. Especially if you are removing a file from Lustre, you must take care of this flag. I'm afraid you lose this aspect if you change the documentation this way.

Comment by Robert Read (Inactive) [ 22/Sep/14 ]

I agree that flag indicates a file copy may or may not have been started, but there is no way to tell. If a copy exists in the backend and the ARCHIVE flag is not set, then it should be the backend's responsibility to clean this up, not the admin's.

Comment by Aurelien Degremont (Inactive) [ 23/Sep/14 ]

But the backend has no access to Lustre flags. Actually the backend is mostly Lustre agnostic. Those stale copies should be cleaned when the corresponding Lustre file is removed. This is the purpose of HS_EXISTS.

Comment by Robert Read (Inactive) [ 23/Sep/14 ]

It seems to me the copytool is the arbiter between the backend and Lustre state, and should be responsible for handling errors correctly. My interpretation of the flags is that HS_ARCHIVED is set once a complete copy has been made in the backend, so it should only be possible for stale objects to exist if HS_ARCHIVED is also set. If only HS_EXISTS is set, then it seems one of several things could be true:

1) Nothing has been done to this file because copytool hasn't received the action request yet.
2) A copy is currently in progress
3) A copytool received the action request but failed, and the copytool/backend should arrange for proper cleanup
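
The interpretation above can be sketched as a small classifier over the state bits. This is only an illustration: the flag values are defined locally so the sketch compiles without Lustre headers and are assumed to mirror enum hsm_states in lustre_user.h, while the enum backend_copy_status and the function classify_hsm_flags() are hypothetical names that do not exist in liblustreapi.

```c
#include <stdint.h>

/* Defined locally for the sketch; assumed to match lustre_user.h.
 * Real code should include the Lustre header instead. */
#define HS_EXISTS   0x00000001
#define HS_ARCHIVED 0x00000008

/* Hypothetical summary of what the flags say about the backend copy. */
enum backend_copy_status {
    COPY_NOT_ASSIGNED, /* HS_EXISTS clear: no archive id assigned */
    COPY_UNCERTAIN,    /* HS_EXISTS only: not started, in progress, or failed */
    COPY_COMPLETED     /* HS_ARCHIVED set: a complete copy was made */
};

/* Classify the hus_states bits returned by llapi_hsm_state_get(). */
static enum backend_copy_status classify_hsm_flags(uint32_t states)
{
    if (!(states & HS_EXISTS))
        return COPY_NOT_ASSIGNED;
    if (states & HS_ARCHIVED)
        return COPY_COMPLETED;
    /* Cases 1) to 3) above cannot be told apart from the flags alone. */
    return COPY_UNCERTAIN;
}
```

The COPY_UNCERTAIN result is the point under discussion: with only HS_EXISTS set, a caller cannot distinguish between the three cases listed.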

Comment by Thomas LEIBOVICI - CEA (Inactive) [ 26/Sep/14 ]

In case a problem occurs during the copy (backend outage, Lustre crash, copytool crash...) the copytool may not be able to clean the backend properly by itself.
Making the backend responsible for cleaning may not be appropriate or even possible, as it can be very expensive to scan the whole backend namespace to search for aborted copy files (also, it must have a way to distinguish a regular copy from an aborted copy...).

Cleaning is not a problem if an entry is archived again: the copytool can clean up the previously aborted copy at that time. But there is a leak if the entry is removed from Lustre in the meantime. In this case, the HS_EXISTS flag is relayed in the UNLINK changelog record so the Policy Engine is aware it must trigger an HSM_REMOVE action. This way, even a failed copy is correctly cleaned in the backend after file removal.
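
A minimal sketch of that Policy Engine decision, under stated assumptions: struct sketch_unlink_record, enum sketch_action and action_for_unlink() are hypothetical names used only for illustration, and the real changelog record layout is not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of an UNLINK changelog record. */
struct sketch_unlink_record {
    bool hsm_exists; /* HS_EXISTS was set on the file when it was unlinked */
};

/* Hypothetical follow-up actions a Policy Engine could take. */
enum sketch_action {
    SKETCH_ACTION_NONE,
    SKETCH_ACTION_HSM_REMOVE
};

/* Decide what to do after a file has been removed from Lustre. */
static enum sketch_action
action_for_unlink(const struct sketch_unlink_record *rec)
{
    /* A copy, complete or aborted, may exist in the backend. */
    if (rec->hsm_exists)
        return SKETCH_ACTION_HSM_REMOVE;
    return SKETCH_ACTION_NONE;
}
```

The decision depends only on HS_EXISTS, not on HS_ARCHIVED, which is why an aborted copy is also cleaned up.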

Comment by Frederic Saunier [ 18/Jun/15 ]

Does this mean that whenever HS_EXISTS is not set, the archive_id can be safely ignored? And if so, should it be still displayed by "lfs hsm_state" after a "lfs hsm_remove"?

Comment by Aurelien Degremont (Inactive) [ 18/Jun/15 ]

> Does this mean that whenever HS_EXISTS is not set, the archive_id can be safely ignored?

Yes, but in theory this should not happen. archive_id is set at the same time as HS_EXISTS.

> And if so, should it be still displayed by "lfs hsm_state" after a "lfs hsm_remove"?

A successful hsm_remove should clear both of them.
It is strange if you ran this command and archive_id was kept.
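
As a sketch of the rule stated in this comment, a caller could treat archive_id as significant only while HS_EXISTS is set. The struct below is a local stand-in assumed to mirror the first two fields of struct hsm_user_state, the HS_EXISTS value is assumed to match lustre_user.h, and significant_archive_id() is a hypothetical helper, not part of liblustreapi.

```c
#include <stdbool.h>
#include <stdint.h>

/* Defined locally for the sketch; assumed to match lustre_user.h. */
#define HS_EXISTS 0x00000001

/* Local stand-in for the relevant fields of struct hsm_user_state. */
struct sketch_hsm_state {
    uint32_t hus_states;
    uint32_t hus_archive_id;
};

/* Store the archive id in *id_out and return true only when HS_EXISTS
 * is set; otherwise return false and leave *id_out untouched. */
static bool significant_archive_id(const struct sketch_hsm_state *st,
                                   uint32_t *id_out)
{
    if (!(st->hus_states & HS_EXISTS))
        return false;
    *id_out = st->hus_archive_id;
    return true;
}
```

With this helper, a leftover archive_id on a file whose HS_EXISTS flag has been cleared is simply not reported.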

Comment by Gerrit Updater [ 19/Jun/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11283/
Subject: LU-5433 doc: update llapi_hsm_state_get manpage
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0572733bf6e41b35331358c33289ab46e0182878

Comment by Robert Read (Inactive) [ 22/Jun/15 ]

Landed on master.
