[LU-4640] Last unlink should trigger HSM remove by default Created: 17/Feb/14 Updated: 27/Jun/17 Resolved: 07/Jun/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Aurelien Degremont (Inactive) | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HSM, patch | ||
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 12691 |
| Description |
|
In the current code, when an archived file is removed from the file system, no HSM request is triggered. We rely on Changelogs, and on RobinHood reading them, to send the corresponding hsm_remove requests and clean those orphans in the HSM backend. This is done on purpose, to provide a way to implement "soft unlink": RobinHood removes the file in the backend only after a grace period, during which admins can still restore the file from the HSM if they want to (using the import feature). Requiring a RobinHood setup to handle this cleanup is too big a limitation. We should consider modifying this behaviour. By default, Lustre should automatically add a HSM_REMOVE request for any last unlink. This way, no file will be leaked in the archive. |
| Comments |
| Comment by Andreas Dilger [ 17/Feb/14 ] |
|
Not sure if this is possible, but what about tracking the nlink count in the unlink record? That way, on the last unlink it is clear to the ChangeLog reader, without the need to write a separate record each time. I haven't looked at the code yet, just an idea as I read this bug. |
| Comment by Aurelien Degremont (Inactive) [ 18/Feb/14 ] |
|
The issue is not about which record needs to be added. When I say "add a HSM_REMOVE request", I do not mean a Changelog record, but a real HSM request to be processed by the coordinator and so on. The problem here is that we need an external tool to remove files from the backend when they are unlinked in Lustre. For a simple setup, the policy engine should be optional. My idea is to track nlink == 0; when this is true, call something like:

if (nlink == 0 && file has HS_EXIST flag) {
        if (!(policy & soft_remove))
                mdt_hsm_request(HUA_REMOVE);
} |
| Comment by Peter Jones [ 18/Feb/14 ] |
|
Bruno, could you please look into this one? Thanks, Peter |
| Comment by Malcolm Cowe (Inactive) [ 18/Feb/14 ] |
|
While this is just my opinion, I would prefer that the default be not to remove the file from the backend, in order to keep behaviour consistent with the current case. |
| Comment by Aurelien Degremont (Inactive) [ 19/Feb/14 ] |
|
We can set this policy off by default, to stay coherent with the 2.5.0 behaviour. However, I am wondering what admins expect when they set up a simple Lustre/HSM system. Do they expect files to be removed from the backend when they unlink them in Lustre? I raised this issue following many discussions with early users of Lustre 2.5.0 and the surprises they ran into. |
| Comment by Andreas Dilger [ 19/Feb/14 ] |
|
I agree that tracking nlink == 0 (via a flag in the unlink record?) is probably enough to signal to the policy engine when the last unlink happens. That said, I think it should be up to the policy engine and the system policy to decide when the archive copy is deleted. In some sites files may never be deleted, or the length of time that a file is kept may depend on the owner of the file or what directory it is in. Those are all decisions perfectly suited to the policy engine, and not inside Lustre. This means there would need to be some mechanism in the policy engine or archive to decide how long to keep files after they are deleted from Lustre, and to schedule the deletion afterward. |
| Comment by Aurelien Degremont (Inactive) [ 21/Feb/14 ] |
|
Andreas, all these features are already possible using RobinHood. The idea is to offer a way to use Lustre/HSM simply, without a policy engine, and still be able to remove files from the backend when needed. |
| Comment by Malcolm Cowe (Inactive) [ 23/Feb/14 ] |
|
From the original hpdd_discuss thread, I thought that there was a limitation: that the archive_id of the unlinked file is not known by RH, so that a deferred remove does not know which archive to remove the file from. So, it's not enough to know the FID of the file that has been deleted from Lustre; one also needs the archive_id in order to locate the remote copy. |
| Comment by Aurelien Degremont (Inactive) [ 24/Feb/14 ] |
|
It is true that RBH does not know this archive_id, but most of the time it will use the default_archive_id for that. Anyway, this is not related to this ticket. This ticket tries to handle archive removal in a setup WITHOUT RobinHood. |
| Comment by Andreas Dilger [ 25/Feb/14 ] |
|
Sorry, I didn't understand the details before. In the case of HSM without RobinHood, the file should be deleted immediately when the last link is dropped. It appears that this is (or should be) already handled with CLF_UNLINK_LAST in the CL_UNLINK record. If there is a simple ChangeLog processing tool (not RH), then it makes sense to have it turn CLF_UNLINK_LAST directly into HUA_REMOVE. I agree with Malcolm that this behaviour should be optional (default to off), since the archive may have its own retention policy. |
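To make the "simple ChangeLog processing tool" idea above concrete, here is a minimal sketch using liblustreapi. It is only an illustration: the mount point, MDT device name and archive id are placeholders, Changelog user registration and record clearing are omitted, and the llapi return-code conventions should be checked against the Lustre version in use.

#include <errno.h>
#include <stdlib.h>
#include <lustre/lustreapi.h>

#define LUSTRE_MNT  "/mnt/lustre"     /* placeholder mount point */
#define MDT_DEVICE  "lustre-MDT0000"  /* placeholder MDT device name */
#define ARCHIVE_ID  1                 /* placeholder archive id */

/* Send a single HUA_REMOVE request for the given FID. */
static int hsm_remove_fid(const struct lu_fid *fid)
{
	struct hsm_user_request *hur;
	int rc;

	hur = llapi_hsm_user_request_alloc(1, 0);	/* 1 item, no opaque data */
	if (hur == NULL)
		return -ENOMEM;

	hur->hur_request.hr_action = HUA_REMOVE;
	hur->hur_request.hr_archive_id = ARCHIVE_ID;
	hur->hur_request.hr_flags = 0;
	hur->hur_request.hr_itemcount = 1;
	hur->hur_request.hr_data_len = 0;
	hur->hur_user_item[0].hui_fid = *fid;
	hur->hur_user_item[0].hui_extent.offset = 0;
	hur->hur_user_item[0].hui_extent.length = -1;

	rc = llapi_hsm_request(LUSTRE_MNT, hur);
	free(hur);
	return rc;
}

int main(void)
{
	struct changelog_rec *rec;
	void *ctx;

	if (llapi_changelog_start(&ctx, CHANGELOG_FLAG_BLOCK, MDT_DEVICE, 0) < 0)
		return 1;

	/* Turn every "last unlink" record into an HSM remove request. */
	while (llapi_changelog_recv(ctx, &rec) == 0) {
		if (rec->cr_type == CL_UNLINK &&
		    (rec->cr_flags & CLF_UNLINK_LAST))
			hsm_remove_fid(&rec->cr_tfid);
		llapi_changelog_free(&rec);
	}

	llapi_changelog_fini(&ctx);
	return 0;
}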
| Comment by Bruno Faccini (Inactive) [ 12/Mar/14 ] |
|
Actually, in the master code there is even a new CLF_UNLINK_HSM_EXISTS flag to indicate that the unlinked file is archived. |
| Comment by Aurelien Degremont (Inactive) [ 12/Mar/14 ] |
|
It seems I have been very unclear; I do not know where I lost you when explaining my view. Forget about Changelogs for this ticket: my whole point applies even with Changelogs off.
> Last, what would be the best way to start a HSM-Remove request, from/in an external thread/context or by artificially creating one?
My idea is to call mdt_hsm_add_actions(HSM_REMOVE, FID) when the MDT UNLINK RPC handler detects that the file was successfully removed and nlink == 0. This does not require any userspace program nor a Changelog record. This behaviour should be controlled by a tunable. I need to check the code to be more precise about where to put this, but somewhere on the mdt_unlink() path there would be a call to mdt_hsm_request(). |
| Comment by Robert Read (Inactive) [ 12/Mar/14 ] |
|
It would be convenient if the HSM_REMOVE also included the xattrs (or perhaps just the value of a specific user xattr). This would allow copytools that store their own key as an xattr to be able to cleanup after deletion as well. |
| Comment by Bruno Faccini (Inactive) [ 12/Mar/14 ] |
|
Aurelien, |
| Comment by Aurelien Degremont (Inactive) [ 17/Mar/14 ] |
MDS layering was a pain when designing the Coordinator. Most of the coordinator is now implemented in the MDT layer; as a consequence, unfortunately, you cannot create a new request in mdd_unlink(). After looking at the code, here is a quick proposal: mdt_reint_unlink() seems to be a good place to add this call.

if (rc == 0 && !lu_object_is_dying(&mc->mot_header)) {
        /* Also read HSM data here: ma_need |= MA_HSM */
        rc = mdt_attr_get_complex(info, mc, ma);
        if (rc == 0)
                /* In this function, call mdt_hsm_add_actions() to add a new request */
                mdt_handle_last_unlink(info, mc, ma);
}

Atomicity and error management could be tricky since we are in the MDT layer, though. But we do not have any other possibility.
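For illustration only, here is a rough sketch of what the proposed mdt_handle_last_unlink() helper might look like, assuming it hands an HSM_REMOVE action to the coordinator through mdt_hsm_add_actions(). Structure layouts and the mdt_hsm_add_actions() signature vary between Lustre versions, and the patch that eventually landed queues the record with mdt_agent_record_add() instead (see the later comments), so treat this purely as a sketch of the idea: archived file, last link gone, queue an HSM remove for its FID.

static void mdt_handle_last_unlink(struct mdt_thread_info *info,
				   struct mdt_object *mc,
				   const struct md_attr *ma)
{
	struct hsm_action_list *hal;
	struct hsm_action_item *hai;
	int hal_sz;

	/* only act on the last link of a file that has an archived copy */
	if (ma->ma_attr.la_nlink != 0 ||
	    !(ma->ma_valid & MA_HSM) ||
	    !(ma->ma_hsm.mh_flags & HS_EXISTS))
		return;

	hal_sz = sizeof(*hal) + MTI_NAME_MAXLEN + 1 + sizeof(*hai);
	OBD_ALLOC(hal, hal_sz);
	if (hal == NULL)
		return;

	hal->hal_version = HAL_VERSION;
	hal->hal_count = 1;
	hal->hal_archive_id = ma->ma_hsm.mh_arch_id;
	hal->hal_flags = 0;
	obd_uuid2fsname(hal->hal_fsname, mdt_obd_name(info->mti_mdt),
			MTI_NAME_MAXLEN);

	hai = hai_first(hal);
	hai->hai_action = HUA_REMOVE;
	hai->hai_fid = *mdt_object_fid(mc);
	hai->hai_dfid = *mdt_object_fid(mc);
	hai->hai_extent.offset = 0;
	hai->hai_extent.length = -1;
	hai->hai_cookie = 0;
	hai->hai_gid = 0;
	hai->hai_len = sizeof(*hai);

	/* hand the action over to the coordinator; the exact signature of
	 * mdt_hsm_add_actions() differs between Lustre versions */
	mdt_hsm_add_actions(info, hal);

	OBD_FREE(hal, hal_sz);
}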
| Comment by Bruno Faccini (Inactive) [ 09/Mar/15 ] |
|
After more discussions, it appears that this new behaviour should not be enabled by default, but only upon a new tunable setting. |
| Comment by Gerrit Updater [ 07/Apr/15 ] |
|
Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/14384 |
| Comment by Bruno Faccini (Inactive) [ 09/Apr/15 ] |
|
Patch-set #1 testing has shown that requesting HSM data was unnecessary, since it is already done earlier for a fully unlinked file via the mdt_reint_unlink()->mdo_unlink()->mdd_unlink()->mdd_hsm_archive_exists() call sequence. It also showed that -EAGAIN is returned for a dying object, since the object already appears to be set as dying via the mdt_reint_unlink()->mdo_unlink()->mdd_unlink()->mdd_finish_unlink()->mdo_destroy()->... call sequence. I use mdt_agent_record_add() directly instead. |
| Comment by Bruno Faccini (Inactive) [ 23/Apr/15 ] |
|
Now, using mdt_agent_record_add() directly raises a new question: should we wake up the CDT (by calling mdt_hsm_cdt_wakeup()) for each of these hsm_remove requests being created, or simply "batch" them in the hsm_actions LLOG and wait for the CDT to wake up upon some other event or at the end of its loop_period timer? My feeling is that batching them is a good idea, since there is no urgency in handling them... |
| Comment by Aurelien Degremont (Inactive) [ 28/Apr/15 ] |
|
I agree with you that we do not need to wake up the CDT for each of them. It will be more difficult to write tests however. |
| Comment by Gerrit Updater [ 14/Mar/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14384/ |
| Comment by Bruno Faccini (Inactive) [ 14/Mar/16 ] |
|
Hello Aurelien, |
| Comment by Gerrit Updater [ 15/Mar/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/18925 |
| Comment by Gerrit Updater [ 15/Mar/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18925/ |
| Comment by Oleg Drokin [ 15/Mar/16 ] |
|
This patch was reverted due to testing problems discussed in |
| Comment by Bruno Faccini (Inactive) [ 15/Mar/16 ] |
|
Well the problem comes from the time that this patch has waited to be landed, when in the meantime patch for Will push a new version/patch-set of http://review.whamcloud.com/14384/ that will integrate my additional change for |
| Comment by Bruno Faccini (Inactive) [ 15/Mar/16 ] |
|
Hmm, but what is the best way to do so? I have tried first to rebase my patch on top of a master branch that already contains it and its revert, but nothing changed. |
| Comment by Bruno Faccini (Inactive) [ 16/Mar/16 ] |
|
Basically, after more thinking about this, it seems that the best way is to revert the revert and then add the additional changes ( |
| Comment by Gerrit Updater [ 16/Mar/16 ] |
|
Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/18946 |
| Comment by Li Xi (Inactive) [ 08/May/17 ] |
|
Hi, any plan to merge this patch in the near future? I am asking because I realized this is a critical problem for the policy engine that I am working on (Lustre Integrated Policy Engine). This policy engine does not require the Lustre Changelog, so it cannot be notified by the Changelog about the unlinking of files. |
| Comment by Gerrit Updater [ 08/May/17 ] |
|
Li Xi (lixi@ddn.com) uploaded a new patch: https://review.whamcloud.com/26980 |
| Comment by Peter Jones [ 08/May/17 ] |
|
Li Xi, it is feasible that we could land a fix for this change early in the 2.11 dev cycle. Peter |
| Comment by Bruno Faccini (Inactive) [ 09/May/17 ] |
|
| Ok, I will try to work on this patch again if there is an urgent need now.
But I still have work to do in order to address the new condition caused by early object removal. By the way Li, why did you create a new #26980 Change-Id instead of working on top of my existing one? |
| Comment by Li Xi (Inactive) [ 09/May/17 ] |
|
Hi Bruno, thanks for helping. Yeah, I created a new patch based on your latest patch. I first improved tests 26c and 26d, and after that the test results immediately showed that mdt_attr_get_complex() returns -2 (-ENOENT) because the object is already deleted. That means the MA_HSM and MA_INODE data needs to be read before the unlink happens. That is why I split mdt_handle_last_unlink() into mdt_handle_last_unlink_prepare() and mdt_handle_last_unlink_commit(). Let's see what the test results are this time. |
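A compact sketch of the prepare/commit split described above. The two helper names come from the comment; their bodies, arguments, and the mdt_hsm_queue_remove() placeholder are assumptions for illustration, not the code under review. The point is only the ordering: attributes must be read before mdo_unlink() destroys the object, and the HSM remove is queued afterwards from the saved attributes.

/* called in mdt_reint_unlink() before mdo_unlink(), while the object exists */
static void mdt_handle_last_unlink_prepare(struct mdt_thread_info *info,
					   struct mdt_object *mc,
					   struct md_attr *ma)
{
	/* mdt_attr_get_complex() would return -ENOENT once the object is
	 * deleted, so capture HSM state and inode attributes now */
	ma->ma_need |= MA_HSM | MA_INODE;
	mdt_attr_get_complex(info, mc, ma);
}

/* called after a successful unlink, using only what prepare() captured */
static void mdt_handle_last_unlink_commit(struct mdt_thread_info *info,
					  struct mdt_object *mc,
					  const struct md_attr *ma)
{
	if ((ma->ma_valid & MA_INODE) && ma->ma_attr.la_nlink == 0 &&
	    (ma->ma_valid & MA_HSM) && (ma->ma_hsm.mh_flags & HS_EXISTS))
		/* hypothetical helper: queue the HUA_REMOVE, e.g. by building
		 * an hsm_action_item as in the earlier sketch or by calling
		 * mdt_agent_record_add() directly as the final patch does */
		mdt_hsm_queue_remove(info, mc, ma);
}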
| Comment by Li Xi (Inactive) [ 09/May/17 ] |
|
I could have pushed a new patch using the same Change-Id as #18946, but I am still not sure whether my approach is correct or not. |
| Comment by Li Xi (Inactive) [ 09/May/17 ] |
|
Patch #26980 was on top of version 6 of patch #18946, and version 8 of patch #18946 looks better than #26980 now. |
| Comment by Gerrit Updater [ 07/Jun/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/18946/ |
| Comment by Peter Jones [ 07/Jun/17 ] |
|
Landed for 2.10 |