[LU-8820] skip LL_IOC_HSM_COPY_START for HSM removes Created: 10/Nov/16 Updated: 24/Apr/17 Resolved: 16/Mar/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | John Hammond | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | hsm | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
Robert has pointed out that there is no need for the copytool to call the LL_IOC_HSM_COPY_START ioctl (which sends a MDS_HSM_PROGRESS) when processing a remove action. |
| Comments |
| Comment by Gerrit Updater [ 10/Nov/16 ] |
|
John L. Hammond (john.hammond@intel.com) uploaded a new patch: http://review.whamcloud.com/23700 |
| Comment by Gerrit Updater [ 16/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23700/ |
| Comment by Peter Jones [ 16/Mar/17 ] |
|
Landed for 2.10 |
| Comment by Quentin Bouget [ 21/Apr/17 ] |
|
Since this patch, test_225 does not pass anymore, has this patch altered the behaviour of changelogs? |
| Comment by John Hammond [ 21/Apr/17 ] |
|
Quentin, test_225() is effectively disabled. The way the CDT is implemented it processes the log sequentially, so it sends the remove, then marks the remove action canceled, then sends the cancel. It's a bit weird, but so is most of the way the CDT code is factored. But I don't see us changing this behavior very soon. Are you running a modified version of test_225()? If this functionality is important to you then could you please explain why? |
| Comment by Quentin Bouget [ 21/Apr/17 ] |
|
Hi John, I am working on LU-8987 which aims at enabling test_225 (I merely removed the return statement at the start of the test). The modified test used to pass before this patch. It would seem this patch affected the content of the changelog when a remove request gets canceled. I am just wondering if the new behaviour is the "correct" one or if a regression was masked by test_225 being disabled. test_225 expects a changelog record with the flag 0x27D, from what I gathered in "include/lustre/lustre_user.h" the 0x200 part is related to the HSM_REMOVE operation and the 0x7D means the "HSM return code" is 125. |
| Comment by John Hammond [ 24/Apr/17 ] |
|
125 is ECANCELED, 0 is success, so this makes sense. I think the new behavior is correct. It's analogous to hitting Ctrl-C just after starting rm vs cp (on a large file). Since rm (unlink()) is atomic, we wouldn't expect the Ctrl-C to prevent the file from being removed. But file copy is not atomic, so we would expect Ctrl-C to do stop the copy. |
| Comment by Quentin Bouget [ 24/Apr/17 ] |
|
I don't get it, "lfs hsm_remove" is not atomic, so why act as if it were? For tools that rely on parsing the changelogs, this can actually be harmful, a cancelled remove would be interpreted as a successful one, wouldn't it? |
| Comment by John Hammond [ 24/Apr/17 ] |
|
> I don't get it, "lfs hsm_remove" is not atomic, so why act as if it were? The action performed by the CT for remove is. > For tools that rely on parsing the changelogs, this can actually be harmful, a cancelled remove would be interpreted as a successful one, wouldn't it? I think there's a race that exists with or without this patch. You can cancel a request after the CT has removed the file but before it has reported success. Then how should the request be marked? |
| Comment by Quentin Bouget [ 24/Apr/17 ] |
|
> The action performed by the CT for remove is. Yes, you are right. >I think there's a race that exists with or without this patch. You can cancel a request after the CT has removed the file but before it has reported success. Then how should the request be marked? I think it is up to the copytool to choose whether and how to act upon receiving a cancel request. So the only "inconsistency" is that the coordinator can report a request as canceled even though it may not be canceled; in which case one can check the changelog records to find out what really happened. |