[LU-5836] Error "Device or resource busy" after attempting release of file cleared from 'dirty' Created: 31/Oct/14 Updated: 14/Sep/15 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andrew Moe | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | HSM, patch | ||
| Environment: |
CentOS 6 |
||
| Severity: | 3 |
| Epic: | client, server |
| Project: | HSM |
| Rank (Obsolete): | 16361 |
| Description |
|
I can consistently replicate this error message.

# Write a file
$ dd if=/dev/zero of=file01 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00335841 s, 312 MB/s

# Archive the file
$ sudo lfs hsm_archive file01

# Wait for file to archive

# Write to file to make it 'dirty'
$ echo "0" >> file01

# Clear the dirty state
$ lfs hsm_clear --dirty file01

# Attempt to release the file
$ sudo lfs hsm_release file02
Cannot send HSM request (use of file02): Device or resource busy

I suppose it might make sense that a file should not be released if it is truly dirty. But the error message does not seem to be appropriate. Perhaps it should not even be possible to clear a 'dirty' state.

This is a comment from another person I've consulted on this:
|
| Comments |
| Comment by Gerrit Updater [ 20/May/15 ] |
|
Ulka Vaze (ulka.vaze@yahoo.in) uploaded a new patch: http://review.whamcloud.com/14874 |
| Comment by Ulka Vaze (Inactive) [ 20/May/15 ] |
|
Here is the analysis of this issue:

1. Since the dirty flag is cleared, the request goes to the coordinator.

00000004:20000000:0.0:1432012500.276456:0:14313:0:(mdt_open.c:1638:mdt_hsm_release()) [0x200000400:0x1:0x0] data_version mismatches: packed=120259085297 and on-disk=68719476768

Here it finds that the data version is mismatched, so it returns the request with error EPERM.

	/* A release was requested and the RPC itself succeeded. */
	if (rc == 0 && op_data->op_bias & MDS_HSM_RELEASE) {
		struct mdt_body *body;

		body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
		/* The reply does not carry the "released" flag: report EBUSY. */
		if (!(body->mbo_valid & OBD_MD_FLRELEASED)) {
			rc = -EBUSY;
		}
	}

4. EBUSY is a misleading message, and we should propagate the EPERM error coming from the coordinator.

So proposed solution is -
Code is pushed for review.

Test logs -
[root@cli-3 ~]# echo "test" >> /mnt/lustre2/testfile2
[root@cli-3 ~]# lfs hsm_state /mnt/lustre2/testfile2
/mnt/lustre2/testfile2: (0x00000000)
[root@cli-3 ~]# lfs hsm_archive /mnt/lustre2/testfile2
***** FD = 3 PATH = /mnt/lustre2/testfile2 *****
[root@cli-3 ~]# lfs hsm_state /mnt/lustre2/testfile2
/mnt/lustre2/testfile2: (0x00000009) exists archived, archive_id:1
[root@cli-3 ~]# echo xxx >> /mnt/lustre2/testfile2
[root@cli-3 ~]# lfs hsm_state /mnt/lustre2/testfile2
/mnt/lustre2/testfile2: (0x0000000b) exists dirty archived, archive_id:1
[root@cli-3 ~]# lfs hsm_clear --dirty /mnt/lustre2/testfile2
[root@cli-3 ~]# lfs hsm_state /mnt/lustre2/testfile2
/mnt/lustre2/testfile2: (0x00000009) exists archived, archive_id:1
[root@cli-3 ~]# lfs hsm_release /mnt/lustre2/testfile2
***** FD = 3 PATH = /mnt/lustre2/testfile2 *****
Cannot send HSM request (use of /mnt/lustre2/testfile2): Operation not permitted |
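For readers following the test log above, the hexadecimal masks printed by lfs hsm_state can be decoded by hand. The sketch below is illustrative only: the DEMO_HS_* names and the demo_hsm_describe() helper are hypothetical, and the bit values are assumed to mirror enum hsm_states in lustre_user.h. They are consistent with the log, where 0x00000009 prints as "exists archived" and 0x0000000b as "exists dirty archived".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit values, chosen to mirror enum hsm_states in lustre_user.h. */
enum demo_hsm_bits {
	DEMO_HS_EXISTS   = 0x01,
	DEMO_HS_DIRTY    = 0x02,
	DEMO_HS_RELEASED = 0x04,
	DEMO_HS_ARCHIVED = 0x08,
};

/* Table order decides the order in which names are printed. */
static const struct {
	uint32_t bit;
	const char *name;
} demo_flags[] = {
	{ DEMO_HS_RELEASED, "released" },
	{ DEMO_HS_EXISTS,   "exists" },
	{ DEMO_HS_DIRTY,    "dirty" },
	{ DEMO_HS_ARCHIVED, "archived" },
};

/* Write the space-separated names of the set flags into buf; return buf. */
static const char *demo_hsm_describe(uint32_t states, char *buf, size_t len)
{
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(demo_flags) / sizeof(demo_flags[0]); i++) {
		int n;

		if (!(states & demo_flags[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", demo_flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated; buf stays NUL-terminated */
		used += (size_t)n;
	}
	return buf;
}
```

Read this way, the log shows that lfs hsm_clear --dirty only removes bit 0x02 from the mask. The data_version mismatch reported by mdt_hsm_release() is a separate condition that the mask does not reflect.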
| Comment by Aditya Pandit [ 10/Sep/15 ] |
|
Can you please review it? |
| Comment by Patrick Farrell (Inactive) [ 10/Sep/15 ] |
|
This looks like an OK way to get the result Ulka explains in the comment above, but I'm not sure it's the correct result. -EPERM is also not a correct error here, is it?

This is a design level question I don't have an answer for:

Actually, as I think further, perhaps this is not a bug: I think the ability to clear the dirty flag manually is a debug/emergency recovery option? Use of such a manual intervention tool could be expected to cause problems if it is not used in exactly the right situation. So less than perfect handling of this case may not be something we need to fix.

What were you trying to accomplish by clearing the dirty flag manually? |
| Comment by Andrew Moe [ 10/Sep/15 ] |
|
This bug was not preventing me from accomplishing anything in particular. I reported this bug because I performed a sequence of valid operations and experienced an error message that did not seem appropriate for the situation. I'm not asserting that the file should be released or not released; I agree that this would be a design level question. |
| Comment by Aditya Pandit [ 11/Sep/15 ] |
|
If you archive the file again, releasing the file works. I have added that to the test script. Can someone from the design team review and post what the ideal behavior should be? |
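One way to picture why re-archiving helps is a toy model of the sequence discussed in this ticket. This is only a sketch under stated assumptions, not the Lustre implementation: struct demo_file and the demo_* functions are hypothetical, the model assumes that archiving records the current data version, that a write changes it, that clearing the dirty flag leaves both versions untouched, and that release is refused with -EPERM on a version mismatch (the error named in the analysis above).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one file's HSM-relevant state. */
struct demo_file {
	uint64_t data_version;		/* changes on every write */
	uint64_t archived_version;	/* version recorded by the last archive */
	bool archived;
	bool dirty;
	bool released;
};

/* A write changes the data version and dirties an archived file. */
static void demo_write(struct demo_file *f)
{
	assert(f != NULL);
	f->data_version++;
	if (f->archived)
		f->dirty = true;
}

/* Archiving records the current data version and clears the dirty flag. */
static void demo_archive(struct demo_file *f)
{
	assert(f != NULL);
	f->archived_version = f->data_version;
	f->archived = true;
	f->dirty = false;
}

/* Model of "hsm_clear --dirty": only the flag changes, not the versions. */
static void demo_clear_dirty(struct demo_file *f)
{
	assert(f != NULL);
	f->dirty = false;
}

/* Release is refused unless the archived copy matches the current data. */
static int demo_release(struct demo_file *f)
{
	assert(f != NULL);
	if (!f->archived || f->archived_version != f->data_version)
		return -EPERM;	/* assumed error code for a version mismatch */
	f->released = true;
	return 0;
}
```

In this model, the sequence write, archive, write, clear dirty, release fails at the last step because the recorded and current versions differ. Archiving once more brings them back in line, after which release succeeds, which matches the behavior reported here.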
| Comment by Andrew Moe [ 14/Sep/15 ] |
|
It's been a while since this bug was reported, so it may take some time to rebuild our test environment. |