[LU-5896] lfs: can't release a migrated file Created: 10/Nov/14 Updated: 30/Nov/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Upstream |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Frank Zago (Inactive) | Assignee: | Artem Blagodarenko |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | CEA |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 16477 |
| Description |
|
After a file has been archived and migrated, it's not possible to release it.

# dd if=/dev/urandom of=rand1 bs=4096 count=1024
...
# lfs hsm_archive rand1
# lfs migrate -o 0 -S 65536 rand1
# lfs hsm_release rand1
Cannot send HSM request (use of rand1): Device or resource busy

The file content is still accessible, and its hsm state is correct. |
| Comments |
| Comment by Henri Doreau (Inactive) [ 24/Aug/17 ] |
|
What happens here is a data version mismatch. The error originally returned by the MDS is EPERM (in mdt_hsm_release) but is translated into EBUSY by ll_close_inode_openhandle. I am unsure how best to fix it. I can think of different approaches:
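For context, a minimal hypothetical sketch of the mismatch described above; the helper and its arguments are placeholders, not the actual Lustre source:

#include <errno.h>

/* mdt_hsm_release() refuses to release a file whose current data
 * version no longer matches the one recorded at archive time, which
 * is exactly what "lfs migrate" changes. Simplified stand-in: */
static int hsm_release_check(unsigned long long archived_dv,
                             unsigned long long current_dv)
{
        return archived_dv == current_dv ? 0 : -EPERM;
}

/* On the client, ll_close_inode_openhandle() folds that -EPERM into
 * -EBUSY, hence "Device or resource busy" from lfs hsm_release. */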
|
| Comment by Quentin Bouget [ 08/Feb/19 ] |
|
Henri no longer works on this. I will try to publish a fix, but I am not familiar with the affected code base. The approach I am currently taking is to have lfs migrate send both the initial and final data versions of the migrated file when it commits the migration (with either llapi_fswap_layouts() or llapi_fswap_layouts_grouplock()). The MDS can then compare the initial data version with the one stored in the HSM attributes. If they match, it will replace the stored value with the final one; otherwise it will set HSM's dirty bit (which should already be set, but just in case it is not). |
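A minimal sketch of the MDS-side decision described above, assuming the client passes both the pre-migration and post-migration data versions when committing the swap; the struct and function here are simplified placeholders, not the Lustre data structures:

#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the data-version/dirty state kept in the
 * HSM xattr. */
struct hsm_state_sketch {
        uint64_t archive_data_version;  /* recorded at archive time */
        bool     dirty;
};

/* dv_before/dv_after: data versions of the file before and after the
 * migration, as sent by "lfs migrate" when it commits the swap. */
static void hsm_update_on_migrate(struct hsm_state_sketch *hs,
                                  uint64_t dv_before, uint64_t dv_after)
{
        if (hs->archive_data_version == dv_before)
                /* Only the layout changed: keep the archived copy valid
                 * by moving the recorded data version forward. */
                hs->archive_data_version = dv_after;
        else
                /* The file changed since it was archived: make sure the
                 * dirty bit is set (it normally already is). */
                hs->dirty = true;
}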
| Comment by Quentin Bouget [ 26/Mar/19 ] |
|
A little update: I have not gotten around to implementing the fix suggested in my previous post. I think I need to add a field to the SWAP_LAYOUT RPC, and I was warned this is not a trivial change. In the meantime, I have re-implemented lfs migrate to forge the HSM xattr on the volatile file right before it is swapped. This seems to work. There are several shortcomings to this approach:
For now though, this is probably what we will use in production, at least until we can produce an upstreamable solution. |
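For reference, a rough sketch of the commit step of such a migration in userspace. llapi_get_data_version(), llapi_fswap_layouts() and the SWAP_LAYOUTS_CHECK_DV1 flag are existing liblustreapi interfaces; the helper itself, its arguments and the placement of the xattr-forging step are simplifications of what the modified lfs described above would do:

#include <errno.h>
#include <lustre/lustreapi.h>

/* fd_src: the file being migrated; fd_vol: the volatile file that
 * already holds a copy of the data; dv_before: the data version of
 * fd_src sampled before the copy started. */
static int migrate_commit(int fd_src, int fd_vol, __u64 dv_before)
{
        __u64 dv_now;
        int rc;

        /* Re-read the data version to detect concurrent writes. */
        rc = llapi_get_data_version(fd_src, &dv_now, LL_DV_RD_FLUSH);
        if (rc < 0)
                return rc;
        if (dv_now != dv_before)
                return -EBUSY;  /* the file changed during the copy */

        /* The interim workaround forges the HSM xattr on the volatile
         * file at this point, so that it survives the swap (not shown
         * here). */

        /* Atomically exchange the layouts; the MDS re-checks the data
         * version of the source file. */
        return llapi_fswap_layouts(fd_src, fd_vol, dv_before, 0,
                                   SWAP_LAYOUTS_CHECK_DV1);
}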
| Comment by Gerrit Updater [ 31/May/19 ] |
|
Ben Evans (bevans@cray.com) uploaded a new patch: https://review.whamcloud.com/35022 |
| Comment by Quentin Bouget [ 05/Jun/19 ] |
|
I can confirm that there are not enough fields in the MDS_SWAP_LAYOUTS RPC to hold two data_versions, so we have to add them somewhere. I think struct mdc_swap_layouts, being the payload of the RPC, is a good candidate:
/** layout swap request structure
* fid1 and fid2 are in mdt_body
*/
struct mdc_swap_layouts {
__u64 msl_flags;
#if TODO
__u64 msl_dvs[2];
#endif
};
On the userspace side, the struct passed to the LL_IOC_LOV_SWAP_LAYOUTS ioctl already has the required fields:
struct lustre_swap_layouts {
__u64 sl_flags;
__u32 sl_fd;
__u32 sl_gid;
__u64 sl_dv1;
__u64 sl_dv2;
};
There also needs to be a new flag, something like SWAP_LAYOUTS_SAME_DATA, to indicate to the MDS that both layouts point at the same data and that the HSM xattrs should be updated accordingly. When this flag is used, it should be passed down to the MDS along with both data_versions. If the flag is not used, the new default should be to swap the HSM xattrs. There currently is a flag for that (which only the MDT code uses), but even without it, there really is no reason not to swap them: the HSM xattr is part of a file's layout. After that, updating lfs_migrate() should be straightforward.

This fix is not fully satisfying though. Swapping layouts should really just... swap layouts. If there is information that is tied to the layout (e.g. the HSM xattr), it should also be swapped, shouldn't it? So why am I proposing something that goes the opposite way (i.e. keeping the HSM xattr in some circumstances and updating it in place)? Well, it all comes down to how HSM's security works. The HSM xattr lives under the trusted xattr prefix, which means that only root is able to modify it [*]. But any user can run a layout swap (on the condition that they have write access to both files), and by extension a file migration. This feels right, but it is simply incompatible with how HSM works. Ideally, to migrate file.txt, one would create a volatile file; copy file.txt's data into the volatile; set the HSM xattr of the volatile; and swap file.txt's layout with the volatile's. Not possible. Personally, I do not understand why users are not allowed to tell Lustre "there is a copy of <this> file <here>" (read "<this> file" as "<this#1> version of <this#2> file"). I mean, I get that it would open all sorts of scary doors, like "What if <here> points at data the user is not supposed to have access to?". But that really should be solved by copytools, or some piece of software between Lustre and whatever <here> is. I feel like it was convenient not to have to care about this at first, but now it just comes back to bite us.

[*] Actually, not even root can edit the HSM xattr: Lustre silently ignores setxattr operations on trusted.hsm. But that could be changed, so it does not really matter.

As Ben pointed out on gerrit, sites that have already migrated archived files are still going to need to "repair" the "broken" HSM xattrs. I suggest forcibly setting the dirty bit (with lfs hsm_set --dirty <file>) and scheduling ARCHIVE requests on the concerned files (or letting the policy engine schedule those if it can). There are probably more efficient ways to do this though, ways that would not involve re-archiving a lot of data. |
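To make the proposal above concrete, a sketch of the pieces it adds; the flag value and the extended struct are only the proposal from this comment, not merged code, and the exact names are illustrative:

#include <linux/types.h>

/* Proposed flag: both layouts point at the same data, so the MDS
 * should update the HSM xattr in place (using the two data versions
 * below) instead of swapping it. The value is a placeholder. */
#define SWAP_LAYOUTS_SAME_DATA  (1 << 6)

/* Proposed RPC payload: struct mdc_swap_layouts extended with the two
 * data versions discussed above. */
struct mdc_swap_layouts {
        __u64 msl_flags;
        __u64 msl_dvs[2];       /* data versions before/after the migration */
};

With such a flag, lfs migrate could forward the sl_dv1/sl_dv2 values it already fills in struct lustre_swap_layouts down to the MDS, which would then apply the comparison described earlier in this ticket.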
| Comment by Gerrit Updater [ 22/Oct/19 ] |
|
Artem Blagodarenko (c17828@cray.com) uploaded a new patch: https://review.whamcloud.com/36548 |
| Comment by Gerrit Updater [ 22/Oct/19 ] |
|
Artem Blagodarenko (c17828@cray.com) uploaded a new patch: https://review.whamcloud.com/36549 |
| Comment by Qian Yingjin [ 01/Dec/19 ] |
|
When
2. normal layout swap:
3. data restore

All these modifications must be done in the MDD layer in a single transaction. Otherwise, it may result in an inconsistency. |
| Comment by Qian Yingjin [ 01/Dec/19 ] |
|
But I am worried that an extra SWAP_LAYOUTS_DATA_MIGRATE flag could be used maliciously by a user: if the contents of o1 and o2 are not the same, then after SWAP_LAYOUTS_DATA_MIGRATE the MDS would wrongly think the data in Lustre after the swap is the same as in the HSM archive. To avoid malicious use of the SWAP_LAYOUTS_DATA_MIGRATE flag when migrating a file, it would be better to move the data copy phase (migrate_copy_data() in migrate_block()) into the kernel, which could borrow the pcc_data_copy() code from PCC; that way we can ensure the contents of the two files are exactly the same. |
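For illustration, a minimal userspace-style sketch of such a data copy phase, in the spirit of migrate_copy_data(); the helper name, buffer size and error handling are simplifications rather than the actual implementation, the point being that the kernel would own this loop and could therefore guarantee both files end up identical:

#include <unistd.h>

/* Copy the whole source file into the (empty) destination file. */
static int copy_data(int fd_src, int fd_dst)
{
        char buf[64 * 1024];    /* arbitrary chunk size */
        ssize_t rd, wr, off;

        while ((rd = read(fd_src, buf, sizeof(buf))) > 0) {
                for (off = 0; off < rd; off += wr) {
                        wr = write(fd_dst, buf + off, rd - off);
                        if (wr < 0)
                                return -1;
                }
        }
        return rd < 0 ? -1 : 0;
}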
| Comment by Qian Yingjin [ 01/Dec/19 ] |
|
" So if the extra SWAP_LAYOUTS_DATA_MIGRATE flag is Problematic, maybe storing the FID (via it we can locate the HSM copy in archive) into HSM attrs is a solution. |
| Comment by Ben Evans (Inactive) [ 02/Dec/19 ] |
|
HSM solves this by calling ll_data_version to get the current data_version of the fid (see ll_ioc_copy_start). There's no reason we couldn't do a similar thing in the kernel layer for migrate: get the current dv and compare it to the version stored in the HSM xattr. If they're the same, perform the layout swap and update the xattr. If they're different, perform the layout swap and make sure to mark the xattr with the DIRTY flag, if it's not already. |
| Comment by Quentin Bouget [ 12/Dec/19 ] |
|
It cannot really be that malicious. To be able to do a swap_layout, one has to have write access to the file. At that point, you can just truncate the file to 0 or overwrite it.
The archived copy does not need to be named using the file's FID in Lustre; that is just what the Lustre copytool does. |
| Comment by Etienne Aujames [ 02/Nov/22 ] |
|
Hi, is anyone maintaining https://review.whamcloud.com/36549, or actively developing something for this issue (or for the non-blocking case LU-13048)? This has become a blocking issue for the CEA. Without a workaround, we are forced to re-archive every migrated file (not acceptable on a filesystem mixing SSD and rotational pools with HSM). |
| Comment by Gerrit Updater [ 12/Dec/22 ] |
|
"Guillaume Courrier <guillaume.courrier@cea.fr>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49377 |
| Comment by Gerrit Updater [ 30/Nov/23 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53296 |