[LU-13397] lfs migrate/mirror extend/resync does not preserve sparse file Created: 28/Mar/20  Updated: 17/Mar/23  Resolved: 26/Jan/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Major
Reporter: Andreas Dilger Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: HPv3, HPv4p1

Issue Links:
Duplicate
is duplicated by LU-10810 SEEK_HOLE and SEEK_DATA support for l... Resolved
Related
is related to LU-14176 lfs mirror resync may truncate file t... Closed
is related to LU-14160 Implement fallocate FALLOCATE_FL_PUNC... Resolved
is related to LU-3833 HSM POSIX copytool sparse file handling Open
Severity: 3
Rank (Obsolete): 9223372036854775807
Epic Link: Hot Pools - v4 phase 1

 Description   

While testing "lfs migrate", "lfs mirror extend", and "lfs mirror resync", I was doing high-offset writes to initialize later PFL components of the file layout (e.g. at 1GB offset and 32GB offset). When trying to mirror/resync those files to another mirror copy, it resulted in the commands failing due to ENOSPC because there was not enough free space in the test filesystem to write out the data.

These commands should be updated to share a common "data copy" routine (if they don't already) to reduce code duplication when fixing these issues. The code then needs to handle sparse input files properly: first by checking whether the source looks sparse (e.g. blocks << size) to decide whether to scan it for holes, and then by not copying the holes in the file. There should probably be an option added to each command like "--sparse=<auto,no,yes>" (default = auto) to force a specific behavior.
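A minimal sketch of the "blocks << size" check and the proposed auto/no/yes switch, in Python for brevity. The function names and the 0.5 threshold are assumptions made for illustration, and "--sparse" is only the option proposed above, not an existing lfs flag. It relies on st_blocks being reported in 512-byte units, as on Linux.

```python
import os


def looks_sparse(size_bytes, blocks_512, ratio=0.5):
    """Return True if allocated space is well below the apparent size.

    ratio is an arbitrary threshold chosen for this sketch.
    """
    if size_bytes == 0:
        return False
    return blocks_512 * 512 < size_bytes * ratio


def want_sparse_copy(path, mode="auto"):
    """Map a hypothetical --sparse=<auto,no,yes> value to a decision."""
    if mode == "yes":
        return True
    if mode == "no":
        return False
    if mode != "auto":
        raise ValueError("mode must be auto, no, or yes")
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)
```

With a check like this, a file that has a few blocks written at a 32GB offset would be flagged as sparse, while a fully written file would not.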

Unfortunately, there is no optimal way to read sparse files in Lustre today. In any case, it makes little sense to run these operations on in-use files, so there are already checks for whether the file is modified during migrate/mirror/resync.

  • For 1-stripe files, the ioctl(FIEMAP) will return a current map of data for the file (it flushes data if FIEMAP_FLAG_SYNC is used, but doesn't prevent further modification). Multi-stripe and PFL files return multiple maps in per-object offset order, and that is not useful if the files are using different layouts (likely a common case). Also, ZFS does not yet support FIEMAP despite some efforts in that direction.
  • Using SEEK_HOLE and SEEK_DATA would be the preferred solution, but this needs a Lustre-level update to pass these through from the client to the OST (and MDT for DoM). This is described in LU-10810, and may be able to leverage some infrastructure from the patch https://review.whamcloud.com/9275 "LU-3606 fallocate: Implement fallocate preallocate operation". While SEEK_HOLE and SEEK_DATA "work" for Lustre by kernel emulation, they just assume that every block is "data" and the first hole is the end of the file.
  • The simplest (though least efficient) option would be to do zero-block detection during the copy phase. This has quite high CPU and IO overhead, because it requires reading the whole file and checking every byte. It would only be done if the file appears to be very sparse, and the layout is complex (i.e. not single stripe where FIEMAP is useful).
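As a rough illustration of the last two options combined, the sketch below walks the ranges reported by SEEK_DATA/SEEK_HOLE and additionally skips all-zero blocks inside each range. It is written in Python for brevity and is not the lfs implementation; the function names and the 1 MiB block size are assumptions.

```python
import errno
import os

BLOCK = 1 << 20  # 1 MiB copy unit; an arbitrary choice for this sketch


def data_segments(fd, size):
    """Yield (start, end) byte ranges that SEEK_DATA/SEEK_HOLE report as data."""
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # no data at or after offset
                return
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        offset = end


def sparse_copy(src_path, dst_path, block=BLOCK):
    """Copy src to dst, skipping reported holes and all-zero blocks."""
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            size = os.fstat(src).st_size
            zero = bytes(block)
            for start, end in data_segments(src, size):
                pos = start
                while pos < end:
                    chunk = os.pread(src, min(block, end - pos), pos)
                    if not chunk:  # file shrank underneath us
                        break
                    # Zero-block detection: compares every byte that was read.
                    if chunk != zero[:len(chunk)]:
                        os.pwrite(dst, chunk, pos)
                    pos += len(chunk)
            # Set the final size so that a trailing hole is preserved.
            os.ftruncate(dst, size)
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

Where SEEK_DATA is only emulated, data_segments() yields a single range covering the whole file, so the zero-block comparison does all of the work, at the CPU and IO cost described above.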


 Comments   
Comment by Andreas Dilger [ 28/Mar/20 ]

Another option here (that could also be very useful for other reasons) might be to allow specifying "--copy-cmd" for these tools to offload the handling of the data movement from "lfs". The "lfs migrate/mirror" command would handle creating the file layout and opening file handles for the source and destination, and then execute "copy-cmd %s %s", e.g. "cp --sparse=auto $src $tgt" or "dcp $src $tgt". "lfs migrate" and "lfs mirror extend" could potentially use a copy command that only works on filenames instead of file handles, and then use layout swap or layout merge with a temporary victim file (the victim file could be volatile and accessed via $MOUNT/.lustre/fid/$FID).
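A hedged sketch of how a "copy-cmd %s %s" template could be expanded and executed without going through a shell, in Python for brevity. The "--copy-cmd" option is only the proposal above, and the helper names below are hypothetical.

```python
import shlex
import subprocess


def build_copy_argv(copy_cmd, src, tgt):
    """Expand a 'copy-cmd %s %s' template into an argv list (no shell)."""
    argv = shlex.split(copy_cmd)
    if argv.count("%s") != 2:
        raise ValueError("copy command needs exactly two %s placeholders")
    paths = iter((src, tgt))
    # The first %s becomes the source path, the second the target path.
    return [next(paths) if arg == "%s" else arg for arg in argv]


def run_copy_cmd(copy_cmd, src, tgt):
    """Run the external copy tool; raises CalledProcessError on failure."""
    subprocess.run(build_copy_argv(copy_cmd, src, tgt), check=True)
```

Substituting whole arguments rather than formatting a shell string keeps pathnames with spaces or shell metacharacters intact.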

This is not necessarily possible for "lfs mirror resync", which needs to use O_DIRECT file handles to avoid cache issues on the file. It might be possible to pass the open handles to tools by using "/dev/stdin" and "/dev/stdout" as the pathnames. It isn't clear that this would work in all cases (e.g. a tool running on another node), but it wouldn't have to support every combination of uses. Alternately, a pathname like "$MOUNT/.lustre/fid/[fid_seq:fid_oid:0].comp_id" could be used to open a specific component of the file, if the VFS treated the different open-by-FID paths as different inodes. Just an idea.
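The "/dev/stdin" and "/dev/stdout" idea could look roughly like the Python sketch below, where the parent opens both files and the child tool only sees the two pathnames. The function name is hypothetical. One caveat to keep in mind: on Linux, a tool that open()s "/dev/stdin" typically gets a new open of the same file rather than the parent's handle, so flags such as O_DIRECT set by the parent would presumably not carry over.

```python
import os
import subprocess


def copy_via_std_handles(argv, src_path, dst_path):
    """Run argv with the source as stdin and the destination as stdout.

    argv is expected to name /dev/stdin and /dev/stdout as its paths,
    e.g. ["cp", "--sparse=auto", "/dev/stdin", "/dev/stdout"].
    """
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            # The child inherits the two descriptors as fd 0 and fd 1.
            subprocess.run(argv, stdin=src, stdout=dst, check=True)
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

This only covers a tool running on the same node as the caller, which matches the limitation noted above.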

Comment by Gerrit Updater [ 03/Nov/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40530
Subject: LU-13397 hsm: lhsmtool to handle sparse files
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ae180a1080dc1cb3990d8f53caee95e11a160248

Comment by Gerrit Updater [ 26/Nov/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40772
Subject: LU-13397 lfs: mirror extend/copy keeps sparseness
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5b38421a548116edb5e45d3cdbd0fd6d2f09e314

Comment by Gerrit Updater [ 26/Nov/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40773
Subject: LU-13397 lfs: mirror resync to keep sparseness
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e916affaabd65e55222431eebfaa92aa6d54989b

Comment by Li Xi [ 03/Mar/21 ]

"LU-14174 lfs: llapi_mirror_find() return code check" is also in https://review.whamcloud.com/#/c/40773/9

Comment by Gerrit Updater [ 30/Mar/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40772/
Subject: LU-13397 lfs: mirror extend/copy keeps sparseness
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0561c144cc1bb623e05d08b5055009e8d86047f4

Comment by Gerrit Updater [ 06/Apr/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40773/
Subject: LU-13397 lfs: mirror resync to keep sparseness
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4126fbb30c125050ea2e1fdf3d446201b826ce29

Comment by Mikhail Pershin [ 17/Aug/21 ]

I've found a problem with mirror resync and sparse files: it is not working as intended, so I am keeping the ticket open.

The problem is in the fallocate punch operation during resync. If it is done prior to the write operation, then the write itself fails due to the changed layout version.

Comment by Gerrit Updater [ 22/Aug/21 ]

"Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44721
Subject: LU-13397 llite: support fallocate() on selected mirror
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 84ab0da804fda75262dd0022528d0043cb3e4558

Comment by Gerrit Updater [ 22/Sep/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44721/
Subject: LU-13397 llite: support fallocate() on selected mirror
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 89736d502cc99f095237dde7520fc4ca86191882

Comment by Gerrit Updater [ 17/Mar/23 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50329
Subject: LU-13397 lfs: mirror extend/copy keeps sparseness
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: d36e3f1a5599eb557e07550651d855c19d18ca36
