[LU-10810] SEEK_HOLE and SEEK_DATA support for lseek Created: 13/Mar/18  Updated: 17/Mar/23  Resolved: 07/Nov/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Improvement Priority: Minor
Reporter: Lokesh N J Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: medium

Issue Links:
Duplicate
duplicates LU-13397 lfs migrate/mirror extend/resync does... Resolved
Related
is related to LU-1941 ZFS FIEMAP support Open
is related to LU-3606 Implement fallocate() support for ldi... Resolved
is related to LU-13155 "lfs migrate" and "lfs mirror" should... Open
is related to LU-14143 SEEK_HOLE returns -ENXIO if file ends... Resolved
is related to LU-11621 Add copy_file_range() API and use it ... Open
is related to LU-14217 better SEEK_HOLE/DATA support in OSD-ZFS Resolved
Rank (Obsolete): 9223372036854775807

 Description   

lseek with SEEK_HOLE and SEEK_DATA are really helpful  and easy to use tools for any usersapce applications like copy and backup. Currently lustre has min support implementation as per the man page which is not that useful however lustre does support fiemap ioctl which can be used for mapping data in the file.
As we already have support for fiemap I guess with some implementation lustre can support SEEK_HOLE and SEEK_DATA flags. I guess having this support will be helpful to deal with sparse files. Any feedback about the implementation will be really helpful.



 Comments   
Comment by Andreas Dilger [ 11/Apr/18 ]

I was originally thinking that SEEK_HOLE and SEEK_DATA would be best implemented by calling FIEMAP internally, but the remapping of FIEMAP data to file-offset extents may be quite complex (and getting worse with PFL, composite layouts, FLR, DoM, etc). It probably makes more sense to have an interface which takes the current file offset, maps it to the object offset(s) (via LOV EA), and then calls SEEK_HOLE/SEEK_DATA on the OST (or DoM MDT) for each of the objects that cover the current component, and then LOV maps those object offsets back to file offsets and finds the lowest mapped offset of any of the objects.

This approach would be relatively straight forward to implement (LOV can already do file<->object mapping), though some care would be needed when e.g. crossing component extent boundaries.

Comment by Andreas Dilger [ 30/Jun/20 ]

I was checking when SEEK_HOLE and SEEK_DATA were added to the kernel (originally kernel commit v3.7-rc3-24-gc8c0df241cc2). This was later fixed by kernel commits v4.12-rc2-3-g7d95eddf313c and v4.14-rc3-4-g545052e9e35a, both of which are already included in RHEL7. On ZFS, this functionality was added to ZPL by commit zfs-0.6.1-56-g802e7b5fe by Dongyang.

Comment by John Hammond [ 01/Jul/20 ]

adilger goes on to say:

John, for the sparse file handling, IMHO the best (and potentially relatively low-effort) implementation of SEEK_{HOLE,DATA} would be as described in LU-10810. Essentially, it would be "GETATTR with a SEEK_{HOLE,DATA} flag" passing the current file offset and returning the start of the next hole/data. This would require very little extra code, and it leverages the SEEK_{HOLE,DATA} implementation of the underlying filesystem plus the existing LOV file offset remapping code. For SEEK_HOLE it would return the maximum hole offset of all allocated objects overlapping the current file offset or in later components. For SEEK_DATA, it would return the minimum data offset of all allocated objects overlapping the current file offset or in later components.

The main drawback of using SEEK_{HOLE,DATA} is that it needs changes on both the client and server, but avoids the (IMHO considerable) overhead of all-zero block detection. However, we may need to do zero-block detection in the copy code anyway for compatibility and ease of fast deployment, since that is what cp --sparse=always does in the end if SEEK_{HOLE,DATA} is not available. It would be possible to have the copytool try SEEK_{HOLE,DATA} first, but if the file is known to be sparse (st_blocks < st_size) but SEEK_HOLE returns >= st_size then --sparse=always would fall back to zero block detection. If SEEK_HOLE returns < st_size then this feature is working. Note that this is not safe for files that are currently being accessed, even if fsync(fd) is called before seek(SEEK_HOLE) because this doesn't exclude other clients from modifying pages in their own cache. That is not an issue with a local filesystem, but is an issue for a distributed filesystem. The client should probably return -EOPNOTSUPP or st_size if the SEEK_HOLE call returns an offset < st_size but the client is not currently holding a PW DLM lock on that extent (i.e. some other client could be concurrently writing into that hole).

Comment by Mikhail Pershin [ 17/Aug/20 ]

Current status on ticket. I am starting to combine all small parts to working code at this week, server code will be first, then client part and tests. I expect to push working code in gerrit at the next week. It is possible to split it for patches on server changes (no tests, just code) and client changes plus tests.

Comment by Andreas Dilger [ 17/Aug/20 ]

Will you need an OBD_CONNECT flag for this?

Comment by Mikhail Pershin [ 17/Aug/20 ]

Andreas, yes I use one already.

#define OBD_CONNECT2_LSEEK        0x40000ULL /* SEEK_HOLE/DATA RPC support */

I didn't check yet if this value is being used by some other patches

Comment by Mikhail Pershin [ 17/Aug/20 ]

I choose to use separate OST_SEEK RPC instead of OST_GETATTR because that would make OST_GETATTR sort of ioctl which does one or another things depending on flags, moreover it is not returning ATTR even just separate information. Also this RPC itself looks obsoleted - it is being used now only to get data_version and would require also new MDT handler and new attr flag. Contrary, OST_SEEK is simple and handled by unified TGT handler for both MDT/OFD.

Comment by Gerrit Updater [ 21/Aug/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39706
Subject: LU-10810 osd: implement lseek method in OSD
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 24983c305b0c64f42e57ae7d91cac81424099c0f

Comment by Gerrit Updater [ 21/Aug/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39707
Subject: LU-10810 ptlrpc: introduce OST_SEEK RPC
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 643bbb697e5439bb720acdb5038067b7bec6b83a

Comment by Gerrit Updater [ 21/Aug/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39708
Subject: LU-10810 clio: SEEK_HOLE/SEEK_DATA on client side
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c02e19f65667971f8320b20d99cd5db7199b7c8d

Comment by Mikhail Pershin [ 21/Aug/20 ]

Things work with these three patches for normal files. ZFS is not working because needed function dmu_offset_next is not exported.

More work is needed for released files support, considering that tools uses SEEK_DATA/SEEK_HOLE prior data copying I suppose we need to restore released file during lseek operation.

Also more tests are needed for cases with concurrent access to the file while lseek is being performed.

Comment by Andreas Dilger [ 21/Aug/20 ]

Please link to the GitHubissue and  pull request for exporting{{dmu_offset_next()}} when you have it. It would also be possible to use symbol lookup in the kernel until the kernel exports it. 

Comment by Gerrit Updater [ 12/Sep/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39706/
Subject: LU-10810 osd: implement lseek method in OSD
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 947af94817f8eeb5e6108b0b3cde65419b13c8d3

Comment by Gerrit Updater [ 14/Sep/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39894
Subject: LU-10810 test: extended debug for lseek test
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f397a37df3b2f8f636686edc37b1720f58aa088a

Comment by John Hammond [ 02/Oct/20 ]

coreutils cp added SEEK_HOLE/DATA support earlier this year. See https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=a6eaee501f6ec0c152abe88640203a64c390993e

Comment by Gerrit Updater [ 31/Oct/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40502
Subject: LU-10810 test: check seek support in tools
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9dc5e733c58bcdd12a987a4b06eb2f4845597dfb

Comment by Gerrit Updater [ 07/Nov/20 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40570
Subject: LU-10810 tests: Fix calls to lseek_test_425
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 14cc826f465679531e352fd3852b115d369f6773

Comment by Gerrit Updater [ 07/Nov/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39707/
Subject: LU-10810 ptlrpc: introduce OST_SEEK RPC
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6d5fe29066af5f8e40055fd89b285853c363e947

Comment by Gerrit Updater [ 07/Nov/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39708/
Subject: LU-10810 clio: SEEK_HOLE/SEEK_DATA on client side
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cda353e6efae5013a26aedbe49d8aa6fb8fe456e

Comment by Gerrit Updater [ 07/Nov/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40502/
Subject: LU-10810 test: test lseek support in tools
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d633172a38519aba2585c2a1fdcdd821cb19010c

Comment by Peter Jones [ 07/Nov/20 ]

Landed for 2.14

Comment by Andreas Dilger [ 08/Dec/20 ]

Mike, I noticed some test runs are reporting no SEEK_HOLE support for test_430a,430b,430c, even though client and server are running on the current tip of master (v2_13_56-149-ge5c8f6670f):
https://testing.whamcloud.com/search?horizon=518400&status%5B%5D=SKIP&test_set_script_id=f9516376-32bc-11e0-aaee-52540025f9ae&sub_test_script_id=4154d1b3-ab47-47e4-9ecd-ea85afa35353&source=sub_tests#redirect

It doesn't look like all test runs are being skipped, but I didn't think this feature is specific to ZFS or ldiskfs, so I'm not sure why the subtests are being skipped.

Comment by Mikhail Pershin [ 08/Dec/20 ]

Andreas, I've check these reports and in fact only couple of them are ldiskfs, in these cases there were old servers used, with just no lseek support. All ZFS tests are expected to be skipped. 

In fact, I was testing recently ZFS 8.5 with exported dmu_offset_mext() to check how it behave and have found that there are problems.
ZFS SEEK_HOLE/DATA support is unreliable because it doesn't work if there are dirty data, that cause all test to fail because we don't do full data sync prior each lseek. I was trying to sync data in tests but with no big success. Moreover I don't think test failures are problems, after all we could sync everything just to make test pass but that would not mean ZFS lseek is reliable until ZFS itself will handle dirty data.

For now it can report HOLE at offset 1000 (correct) and immediately SEEK_DATA(1000) returns also 1000 (incorrectly) just because there are dirty data and it switches to generic approach (all file is data with hole at the end). I am afraid we cannot rely on ZFS until its option zfs_dmu_offset_next_sync is set, which in turn would cause zfs slowdown when SEEK_DATA is performed on it

So basically ZFS server should report LSEEK support only if zfs_dmu_offset_next_sync is set AND dmu_offset_next is exported. And that should be done now I think, before we will start using ZFS 8.5 and find out that lseek is reported as supported but doesn't work as expected

Comment by Gerrit Updater [ 17/Mar/23 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50325
Subject: LU-10810 osd: implement lseek method in OSD
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: b21bc7e262aff42aa3fd90966a791de4ad515dd0

Comment by Gerrit Updater [ 17/Mar/23 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50326
Subject: LU-10810 ptlrpc: introduce OST_SEEK RPC
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: ff27b1c2da37c1d910da48738e97f7193f4453ef

Comment by Gerrit Updater [ 17/Mar/23 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50327
Subject: LU-10810 clio: SEEK_HOLE/SEEK_DATA on client side
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: a196f0c1aaac7d1e58aa85847256bc5f2e24f1c1

Comment by Gerrit Updater [ 17/Mar/23 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50328
Subject: LU-10810 test: test lseek support in tools
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 59154a1c86e186c98359b042f96df389f0b91ebe

Generated at Sat Feb 10 02:38:23 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.