[LU-10810] SEEK_HOLE and SEEK_DATA support for lseek Created: 13/Mar/18 Updated: 17/Mar/23 Resolved: 07/Nov/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Lokesh N J | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | medium | ||
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
lseek() with SEEK_HOLE and SEEK_DATA is a really helpful and easy-to-use facility for userspace applications such as copy and backup tools. Currently Lustre has only the minimal implementation permitted by the man page, which is not that useful; however, Lustre does support the FIEMAP ioctl, which can be used for mapping the data in a file. |
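For context, a minimal sketch (not from this ticket) of how a userspace copy tool typically uses SEEK_DATA/SEEK_HOLE to copy only the allocated extents of a sparse file; error handling is trimmed:

```c
#define _GNU_SOURCE
#include <sys/stat.h>
#include <unistd.h>

/* Copy only the allocated extents of 'src' into 'dst', leaving the holes
 * unwritten.  Real tools need full error handling and a fallback for
 * filesystems without SEEK_HOLE support. */
static int sparse_copy(int src, int dst)
{
	struct stat st;
	char buf[65536];
	off_t data = 0, hole, pos;

	if (fstat(src, &st) < 0)
		return -1;

	/* lseek(SEEK_DATA) fails with ENXIO once no data remains. */
	while ((data = lseek(src, data, SEEK_DATA)) >= 0) {
		hole = lseek(src, data, SEEK_HOLE);
		if (hole < 0)
			return -1;
		for (pos = data; pos < hole; ) {
			size_t want = (size_t)(hole - pos) < sizeof(buf) ?
				      (size_t)(hole - pos) : sizeof(buf);
			ssize_t n = pread(src, buf, want, pos);

			if (n <= 0 || pwrite(dst, buf, n, pos) != n)
				return -1;
			pos += n;
		}
		data = hole;
	}
	/* Recreate the file size (and any trailing hole) in the copy. */
	return ftruncate(dst, st.st_size);
}
```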
| Comments |
| Comment by Andreas Dilger [ 11/Apr/18 ] |
|
I was originally thinking that SEEK_HOLE and SEEK_DATA would be best implemented by calling FIEMAP internally, but the remapping of FIEMAP data to file-offset extents may be quite complex (and getting worse with PFL, composite layouts, FLR, DoM, etc.). It probably makes more sense to have an interface that takes the current file offset, maps it to the object offset(s) (via the LOV EA), calls SEEK_HOLE/SEEK_DATA on the OST (or DoM MDT) for each of the objects that cover the current component, and then has LOV map those object offsets back to file offsets and find the lowest mapped offset of any of the objects. This approach would be relatively straightforward to implement (LOV can already do file<->object mapping), though some care would be needed when e.g. crossing component extent boundaries. |
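As a toy illustration of the "map back and take the lowest offset" step, here is the plain RAID0 stripe math with hard-coded per-object SEEK_DATA answers; the stripe size, stripe count, and per-object results are made up, and real LOV code handling PFL/FLR/DoM layouts is considerably more involved:

```c
#include <stdio.h>

/* Toy model of the file<->object offset mapping for a plain RAID0 layout
 * (no composite layouts), plus the "lowest mapped-back offset" step. */
#define STRIPE_SIZE	(1UL << 20)	/* 1 MiB */
#define STRIPE_COUNT	4UL

/* file offset -> (stripe number, object offset) */
static unsigned long file_to_obj(unsigned long file_off, unsigned long *stripe_no)
{
	unsigned long stripe = file_off / STRIPE_SIZE;

	*stripe_no = stripe % STRIPE_COUNT;
	return stripe / STRIPE_COUNT * STRIPE_SIZE + file_off % STRIPE_SIZE;
}

/* (stripe number, object offset) -> file offset */
static unsigned long obj_to_file(unsigned long obj_off, unsigned long stripe_no)
{
	return obj_off / STRIPE_SIZE * STRIPE_SIZE * STRIPE_COUNT +
	       stripe_no * STRIPE_SIZE + obj_off % STRIPE_SIZE;
}

int main(void)
{
	/* Pretend these are the per-object SEEK_DATA answers for file
	 * offset 0: objects 0 and 1 have no data (-1), object 2 has data
	 * at object offset 0, object 3 at object offset 1 MiB. */
	long obj_data[STRIPE_COUNT] = { -1, -1, 0, (long)STRIPE_SIZE };
	unsigned long best = ~0UL, stripe, off, obj_off, i;

	for (i = 0; i < STRIPE_COUNT; i++) {
		unsigned long file_off;

		if (obj_data[i] < 0)
			continue;
		file_off = obj_to_file((unsigned long)obj_data[i], i);
		if (file_off < best)
			best = file_off;
	}
	printf("first data at file offset %lu\n", best);	/* 2 MiB */

	/* And the forward mapping, for reference. */
	off = 5 * STRIPE_SIZE + 10;
	obj_off = file_to_obj(off, &stripe);
	printf("file offset %lu lives on object %lu at offset %lu\n",
	       off, stripe, obj_off);
	return 0;
}
```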
| Comment by Andreas Dilger [ 30/Jun/20 ] |
|
I was checking when SEEK_HOLE and SEEK_DATA were added to the kernel (originally kernel commit v3.7-rc3-24-gc8c0df241cc2). This was later fixed by kernel commits v4.12-rc2-3-g7d95eddf313c and v4.14-rc3-4-g545052e9e35a, both of which are already included in RHEL7. On ZFS, this functionality was added to ZPL by commit zfs-0.6.1-56-g802e7b5fe by Dongyang. |
| Comment by John Hammond [ 01/Jul/20 ] |
|
adilger goes on to say: John, for the sparse file handling, IMHO the best (and potentially relatively low-effort) implementation of SEEK_{HOLE,DATA} would be as described in

The main drawback of using SEEK_{HOLE,DATA} is that it needs changes on both the client and server, but it avoids the (IMHO considerable) overhead of all-zero block detection. However, we may need to do zero-block detection in the copy code anyway for compatibility and ease of fast deployment, since that is what cp --sparse=always does in the end if SEEK_{HOLE,DATA} is not available. It would be possible to have the copytool try SEEK_{HOLE,DATA} first, but if the file is known to be sparse (st_blocks < st_size) and SEEK_HOLE returns >= st_size, then --sparse=always would fall back to zero-block detection. If SEEK_HOLE returns < st_size, then this feature is working.

Note that this is not safe for files that are currently being accessed, even if fsync(fd) is called before seek(SEEK_HOLE), because this does not exclude other clients from modifying pages in their own cache. That is not an issue with a local filesystem, but it is an issue for a distributed filesystem. The client should probably return -EOPNOTSUPP or st_size if the SEEK_HOLE call returns an offset < st_size but the client is not currently holding a PW DLM lock on that extent (i.e. some other client could be concurrently writing into that hole). |
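A hedged sketch of the probe described above (assuming the usual 512-byte st_blocks unit; the all-zero block detection fallback itself is not shown):

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Decide whether SEEK_HOLE can be trusted for sparse copying of 'fd': if
 * the file is known to be sparse (fewer allocated blocks than its size)
 * yet SEEK_HOLE claims the first hole is at or past EOF, assume SEEK_HOLE
 * is unsupported or unreliable and fall back to all-zero block detection.
 * Illustrative only. */
static bool seek_hole_usable(int fd)
{
	struct stat st;
	off_t hole;

	if (fstat(fd, &st) < 0)
		return false;
	hole = lseek(fd, 0, SEEK_HOLE);
	if (hole < 0)
		return false;		/* not supported at all */
	if ((off_t)st.st_blocks * 512 < st.st_size && hole >= st.st_size)
		return false;		/* sparse, but no hole reported */
	return true;			/* SEEK_HOLE looks functional */
}
```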
| Comment by Mikhail Pershin [ 17/Aug/20 ] |
|
Current status on the ticket: I am starting to combine all the small pieces into working code this week; the server code will come first, then the client part and tests. I expect to push working code to Gerrit next week. It is possible to split it into patches for the server changes (no tests, just code) and for the client changes plus tests. |
| Comment by Andreas Dilger [ 17/Aug/20 ] |
|
Will you need an OBD_CONNECT flag for this? |
| Comment by Mikhail Pershin [ 17/Aug/20 ] |
|
Andreas, yes, I use one already:
#define OBD_CONNECT2_LSEEK 0x40000ULL /* SEEK_HOLE/DATA RPC support */
I haven't checked yet whether this value is already used by some other patches. |
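For illustration, the client-side gating this flag enables could look roughly like the following; ocd_connect_flags2 here just stands in for the flags word negotiated at connect time, so this is a sketch rather than the actual llite code:

```c
#include <stdbool.h>
#include <stdint.h>

#define OBD_CONNECT2_LSEEK 0x40000ULL	/* SEEK_HOLE/DATA RPC support */

/* Sketch: only send the new seek RPC when the server advertised support
 * at connect time; otherwise fall back to the old generic behaviour
 * (treat the whole file as data: SEEK_HOLE at EOF, SEEK_DATA at the
 * requested offset). */
static bool server_supports_lseek(uint64_t ocd_connect_flags2)
{
	return (ocd_connect_flags2 & OBD_CONNECT2_LSEEK) != 0;
}
```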
| Comment by Mikhail Pershin [ 17/Aug/20 ] |
|
I chose to use a separate OST_SEEK RPC instead of OST_GETATTR, because the latter would turn OST_GETATTR into a sort of ioctl that does one thing or another depending on flags; moreover, it would not even be returning attributes, just a separate piece of information. Also, that RPC itself looks obsolete - it is now used only to get the data_version - and it would also require a new MDT handler and a new attr flag. In contrast, OST_SEEK is simple and is handled by the unified TGT handler for both the MDT and OFD. |
| Comment by Gerrit Updater [ 21/Aug/20 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39706 |
| Comment by Gerrit Updater [ 21/Aug/20 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39707 |
| Comment by Gerrit Updater [ 21/Aug/20 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39708 |
| Comment by Mikhail Pershin [ 21/Aug/20 ] |
|
Things work with these three patches for normal files. ZFS is not working because the needed function dmu_offset_next is not exported. More work is needed for released-file support: considering that tools use SEEK_DATA/SEEK_HOLE prior to copying data, I suppose we need to restore a released file during the lseek operation. Also, more tests are needed for cases where the file is accessed concurrently while lseek is being performed. |
| Comment by Andreas Dilger [ 21/Aug/20 ] |
|
Please link to the GitHub issue and pull request for exporting {{dmu_offset_next()}} when you have it. It would also be possible to use symbol lookup in the kernel until the kernel exports it. |
| Comment by Gerrit Updater [ 12/Sep/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39706/ |
| Comment by Gerrit Updater [ 14/Sep/20 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39894 |
| Comment by John Hammond [ 02/Oct/20 ] |
|
coreutils cp added SEEK_HOLE/DATA support earlier this year. See https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=a6eaee501f6ec0c152abe88640203a64c390993e |
| Comment by Gerrit Updater [ 31/Oct/20 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40502 |
| Comment by Gerrit Updater [ 07/Nov/20 ] |
|
Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40570 |
| Comment by Gerrit Updater [ 07/Nov/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39707/ |
| Comment by Gerrit Updater [ 07/Nov/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39708/ |
| Comment by Gerrit Updater [ 07/Nov/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40502/ |
| Comment by Peter Jones [ 07/Nov/20 ] |
|
Landed for 2.14 |
| Comment by Andreas Dilger [ 08/Dec/20 ] |
|
Mike, I noticed some test runs are reporting no SEEK_HOLE support for test_430a, 430b, and 430c, even though the client and server are running on the current tip of master (v2_13_56-149-ge5c8f6670f). It doesn't look like all test runs are being skipped, but I didn't think this feature is specific to ZFS or ldiskfs, so I'm not sure why those subtests are being skipped. |
| Comment by Mikhail Pershin [ 08/Dec/20 ] |
|
Andreas, I've checked these reports and in fact only a couple of them are ldiskfs; in those cases old servers were used, with no lseek support at all. All ZFS tests are expected to be skipped. In fact, I was recently testing ZFS 0.8.5 with dmu_offset_next() exported to check how it behaves, and found that there are problems. For now it can report a HOLE at offset 1000 (correct), yet an immediate SEEK_DATA(1000) also returns 1000 (incorrect), just because there is dirty data and it switches to the generic approach (the whole file is data with a hole at the end). I am afraid we cannot rely on ZFS unless its option zfs_dmu_offset_next_sync is set, which in turn would cause a ZFS slowdown when SEEK_DATA is performed on it. So basically a ZFS server should report LSEEK support only if zfs_dmu_offset_next_sync is set AND dmu_offset_next is exported. And that should be done now, I think, before we start using ZFS 0.8.5 and find out that lseek is reported as supported but doesn't work as expected. |
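For what it's worth, a small userspace probe along the lines of what is described above (the path and sizes are arbitrary): write one byte into an otherwise sparse file and, while the data may still be dirty, check whether SEEK_HOLE reports a hole before EOF or only the generic "everything is data" answer at EOF:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	const char *path = argc > 1 ? argv[1] : "/mnt/lustre/lseek_probe";
	char byte = 'x';
	off_t hole;
	int fd;

	fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* 1 byte of data at offset 0, then a hole up to 1 MiB. */
	if (pwrite(fd, &byte, 1, 0) != 1 || ftruncate(fd, 1 << 20) < 0) {
		perror("write");
		return 1;
	}
	hole = lseek(fd, 0, SEEK_HOLE);
	/* A backend falling back to the generic answer reports the hole
	 * only at EOF instead of shortly after the written block. */
	printf("first hole reported at %lld (EOF is %d)\n",
	       (long long)hole, 1 << 20);
	close(fd);
	return 0;
}
```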
| Comment by Gerrit Updater [ 17/Mar/23 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50325 |
| Comment by Gerrit Updater [ 17/Mar/23 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50326 |
| Comment by Gerrit Updater [ 17/Mar/23 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50327 |
| Comment by Gerrit Updater [ 17/Mar/23 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50328 |