Lustre / LU-10810

SEEK_HOLE and SEEK_DATA support for lseek

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.14.0

    Description

      lseek() with SEEK_HOLE and SEEK_DATA is a really helpful and easy-to-use tool for userspace applications such as copy and backup tools. Currently Lustre has only the minimal implementation described in the lseek(2) man page, which is not very useful. However, Lustre does support the FIEMAP ioctl, which can be used to map the data in a file.
      Since FIEMAP is already supported, I think Lustre could support the SEEK_HOLE and SEEK_DATA flags with some additional implementation work. This support would be helpful for dealing with sparse files. Any feedback about the implementation would be really helpful.
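To illustrate why SEEK_DATA/SEEK_HOLE matter for copy and backup tools, here is a minimal sketch of a sparse-aware copy. The function name sparse_copy() is hypothetical; the sketch assumes Linux/glibc (where _GNU_SOURCE exposes SEEK_DATA and SEEK_HOLE), an empty destination file, and a source file that is not being modified concurrently. With only the minimal implementation, the whole file is reported as data, so the copy is still correct, just not sparse.

```c
#define _GNU_SOURCE /* exposes SEEK_DATA and SEEK_HOLE on glibc */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy in_fd to out_fd, reading and writing only the
 * ranges lseek() reports as data, then extend out_fd to the source size so
 * trailing holes are preserved. Assumes out_fd is an empty regular file.
 * Returns 0 on success, -1 on error.
 */
int sparse_copy(int in_fd, int out_fd)
{
	char buf[65536];
	off_t size = lseek(in_fd, 0, SEEK_END);
	off_t pos = 0;

	if (size < 0)
		return -1;

	while (pos < size) {
		/* Start of the next data region at or after pos. */
		off_t data = lseek(in_fd, pos, SEEK_DATA);
		if (data < 0) {
			if (errno == ENXIO)
				break; /* only a hole remains up to EOF */
			return -1;
		}
		/* End of this data region; EOF counts as a (virtual) hole. */
		off_t hole = lseek(in_fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;

		pos = data;
		while (pos < hole) {
			size_t len = sizeof(buf);
			if ((off_t)len > hole - pos)
				len = (size_t)(hole - pos);
			ssize_t n = pread(in_fd, buf, len, pos);
			/* Treat read errors, unexpected EOF and short writes as failures. */
			if (n <= 0 || pwrite(out_fd, buf, (size_t)n, pos) != n)
				return -1;
			pos += n;
		}
	}
	return ftruncate(out_fd, size);
}
```

On a distributed filesystem the extents returned are only a snapshot: another client can write into a hole after lseek() returns.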

          Activity

            adilger Andreas Dilger added a comment:

            Please link to the GitHub issue and pull request for exporting dmu_offset_next() when you have it. Until ZFS exports it, it would also be possible to use symbol lookup in the kernel.

            tappro Mikhail Pershin added a comment:

            Things work with these three patches for normal files. ZFS does not work yet because the needed function dmu_offset_next() is not exported.

            More work is needed to support released files. Since tools use SEEK_DATA/SEEK_HOLE before copying data, I suppose we need to restore a released file during the lseek operation.

            More tests are also needed for cases with concurrent access to the file while lseek is being performed.

            Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39708
            Subject: LU-10810 clio: SEEK_HOLE/SEEK_DATA on client side
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c02e19f65667971f8320b20d99cd5db7199b7c8d

            Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39707
            Subject: LU-10810 ptlrpc: introduce OST_SEEK RPC
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 643bbb697e5439bb720acdb5038067b7bec6b83a

            Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39706
            Subject: LU-10810 osd: implement lseek method in OSD
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 24983c305b0c64f42e57ae7d91cac81424099c0f

            tappro Mikhail Pershin added a comment (edited):

            I chose to use a separate OST_SEEK RPC instead of OST_GETATTR, because the latter would turn OST_GETATTR into a sort of ioctl that does one thing or another depending on flags; moreover, it would not be returning attributes at all, just separate information. OST_GETATTR itself also looks obsolete: it is now used only to get data_version, and reusing it would also require a new MDT handler and a new attr flag. In contrast, OST_SEEK is simple and is handled by the unified TGT handler for both MDT and OFD.
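To make the shape of a dedicated single-purpose seek RPC concrete, here is a minimal sketch of a request/reply pair and a server-side handler that simply delegates to lseek() on the backing object. All names (struct seek_req, struct seek_reply, handle_seek) are hypothetical and do not reflect the actual OST_SEEK wire format in the patches above; the backing object is modelled as an ordinary file descriptor.

```c
#define _GNU_SOURCE /* exposes SEEK_DATA and SEEK_HOLE on glibc */
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request: starting offset plus SEEK_DATA or SEEK_HOLE. */
struct seek_req {
	int64_t offset;
	int32_t whence;
};

/* Hypothetical reply: resulting offset (-1 on error) and 0 or -errno. */
struct seek_reply {
	int64_t offset;
	int32_t status;
};

/*
 * Hypothetical handler with a single job: validate whence, run lseek()
 * on the backing object and report the result. No attributes involved.
 */
void handle_seek(int obj_fd, const struct seek_req *req, struct seek_reply *rep)
{
	off_t res;

	rep->offset = -1;
	if (req->whence != SEEK_DATA && req->whence != SEEK_HOLE) {
		rep->status = -EINVAL;
		return;
	}
	res = lseek(obj_fd, (off_t)req->offset, req->whence);
	if (res < 0) {
		rep->status = -errno; /* e.g. -ENXIO when offset is at or past EOF */
		return;
	}
	rep->offset = res;
	rep->status = 0;
}
```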
            tappro Mikhail Pershin added a comment (edited):

            Andreas, yes, I already use one:

            #define OBD_CONNECT2_LSEEK        0x40000ULL /* SEEK_HOLE/DATA RPC support */

            I haven't checked yet whether this value is used by other patches.
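One way a client could use such a flag: send the seek RPC only when the server advertised OBD_CONNECT2_LSEEK at connect time, and otherwise fall back to the minimal behaviour from the lseek(2) man page (the whole file is data, with a virtual hole at EOF). The flag value is the one quoted above; client_seek(), remote_seek_fn and passing the connect flags as a bare integer are hypothetical simplifications, not the real client code.

```c
#define _GNU_SOURCE /* exposes SEEK_DATA and SEEK_HOLE on glibc */
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define OBD_CONNECT2_LSEEK 0x40000ULL /* SEEK_HOLE/DATA RPC support */

/* Hypothetical callback standing in for sending the seek RPC. */
typedef off_t (*remote_seek_fn)(off_t offset, int whence);

/*
 * Hypothetical client-side dispatch. Returns the new offset or a negative
 * errno, following the kernel llseek convention.
 */
off_t client_seek(uint64_t flags2, off_t offset, off_t size, int whence,
		  remote_seek_fn remote)
{
	if (whence != SEEK_DATA && whence != SEEK_HOLE)
		return -EINVAL;
	if (flags2 & OBD_CONNECT2_LSEEK)
		return remote(offset, whence);

	/*
	 * Minimal fallback. As in the kernel's generic helper, the offset is
	 * compared as unsigned, so negative offsets also yield -ENXIO.
	 */
	if ((uint64_t)offset >= (uint64_t)size)
		return -ENXIO;
	return whence == SEEK_DATA ? offset : size;
}
```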

            adilger Andreas Dilger added a comment:

            Will you need an OBD_CONNECT flag for this?

            tappro Mikhail Pershin added a comment:

            Current status on the ticket: this week I am starting to combine all the small parts into working code, server code first, then the client part and tests. I expect to push working code to Gerrit next week. It could be split into patches for the server changes (code only, no tests) and for the client changes plus tests.
            jhammond John Hammond added a comment (edited):

            adilger goes on to say:

            John, for the sparse file handling, IMHO the best (and potentially relatively low-effort) implementation of SEEK_{HOLE,DATA} would be as described in LU-10810. Essentially, it would be "GETATTR with a SEEK_{HOLE,DATA} flag" passing the current file offset and returning the start of the next hole/data. This would require very little extra code, and it leverages the SEEK_{HOLE,DATA} implementation of the underlying filesystem plus the existing LOV file offset remapping code. For SEEK_HOLE it would return the maximum hole offset of all allocated objects overlapping the current file offset or in later components. For SEEK_DATA, it would return the minimum data offset of all allocated objects overlapping the current file offset or in later components.
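The "LOV file offset remapping" mentioned above is, for a plain RAID-0 layout, simple arithmetic. The sketch below uses hypothetical names and assumes a single component with a fixed stripe size and stripe count (no PFL, no overstriping). It maps a file offset to a (stripe index, object offset) pair and back, which is the translation a striped SEEK_{HOLE,DATA} needs before querying each object and after getting its answer.

```c
#include <stdint.h>

/* Hypothetical position of a file byte inside a RAID-0 striped layout. */
struct obj_pos {
	uint32_t stripe;  /* index of the object holding the byte */
	uint64_t obj_off; /* offset of the byte within that object */
};

/* File offset -> object position. */
struct obj_pos file_to_obj(uint64_t off, uint64_t stripe_size, uint32_t stripe_count)
{
	uint64_t chunk = off / stripe_size; /* which stripe_size chunk of the file */
	struct obj_pos p;

	p.stripe = (uint32_t)(chunk % stripe_count);
	p.obj_off = (chunk / stripe_count) * stripe_size + off % stripe_size;
	return p;
}

/* Object position -> file offset (inverse of file_to_obj). */
uint64_t obj_to_file(struct obj_pos p, uint64_t stripe_size, uint32_t stripe_count)
{
	uint64_t row = p.obj_off / stripe_size; /* full stripe rows before this chunk */

	return (row * stripe_count + p.stripe) * stripe_size + p.obj_off % stripe_size;
}
```

For example, with a 1 MiB stripe size and four stripes, file offset 5 MiB + 7 lands in stripe 1 at object offset 1 MiB + 7.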

            The main drawback of using SEEK_{HOLE,DATA} is that it needs changes on both the client and the server, but it avoids the (IMHO considerable) overhead of all-zero block detection. However, we may need to do zero-block detection in the copy code anyway, for compatibility and ease of fast deployment, since that is what cp --sparse=always does in the end if SEEK_{HOLE,DATA} is not available.

            It would be possible to have the copytool try SEEK_{HOLE,DATA} first. If the file is known to be sparse (st_blocks * 512 < st_size, since st_blocks counts 512-byte units) but SEEK_HOLE returns >= st_size, then --sparse=always would fall back to zero-block detection. If SEEK_HOLE returns < st_size, then this feature is working.

            Note that this is not safe for files that are currently being accessed, even if fsync(fd) is called before lseek(SEEK_HOLE), because this doesn't prevent other clients from modifying pages in their own caches. That is not an issue for a local filesystem, but it is an issue for a distributed filesystem. The client should probably return -EOPNOTSUPP or st_size if the SEEK_HOLE call returns an offset < st_size but the client is not currently holding a PW DLM lock on that extent (i.e. some other client could be concurrently writing into that hole).
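The copytool heuristic described above can be sketched as follows. choose_sparse_mode() and is_zero_block() are hypothetical names; the sketch converts st_blocks (512-byte units) to bytes before comparing with st_size and, like the comment, does nothing to protect against concurrent writers on other clients.

```c
#define _GNU_SOURCE /* exposes SEEK_HOLE on glibc */
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum sparse_mode {
	SPARSE_NONE,       /* file is not sparse: plain copy */
	SPARSE_SEEK,       /* SEEK_HOLE finds a hole before EOF: trust it */
	SPARSE_ZERO_DETECT /* sparse, but SEEK_HOLE is unhelpful: scan for zeros */
};

/* Note: the SEEK_HOLE probe moves fd's file offset. */
enum sparse_mode choose_sparse_mode(int fd)
{
	struct stat st;
	off_t hole;

	if (fstat(fd, &st) < 0)
		return SPARSE_NONE;
	/* st_blocks counts 512-byte units regardless of st_blksize. */
	if ((off_t)st.st_blocks * 512 >= st.st_size)
		return SPARSE_NONE;

	hole = lseek(fd, 0, SEEK_HOLE);
	if (hole >= 0 && hole < st.st_size)
		return SPARSE_SEEK;
	/* Only the EOF hole (or an error): fall back, like cp --sparse=always. */
	return SPARSE_ZERO_DETECT;
}

/* All-zero block check used on the fallback path. */
int is_zero_block(const char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```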

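The safety rule at the end of this comment (only trust a hole before st_size if this client holds a PW DLM lock covering it) might look like the sketch below. holds_pw_lock_fn and guarded_seek_hole() are hypothetical; a real client would consult its DLM lock state rather than a callback. The comment leaves the choice between st_size and -EOPNOTSUPP open, so this sketch returns st_size, and it conservatively requires the lock to cover the whole range from the hole to st_size.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical query: does this client hold a PW lock covering [start, end)? */
typedef bool (*holds_pw_lock_fn)(off_t start, off_t end, void *ctx);

/*
 * Return the hole offset reported by the server only if no other client
 * could be writing into it; otherwise report just the virtual hole at EOF.
 */
off_t guarded_seek_hole(off_t hole, off_t size, holds_pw_lock_fn holds_pw, void *ctx)
{
	if (hole < 0 || hole >= size)
		return hole; /* -errno, or the EOF hole: nothing to guard */
	if (holds_pw(hole, size, ctx))
		return hole; /* no other client can dirty this range */
	return size;         /* unsafe to trust: behave as if there is no interior hole */
}
```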

            adilger Andreas Dilger added a comment:

            I was checking when SEEK_HOLE and SEEK_DATA were added to the kernel (originally kernel commit v3.7-rc3-24-gc8c0df241cc2). This was later fixed by kernel commits v4.12-rc2-3-g7d95eddf313c and v4.14-rc3-4-g545052e9e35a, both of which are already included in RHEL7. On ZFS, this functionality was added to the ZPL by commit zfs-0.6.1-56-g802e7b5fe by Dongyang.

            People

              Assignee: tappro Mikhail Pershin
              Reporter: ljaliminche Lokesh N J (Inactive)
              Votes: 0
              Watchers: 14
