SEEK_HOLE and SEEK_DATA support for lseek

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.14.0

    Description

      lseek with SEEK_HOLE and SEEK_DATA is a helpful, easy-to-use tool for userspace applications such as copy and backup utilities. Currently Lustre has only the minimal implementation permitted by the man page, which is not very useful. However, Lustre does support the fiemap ioctl, which can be used to map the data in a file.
      Since fiemap support already exists, I think Lustre could support the SEEK_HOLE and SEEK_DATA flags with some additional implementation work. This would be helpful when dealing with sparse files. Any feedback about the implementation would be really helpful.
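For readers unfamiliar with the interface, here is a minimal sketch of how a copy or backup tool might walk the data extents of a file with lseek. The names walk_data_extents, extent_cb and print_extent are hypothetical helpers introduced only for illustration. With a minimal implementation (the whole file reported as data, with a single implicit hole at EOF) the loop still works but reports one extent covering the whole file.

```c
#define _GNU_SOURCE            /* exposes SEEK_DATA / SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA              /* fallback for older headers; Linux values */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* Hypothetical callback: invoked once per data extent [start, end). */
typedef void (*extent_cb)(off_t start, off_t end, void *arg);

/* Example callback that just prints the extent. */
void print_extent(off_t start, off_t end, void *arg)
{
    (void)arg;
    printf("data: [%lld, %lld)\n", (long long)start, (long long)end);
}

/*
 * Walk all data extents of fd from offset 0.
 * Returns the number of extents found, or -1 on error (errno is set).
 * Note: this moves the file offset of fd.
 */
long walk_data_extents(int fd, extent_cb cb, void *arg)
{
    off_t pos = 0;
    long count = 0;

    assert(fd >= 0);
    for (;;) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0)
            return errno == ENXIO ? count : -1;  /* ENXIO: no data past pos */

        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        if (hole <= data) {      /* defensive: avoid looping forever */
            errno = EIO;
            return -1;
        }
        if (cb)
            cb(data, hole, arg);
        count++;
        pos = hole;              /* continue the search after this extent */
    }
}
```

A caller would open the file and invoke walk_data_extents(fd, print_extent, NULL), copying only the reported ranges and leaving the gaps between them unwritten in the destination.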

          Activity

            [LU-10810] SEEK_HOLE and SEEK_DATA support for lseek
            tappro Mikhail Pershin added a comment - - edited

            Andreas, yes I use one already.

            #define OBD_CONNECT2_LSEEK        0x40000ULL /* SEEK_HOLE/DATA RPC support */
            

            I haven't yet checked whether this value is already used by any other patches.

            adilger Andreas Dilger added a comment

            Will you need an OBD_CONNECT flag for this?

            tappro Mikhail Pershin added a comment

            Current status of this ticket: this week I am starting to combine all the small parts into working code. The server code will come first, then the client part and the tests. I expect to push working code to Gerrit next week. It can be split into separate patches: server changes (code only, no tests) and client changes plus tests.

            jhammond John Hammond added a comment - - edited

            adilger goes on to say:

            John, for the sparse file handling, IMHO the best (and potentially relatively low-effort) implementation of SEEK_{HOLE,DATA} would be as described in LU-10810. Essentially, it would be "GETATTR with a SEEK_{HOLE,DATA} flag" passing the current file offset and returning the start of the next hole/data. This would require very little extra code, and it leverages the SEEK_{HOLE,DATA} implementation of the underlying filesystem plus the existing LOV file offset remapping code. For SEEK_HOLE it would return the maximum hole offset of all allocated objects overlapping the current file offset or in later components. For SEEK_DATA, it would return the minimum data offset of all allocated objects overlapping the current file offset or in later components.
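A minimal sketch of the "pass an offset, get back the next hole/data" idea on the server side, assuming the backing object is reachable as an ordinary file descriptor so the underlying filesystem's own lseek can do the work. The seek_request and seek_reply structures and the handle_seek function are hypothetical and are not the Lustre RPC format.

```c
#define _GNU_SOURCE            /* exposes SEEK_DATA / SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA              /* fallback for older headers; Linux values */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* Hypothetical request: where to start and what to look for. */
struct seek_request {
    uint64_t offset;   /* object offset to start searching from */
    int      whence;   /* SEEK_HOLE or SEEK_DATA */
};

/* Hypothetical reply: status plus the offset that was found. */
struct seek_reply {
    int      rc;       /* 0 on success, negative errno otherwise */
    uint64_t offset;   /* start of the next hole/data when rc == 0 */
};

/* Serve one request by delegating to the backing filesystem. */
static void handle_seek(int obj_fd, const struct seek_request *req,
                        struct seek_reply *rep)
{
    assert(req != NULL && rep != NULL);
    rep->offset = 0;

    if (req->whence != SEEK_HOLE && req->whence != SEEK_DATA) {
        rep->rc = -EINVAL;
        return;
    }

    off_t res = lseek(obj_fd, (off_t)req->offset, req->whence);
    if (res < 0) {
        rep->rc = -errno;   /* e.g. ENXIO when offset is beyond the data */
        return;
    }
    rep->rc = 0;
    rep->offset = (uint64_t)res;
}
```

The client would send one such request per object and combine the replies after mapping them back to file offsets.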

            The main drawback of using SEEK_{HOLE,DATA} is that it needs changes on both the client and server, but avoids the (IMHO considerable) overhead of all-zero block detection. However, we may need to do zero-block detection in the copy code anyway for compatibility and ease of fast deployment, since that is what cp --sparse=always does in the end if SEEK_{HOLE,DATA} is not available. It would be possible to have the copytool try SEEK_{HOLE,DATA} first, but if the file is known to be sparse (st_blocks < st_size) but SEEK_HOLE returns >= st_size then --sparse=always would fall back to zero block detection. If SEEK_HOLE returns < st_size then this feature is working. Note that this is not safe for files that are currently being accessed, even if fsync(fd) is called before seek(SEEK_HOLE) because this doesn't exclude other clients from modifying pages in their own cache. That is not an issue with a local filesystem, but is an issue for a distributed filesystem. The client should probably return -EOPNOTSUPP or st_size if the SEEK_HOLE call returns an offset < st_size but the client is not currently holding a PW DLM lock on that extent (i.e. some other client could be concurrently writing into that hole).
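The fallback decision described above could be sketched roughly as follows. The enum and function names are hypothetical. The sketch assumes st_blocks is counted in 512-byte units, and it treats a file that does not look sparse and reports no hole as a plain copy, which is an assumption rather than something stated in the comment. It does not address the concurrent-writer caveat.

```c
#define _GNU_SOURCE            /* exposes SEEK_HOLE on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_HOLE              /* fallback for older headers; Linux value */
#define SEEK_HOLE 4
#endif

enum sparse_strategy {
    COPY_DENSE,        /* no holes apparent: plain copy (assumption) */
    COPY_SEEK,         /* SEEK_HOLE/SEEK_DATA reports real holes */
    COPY_ZERO_DETECT   /* scan buffers for all-zero blocks */
};

/* Decide from stat data and the result of lseek(fd, 0, SEEK_HOLE). */
enum sparse_strategy pick_strategy(off_t st_size, blkcnt_t st_blocks,
                                   off_t first_hole)
{
    bool looks_sparse = (long long)st_blocks * 512 < (long long)st_size;

    if (first_hole >= 0 && first_hole < st_size)
        return COPY_SEEK;          /* a hole before EOF: feature works */
    if (looks_sparse)
        return COPY_ZERO_DETECT;   /* sparse, but no hole was reported */
    return COPY_DENSE;
}

/* Probe an open file; the file offset is reset to 0 afterwards. */
enum sparse_strategy probe_fd(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return COPY_ZERO_DETECT;   /* be conservative on error */

    off_t hole = lseek(fd, 0, SEEK_HOLE);   /* -1 if unsupported */
    (void)lseek(fd, 0, SEEK_SET);
    return pick_strategy(st.st_size, st.st_blocks, hole);
}

/* Zero-block detection used by the fallback path. */
bool is_zero_block(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return false;
    return true;
}
```

Keeping the decision in a pure function (pick_strategy) makes the policy easy to test without needing a sparse file on a particular filesystem.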

            adilger Andreas Dilger added a comment

            I was checking when SEEK_HOLE and SEEK_DATA were added to the kernel (originally kernel commit v3.7-rc3-24-gc8c0df241cc2). This was later fixed by kernel commits v4.12-rc2-3-g7d95eddf313c and v4.14-rc3-4-g545052e9e35a, both of which are already included in RHEL7. On ZFS, this functionality was added to ZPL by commit zfs-0.6.1-56-g802e7b5fe by Dongyang.

            adilger Andreas Dilger added a comment - - edited

            I was originally thinking that SEEK_HOLE and SEEK_DATA would be best implemented by calling FIEMAP internally, but the remapping of FIEMAP data to file-offset extents may be quite complex (and getting worse with PFL, composite layouts, FLR, DoM, etc). It probably makes more sense to have an interface which takes the current file offset, maps it to the object offset(s) (via LOV EA), and then calls SEEK_HOLE/SEEK_DATA on the OST (or DoM MDT) for each of the objects that cover the current component, and then LOV maps those object offsets back to file offsets and finds the lowest mapped offset of any of the objects.

            This approach would be relatively straightforward to implement (LOV can already do file<->object mapping), though some care would be needed when e.g. crossing component extent boundaries.
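To make the mapping step concrete, here is a minimal sketch for SEEK_DATA over a single plain RAID-0 striped layout. Everything in it is a simplified assumption: struct layout, the toy per-object model with one data extent each, and all function names are hypothetical, and it ignores PFL components, FLR, DoM and component boundaries. It maps the file offset to a starting offset in each object, searches each object, maps the results back, and keeps the lowest file offset.

```c
#include <assert.h>
#include <stdint.h>

#define NO_DATA UINT64_MAX   /* hypothetical "nothing found" marker */

/* Hypothetical plain RAID-0 layout. */
struct layout {
    uint64_t stripe_size;    /* bytes per chunk */
    uint32_t stripe_count;   /* number of objects */
};

/* Smallest offset in object idx whose file offset is >= file_off. */
static uint64_t file_to_obj_start(const struct layout *l, uint32_t idx,
                                  uint64_t file_off)
{
    uint64_t row_bytes = l->stripe_size * l->stripe_count;
    uint64_t row = file_off / row_bytes;
    uint64_t in_row = file_off % row_bytes;
    uint32_t cur = (uint32_t)(in_row / l->stripe_size);

    if (idx == cur)                       /* object holding file_off */
        return row * l->stripe_size + in_row % l->stripe_size;
    if (idx > cur)                        /* chunk later in this row */
        return row * l->stripe_size;
    return (row + 1) * l->stripe_size;    /* chunk in the next row */
}

/* Map an offset inside object idx back to a file offset. */
static uint64_t obj_to_file(const struct layout *l, uint32_t idx,
                            uint64_t obj_off)
{
    uint64_t row = obj_off / l->stripe_size;

    return row * l->stripe_size * l->stripe_count +
           (uint64_t)idx * l->stripe_size + obj_off % l->stripe_size;
}

/* Toy object model: a single data extent [start, end) per object. */
struct obj_extent {
    uint64_t start, end;
};

/* Stand-in for calling SEEK_DATA on one object. */
static uint64_t obj_seek_data(const struct obj_extent *o, uint64_t obj_off)
{
    if (o->start >= o->end || obj_off >= o->end)
        return NO_DATA;
    return obj_off < o->start ? o->start : obj_off;
}

/* SEEK_DATA on the file: lowest mapped data offset over all objects. */
static uint64_t file_seek_data(const struct layout *l,
                               const struct obj_extent *objs,
                               uint64_t file_off)
{
    uint64_t best = NO_DATA;

    assert(l->stripe_size > 0 && l->stripe_count > 0);
    for (uint32_t i = 0; i < l->stripe_count; i++) {
        uint64_t start = file_to_obj_start(l, i, file_off);
        uint64_t found = obj_seek_data(&objs[i], start);

        if (found == NO_DATA)
            continue;
        uint64_t mapped = obj_to_file(l, i, found);
        if (mapped < best)
            best = mapped;
    }
    return best;
}
```

In a real implementation obj_seek_data would be an RPC to the OST (or DoM MDT) holding that object, and a composite layout would repeat the search per component, which is where the care at component extent boundaries comes in.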

            People

              tappro Mikhail Pershin
              ljaliminche Lokesh N J (Inactive)