  1. Lustre
  2. LU-1941

ZFS FIEMAP support

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • None
    • 3
    • 3
    • 23,099
    • Orion
    • 2188

    Description

      osd-zfs lacks FIEMAP support. This was originally discussed in Bugzilla 23099. It is not a blocker for the DMU milestone; this task is mostly an improvement.

      The sanity.sh test_130* tests verify that FIEMAP (file extent map) works properly. FIEMAP allows clients to determine the disk block allocation layout for a particular file.
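
For reference, a minimal sketch of how one extent record returned by FIEMAP could be decoded. The field layout follows struct fiemap_extent from linux/fiemap.h; treating the first reserved 32-bit word as the per-extent "device" is an assumption made here for illustration, and unpack_extent is a hypothetical helper, not part of Lustre or the kernel.

```python
import struct

# One extent record, laid out as struct fiemap_extent in linux/fiemap.h:
#   u64 fe_logical, u64 fe_physical, u64 fe_length,
#   u64 fe_reserved64[2], u32 fe_flags, u32 fe_reserved[3]
FIEMAP_EXTENT = struct.Struct("=QQQ2QI3I")

FIEMAP_EXTENT_LAST = 0x00000001  # flag: this is the last extent of the file


def unpack_extent(buf, offset=0):
    """Decode one extent record from a bytes-like buffer (hypothetical helper)."""
    (logical, physical, length, _r64a, _r64b,
     flags, res0, _res1, _res2) = FIEMAP_EXTENT.unpack_from(buf, offset)
    return {
        "logical": logical,    # byte offset within the file
        "physical": physical,  # byte offset on the underlying device
        "length": length,      # extent length in bytes
        "flags": flags,
        "device": res0,        # ASSUMPTION: device kept in the first reserved word
    }


# Build a synthetic record and decode it again.
raw = FIEMAP_EXTENT.pack(0, 1048576, 131072, 0, 0, FIEMAP_EXTENT_LAST, 5, 0, 0)
extent = unpack_extent(raw)
```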

      In Lustre 1.x and 2.x, FIEMAP is supported for ldiskfs filesystems.

      Once the "fiemap" request is passed through to the OSD it should be trivial to call the ldiskfs ->fiemap() method to fill in the data structure and return it to the caller. For ZFS this will need some code (possibly a new DMU interface?) to walk the file's data blocks and return the block pointer(s?) for each block.
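
A rough sketch of what such a walk might produce, using plain Python objects in place of real ZFS structures. DVA, BlockPointer, Extent and walk_blocks are all hypothetical names for illustration; no existing DMU interface is implied.

```python
from collections import namedtuple

# Hypothetical stand-ins for ZFS structures; these are not real DMU types.
DVA = namedtuple("DVA", "vdev offset")                      # one on-disk copy
BlockPointer = namedtuple("BlockPointer", "file_offset size dvas")
Extent = namedtuple("Extent", "logical physical length vdev")


def walk_blocks(block_pointers, all_dvas=False):
    """Yield one extent per data block, or one per DVA copy if all_dvas is set.

    Blocks without any DVA (holes) produce no extent.
    """
    for bp in block_pointers:
        copies = bp.dvas if all_dvas else bp.dvas[:1]
        for dva in copies:
            yield Extent(bp.file_offset, dva.offset, bp.size, dva.vdev)


# Two 128 KiB blocks, each stored twice (ditto copies on VDEVs 0 and 1).
blocks = [
    BlockPointer(0, 131072, [DVA(0, 4096), DVA(1, 8192)]),
    BlockPointer(131072, 131072, [DVA(0, 135168), DVA(1, 139264)]),
]
first_copy_only = list(walk_blocks(blocks))
every_copy = list(walk_blocks(blocks, all_dvas=True))
```

With all_dvas set, the same logical range appears once per copy, which is the overlapping-extent case raised in the open questions.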

      Open questions include:

      • which block pointer should be returned in the case of ditto blocks? It is possible to return multiple overlapping extents (one for each DVA), but this may be confusing to some users.
      • while FIEMAP has space for a "device" for each extent, how will we map different ZFS VDEV devices and Lustre OST devices into the single 32-bit device field?
        • We could use 16-bit "major:minor" with OST index being "major" and VDEV being "minor", but I don't think there is a simple index for the VDEVs.
        • We could use the low 16-bit value of the VDEV UUID (assuming it is largely unique) so that users can identify this fairly easily from "zfs" output if needed.
        • We could try to map the VDEV to the underlying Linux block device major/minor, though that is a major layering violation.
      • should/can the extents be returned to the user in some "device" (VDEV) order so that it is clearer whether the extents are contiguous on disk, or will we get $((filesize * ditto / 128k)) extents returned to the client, possibly millions for large (128GB) files?
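
To make the questions above concrete, here is a small sketch of the 16-bit "major:minor" packing, the worst-case extent count, and a device-ordered merge of contiguous extents. All function names are hypothetical, the VDEV id is assumed to be some 16-bit value (the ticket notes there may be no simple index), and "128GB" is taken to mean 128 GiB.

```python
from collections import namedtuple

Extent = namedtuple("Extent", "logical physical length device")


def pack_device(ost_index, vdev_id):
    """Pack a 16-bit OST index ("major") and a 16-bit VDEV id ("minor")."""
    if not (0 <= ost_index <= 0xFFFF and 0 <= vdev_id <= 0xFFFF):
        raise ValueError("each half must fit in 16 bits")
    return (ost_index << 16) | vdev_id


def unpack_device(device):
    """Split the 32-bit device field back into (ost_index, vdev_id)."""
    return device >> 16, device & 0xFFFF


def worst_case_extents(filesize, ditto=1, blocksize=128 * 1024):
    """One extent per block per copy: ceil(filesize / blocksize) * ditto."""
    blocks = -(-filesize // blocksize)  # ceiling division
    return blocks * ditto


def merge_by_device(extents):
    """Sort by (device, physical) and merge runs that are contiguous
    both on disk and in the file."""
    merged = []
    for ext in sorted(extents, key=lambda e: (e.device, e.physical)):
        prev = merged[-1] if merged else None
        if (prev is not None and prev.device == ext.device
                and prev.physical + prev.length == ext.physical
                and prev.logical + prev.length == ext.logical):
            merged[-1] = prev._replace(length=prev.length + ext.length)
        else:
            merged.append(ext)
    return merged


device = pack_device(3, 7)                              # 0x00030007
millions = worst_case_extents(128 * 2**30, ditto=2)     # 2097152
```

Under these assumptions a 128 GiB file with two copies of each block yields about two million extents unmerged, so some form of merging or ordering would matter for what is sent back to the client.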

      Even for local ZFS filesystem mounts, FIEMAP (via filefrag) output would provide useful insight into the on-disk allocation of files and would be needed to improve the ZFS allocation policies.

            People

              wc-triage WC Triage
              tappro Mikhail Pershin