
efficient DoM->OST component migration

Details

    • Improvement
    • Resolution: Duplicate
    • Minor

    Description

      When a large file is created with a DoM component in a PFL layout, the DoM component consumes the maximum amount of MDT space that the layout allows, yet storing a small part of the file on the MDT provides little or no benefit: no RPC savings, no SOM, and possibly even extra RPCs, because the bulk data for the first stripe (if < 1MB in size) is transferred separately from the MDT and the OST.

      In such cases, it would be desirable to migrate the DoM component's data into the hole at the start of the first OST object to save space on the MDT, especially for large files. The drawback is that DoM migration currently requires a client to copy all of the file data to another file before swapping the layouts. This may mean GB or TB of data movement to remove a 64KB DoM component.
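To put rough numbers on that drawback, the following Python sketch compares the bytes moved by a full client-side copy with the bytes moved if only the DoM extent were copied. The function name `bytes_moved` and the 100 GiB file size are illustrative assumptions; only the 64KB DoM size comes from the description.

```python
KIB = 1024
GIB = 1024 ** 3

def bytes_moved(file_size: int, dom_size: int) -> dict:
    """Compare data moved by a full-file copy with a DoM-only copy."""
    # Data actually held in the DoM component (never more than the file itself).
    dom_bytes = min(file_size, dom_size)
    return {
        "full_copy": file_size,   # client copies the whole file, then swaps layouts
        "dom_only": dom_bytes,    # only the DoM extent goes to the first OST object
        "ratio": file_size / dom_bytes if dom_bytes else 0.0,
    }

# Hypothetical example: a 100 GiB file with a 64 KiB DoM component.
result = bytes_moved(file_size=100 * GIB, dom_size=64 * KIB)
print(result)
```

In this assumed case the full copy moves about 1.6 million times more data than the DoM extent alone.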

      It would be more efficient to write only the data from the DoM component directly from the MDS into the start of the first OST object on the OSS. The MDS can safely exclude other writers while holding the MDS_INODELOCK_LAYOUT|MDS_INODELOCK_DOM locks for the inode, and use OUT_WRITE to send the data to the OST object. Once the write has committed, the MDS can safely rewrite the layout to remove the DoM component and drop the DLM locks.
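The ordering proposed above (lock, write, commit, layout change, unlock) can be sketched as a toy in-memory model. This is not Lustre code: `Component`, `ToyFile`, and `migrate_dom_to_ost` are hypothetical names, the lock set only stands in for MDS_INODELOCK_LAYOUT|MDS_INODELOCK_DOM, and the slice assignment only stands in for OUT_WRITE.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    start: int
    end: int       # exclusive end of the extent
    target: str    # "mdt" or "ost"

@dataclass
class ToyFile:
    layout: list
    mdt_data: bytes
    ost_object: bytearray   # has a hole (zeros) where the DoM extent would go
    locks: set = field(default_factory=set)
    log: list = field(default_factory=list)

def migrate_dom_to_ost(f: ToyFile) -> None:
    dom, first_ost = f.layout[0], f.layout[1]
    assert dom.target == "mdt" and first_ost.target == "ost"
    # 1. Exclude other writers (stand-in for the LAYOUT and DOM inode locks).
    f.locks |= {"LAYOUT", "DOM"}
    f.log.append("lock")
    # 2. Copy the DoM bytes into the hole at the start of the OST object.
    f.ost_object[:len(f.mdt_data)] = f.mdt_data
    f.log.append("write")
    # 3. Wait for the write to commit (modelled only as a log entry).
    f.log.append("commit")
    # 4. Only now rewrite the layout: drop the DoM component and let the
    #    OST component cover the extent from offset 0.
    f.layout = [Component(0, first_ost.end, "ost")]
    f.mdt_data = b""
    f.log.append("layout")
    # 5. Drop the locks.
    f.locks.clear()
    f.log.append("unlock")

# Hypothetical 16-byte file with a 4-byte DoM component.
toy = ToyFile(
    layout=[Component(0, 4, "mdt"), Component(4, 16, "ost")],
    mdt_data=b"HEAD",
    ost_object=bytearray(b"\0" * 4 + b"OSTDATA-TAIL"),
)
migrate_dom_to_ost(toy)
print(bytes(toy.ost_object), toy.layout, toy.log)
```

The point of the sketch is the ordering: the layout is rewritten only after the write is recorded as committed, and the locks are held across both steps.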

      There is no danger if the MDS crashes before the layout is changed, because the "hidden" data on the OST cannot be accessed by the client with the old layout in place.
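The crash-safety argument can be illustrated with a small sketch of how a layout maps file offsets to targets. The tuple representation and the name `read_source` are assumptions made for illustration, not Lustre data structures.

```python
def read_source(layout, offset):
    """Return the target serving `offset` under `layout`.

    `layout` is a list of (start, end, target) tuples with exclusive ends.
    """
    for start, end, target in layout:
        if start <= offset < end:
            return target
    raise ValueError("offset not covered by layout")

DOM_SIZE = 64 * 1024
FILE_END = 1 << 40  # hypothetical end of the last component

old_layout = [(0, DOM_SIZE, "mdt"), (DOM_SIZE, FILE_END, "ost")]
new_layout = [(0, FILE_END, "ost")]

# Crash after the OST write but before the layout change: the old layout
# is still in force, so every offset in the DoM extent is served by the MDT
# and the copy already written to the OST object is never consulted.
sources_after_crash = {read_source(old_layout, off) for off in (0, 4096, DOM_SIZE - 1)}
print(sources_after_crash)

# After a successful layout change the same offsets resolve to the OST.
sources_after_migration = {read_source(new_layout, off) for off in (0, 4096, DOM_SIZE - 1)}
print(sources_after_migration)
```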

          Activity

            [LU-13612] efficient DoM->OST component migration

            nrutman Nathan Rutman added a comment:

            I don't think this is done at all, and it should be reopened as a useful feature for reclaiming DoM or flash space.

            LU-11421 simply allows a DoM file to be used in a mirror. It does not do what this ticket asks, which is to copy an earlier component's data into the hole at the beginning of the next component (this should work generally, not just for DoM).

            What would need to happen:

            1. Copy the DoM data into the hole at the start of the next component (call the next component "Bob"). We can't actually do this from client code until we have a layout that allows access to the hole, i.e. there's no way to write into the earlier offsets from the client side, as far as I can tell.
            2. Delete the DoM component. lfs setstripe --component-del -I 1 testfile would be nice, but right now that only works on the final component.

            So to address #1, it seems we need to create a "fake" mirror using Bob's component description, but with the starting extent adjusted to 0. Then we would need to mirror sync the DoM's extent. Then we can delete the original mirror, making sure not to delete Bob's objects, which are now also part of mirror 2.

            Alternatively, we give lfs some special permission to write directly into any extent in any component, and avoid the mirror dance.
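The "fake mirror" idea in this comment can be sketched as a toy model of components that share backing objects. All names here (`Comp`, `fake_mirror_from`, `delete_mirror`, the object IDs) are hypothetical, and the mirror sync of the DoM extent is only marked by a comment rather than modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Comp:
    start: int
    end: int          # exclusive end of the extent
    objects: tuple    # IDs of backing OST objects; empty for a DoM component

def fake_mirror_from(next_comp: Comp) -> list:
    """Clone a component description with its start moved to 0, sharing its objects."""
    return [Comp(0, next_comp.end, next_comp.objects)]

def delete_mirror(mirrors: list, index: int):
    """Remove one mirror; return the remaining mirrors and the object IDs
    that are safe to free because no remaining mirror references them."""
    victim = mirrors[index]
    remaining = mirrors[:index] + mirrors[index + 1:]
    still_used = {o for m in remaining for c in m for o in c.objects}
    freed = {o for c in victim for o in c.objects} - still_used
    return remaining, freed

dom = Comp(0, 64 * 1024, ())
bob = Comp(64 * 1024, 1 << 40, ("ost-obj-1", "ost-obj-2"))

mirrors = [[dom, bob]]                  # mirror 1: the original layout
mirrors.append(fake_mirror_from(bob))   # mirror 2: Bob's description from offset 0
# ... the DoM extent [0, 64 KiB) would be synced into mirror 2 here ...
remaining, freed = delete_mirror(mirrors, 0)
print(remaining, freed)
```

Tracking which objects are still referenced is what keeps Bob's objects alive when the original mirror is deleted.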

            tappro Mikhail Pershin added a comment:

            Andreas, I think it is done.

            adilger Andreas Dilger added a comment:

            Mike, is there anything left to do here, or is this handled properly by the DoM+FLR changes in LU-11421?

            People

              tappro Mikhail Pershin
              adilger Andreas Dilger