  1. Lustre
  2. LU-17863

Creating files in HSM ARCHIVED state

Details

    • New Feature
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0

    Description

      Summary

      For DMF we would like to have the ability to copy files to a Lustre filesystem and set the file state to archived without going through the HSM coordinator and having to impersonate a copytool.

      Background

      DMF has its own concept of HSM file state. A file is regular (REG) if no archive copy exists, dual state (DUL) if a copy exists and the file data is identical, modified (MOD) if a copy exists and the file data has changed, and offline (OFL) if the data on the filesystem has been released.

      On DMAPI-enabled filesystems (EXFS and GPFS) this is implemented by setting a "managed region" on a file, which tells the filesystem that a call to read, write, or truncate must go to a user daemon to be approved or denied. For DUL files we trap write and truncate, for OFL we trap all three, and for REG and MOD no managed region needs to be set.
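
      The mapping from DMF state to trapped operations can be written down as a small table. This is only an illustrative sketch in Python; the names DmfState, TRAPPED_EVENTS, and needs_daemon_approval are made up for this example and are not part of DMF or DMAPI.

```python
from enum import Enum


class DmfState(Enum):
    REG = "regular"     # no archive copy exists
    DUL = "dual state"  # archive copy exists, file data identical
    MOD = "modified"    # archive copy exists, file data changed
    OFL = "offline"     # data on the filesystem has been released


# Operations that the managed region hands to the user daemon,
# following the description above. Hypothetical table, not DMF code.
TRAPPED_EVENTS = {
    DmfState.REG: frozenset(),
    DmfState.MOD: frozenset(),
    DmfState.DUL: frozenset({"write", "truncate"}),
    DmfState.OFL: frozenset({"read", "write", "truncate"}),
}


def needs_daemon_approval(state, event):
    """Return True if `event` on a file in `state` goes to the user daemon."""
    return event in TRAPPED_EVENTS[state]
```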

      This makes it easy to copy a file on a DMAPI filesystem and then set the DMF state. Lustre, however, is very different.

      Lustre HSM State

      Lustre keeps the HSM state of a file in the trusted.hsm extended attribute. The attribute contains 4 fields: a set of compatibility flags (all zero), a set of HSM state flags, the archive id, and the data version.
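
      To make the four fields concrete, the following sketch decodes a raw trusted.hsm value. It assumes the attribute is stored as two 32-bit fields followed by two 64-bit fields in little-endian order, and it assumes the flag values shown; both should be checked against the Lustre headers for the version in use. The names HsmAttr and parse_hsm_attr are invented for this example.

```python
import struct
from typing import NamedTuple

# Assumed layout: compat flags (u32), state flags (u32),
# archive id (u64), data version (u64), little-endian.
HSM_ATTR_FORMAT = "<IIQQ"
HSM_ATTR_SIZE = struct.calcsize(HSM_ATTR_FORMAT)  # 24 bytes

# Assumed HSM state flag values; verify against the Lustre headers.
HS_EXISTS = 0x01
HS_DIRTY = 0x02
HS_RELEASED = 0x04
HS_ARCHIVED = 0x08


class HsmAttr(NamedTuple):
    compat: int
    flags: int
    archive_id: int
    data_version: int


def parse_hsm_attr(raw: bytes) -> HsmAttr:
    """Decode the raw value of the trusted.hsm extended attribute."""
    if len(raw) < HSM_ATTR_SIZE:
        raise ValueError("trusted.hsm value is too short")
    return HsmAttr(*struct.unpack(HSM_ATTR_FORMAT, raw[:HSM_ATTR_SIZE]))
```

      On a Lustre client, a root-level process could feed this with the value returned by os.getxattr(path, "trusted.hsm").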

      The data version is a semi-random integer that changes whenever the file contents are modified. When archiving a file completes, the current data version is stored in the trusted.hsm attribute.

      In addition, there is a special file layout flag to mark the layout of a released file. When an attempt is made to read or write a file, this flag is checked, while the flags in the extended attribute are ignored.

      Impersonating a Copytool

      The canonical way to copy a file to Lustre and set HSM state would be the following.

      1. Register the file copy app with a custom archive id.
      2. Create the file with llapi_hsm_import().
      3. Force a restore of the file data.
      4. The Lustre HSM coordinator instructs the file copy app.
      5. Call llapi_hsm_action_begin() to start the copy.
      6. Copy the data from another filesystem.
      7. Call llapi_hsm_action_end() to complete the copy.

      This makes the program flow for Lustre very different from EXFS and GPFS.

      Directly Setting HSM Flags

      An alternative would be to do the file copy and then set the ARCHIVED and possibly RELEASED flags, in addition to the archive id used by DMF. This is where the problem arises: while Lustre does many sanity checks on attempts to set or clear these flags through the LL_IOC_HSM_STATE_SET interface, setting them does not correctly set the HSM filesystem state.

      Setting the ARCHIVED flag on a file does not update the data version in trusted.hsm. A request to release the file does compare the file's current data version against the version stored in trusted.hsm, and refuses the request if they differ. As a result, a file can be marked ARCHIVED, yet be impossible to release.

      Similarly, setting or clearing the RELEASED flag on a file does not change the file layout, and as such has no bearing on whether attempts to read or write the file will find it in released state.
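
      The two gaps described above can be illustrated with a toy model. This is not Lustre code: ToyFile and its methods are hypothetical, the flag values are assumptions, and the model only mirrors the behaviour stated in the text (setting flags changes nothing but the flags, release compares data versions, and I/O looks only at the layout).

```python
from dataclasses import dataclass

# Assumed HSM state flag values; verify against the Lustre headers.
HS_RELEASED = 0x04
HS_ARCHIVED = 0x08


@dataclass
class ToyFile:
    data_version: int = 1         # changes whenever file contents change
    hsm_flags: int = 0            # flags stored in trusted.hsm
    hsm_data_version: int = 0     # data version stored in trusted.hsm
    layout_released: bool = False  # released flag in the file layout

    def set_hsm_flags(self, flags: int) -> None:
        # As described above: only the flags are updated, neither the
        # stored data version nor the layout.
        self.hsm_flags |= flags

    def can_release(self) -> bool:
        # Release is refused when the stored version differs from the
        # current data version.
        return (bool(self.hsm_flags & HS_ARCHIVED)
                and self.hsm_data_version == self.data_version)

    def io_sees_released(self) -> bool:
        # Read and write consult the layout flag, not trusted.hsm.
        return self.layout_released
```

      In this model, a file that has only had ARCHIVED set through set_hsm_flags() reports can_release() as False, and setting RELEASED the same way leaves io_sees_released() as False.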

      RFE: Ability to Set the HSM Data Version of a File

      With the above background, what we would like for DMF is the ability to set data version in the trusted.hsm attribute. For this I see several options.

      1) Allow User Space Writes to trusted.hsm

      Currently, user space writes to trusted.hsm are stopped on the server in mdt_reint_setxattr() and, in some Lustre versions, also on the client in ll_setxattr(). The write is dropped and no error is returned. Removing these checks would be sufficient: a simple user space application could then read the attribute and write it back with the current data version in place. Since the attribute is in the trusted namespace, only someone with root-level capabilities would be able to do this. However, such a read-modify-write cycle is subject to races, and the trusted.hsm extended attribute could be corrupted.
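
      A minimal sketch of the read-modify-write cycle that option (1) would enable, assuming the checks were removed and assuming the attribute layout is two little-endian 32-bit fields followed by two 64-bit fields. The helper names are hypothetical, and the current data version is taken as a parameter because obtaining it (for example with llapi_get_data_version()) is outside the scope of this example.

```python
import os
import struct

HSM_XATTR = "trusted.hsm"
# Assumed layout: compat flags, state flags, archive id, data version.
HSM_ATTR_FORMAT = "<IIQQ"
HSM_ATTR_SIZE = struct.calcsize(HSM_ATTR_FORMAT)


def with_data_version(raw: bytes, data_version: int) -> bytes:
    """Return the attribute value with only the data version replaced."""
    compat, flags, archive_id, _old_version = struct.unpack(
        HSM_ATTR_FORMAT, raw[:HSM_ATTR_SIZE])
    updated = struct.pack(
        HSM_ATTR_FORMAT, compat, flags, archive_id, data_version)
    # Preserve any trailing bytes this sketch does not understand.
    return updated + raw[HSM_ATTR_SIZE:]


def sync_data_version(path: str, data_version: int) -> None:
    """Read-modify-write of trusted.hsm (needs root, Linux only).

    Nothing makes the read and the write atomic, so a concurrent HSM
    state change between the two calls would be overwritten.
    """
    raw = os.getxattr(path, HSM_XATTR)
    os.setxattr(path, HSM_XATTR,
                with_data_version(raw, data_version), os.XATTR_REPLACE)
```

      The gap between os.getxattr() and os.setxattr() is exactly the race window mentioned above.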

      2) Add an ioctl to Set the HSM Data Version

      Adding a new ioctl specifically to set the HSM data version to a provided value would work around concerns about the integrity of the trusted.hsm attribute. But the implementation might be complex, as it seems likely that this would require adding a new RPC to the over-the-wire protocol.

      3) Update HSM Data Version when Setting the ARCHIVED Flag

      A simplifying assumption is that DMF user space does not need to know the HSM data version; all it needs is a way to make it equal to the current value. Updating the HSM data version when the HSM ARCHIVED flag is set would work. However, this introduces a new side effect to an existing operation, which might be considered a disadvantage.

      The exact sanity checks that should apply need to be discussed:

      • Only sync if the HSM data version is 0?
      • Only sync if DIRTY is not also set?
      • Other?
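
      To help the discussion, the candidate checks listed above could be expressed as a small predicate. This is purely a hypothetical policy sketch: should_sync_version and its parameters are invented here, the flag values are assumptions, and which checks (if any) should apply is exactly the open question.

```python
# Assumed HSM state flag values; verify against the Lustre headers.
HS_DIRTY = 0x02
HS_ARCHIVED = 0x08


def should_sync_version(flags, stored_version,
                        require_zero_version=True,
                        require_not_dirty=True):
    """Decide whether setting `flags` should also sync the data version.

    `flags` is the resulting HSM flag set and `stored_version` is the
    data version currently recorded in trusted.hsm. The two keyword
    arguments toggle the candidate checks so they can be compared.
    """
    if not flags & HS_ARCHIVED:
        return False  # side effect only applies when ARCHIVED is set
    if require_zero_version and stored_version != 0:
        return False  # candidate check: only sync if version is 0
    if require_not_dirty and flags & HS_DIRTY:
        return False  # candidate check: only sync if DIRTY is not set
    return True
```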

      4) Update HSM Data Version with a New hsm_user_action

      Instead of using a side effect of setting the ARCHIVED flag, requesting a sync of the HSM data version could be its own hsm_user_action. This looks like it has the fewest drawbacks for kernel-side code, while making the user side slightly more complex.

      5) Other Options

      There may be other plausible approaches that I have not thought of.

      Conclusion

      The primary goal here is to obtain agreement on a preferred approach before we (HPE) invest resources in implementing something. Our preference at this point is (3), then (4). Neither (1) nor (2) seems a very attractive prospect.

          People

            nangelinas Nikitas Angelinas
            olaf Olaf Weber (Inactive)