  1. Lustre
  2. LU-10606

HSM info as part of LOV layout xattr

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Motivation

      As mentioned in LU-10092 and discussed in a concall, it seems that treating Lustre's HSM information as a first-class layout type can bring an alignment of common code paths:

      • Conceptually, mirroring or migrating between Lustre pools is very similar to mirroring or migrating to an HSM
      • lfs migrate, FLR mirroring, and HSM data movement are all done through a userspace copytool. The same layout-aware copytool might be usable for all of these cases.
      • LU-6081 provides an option to pipe lfs migrate requests through the MDS HSM coordinator queue. lfs mirror resync could be treated similarly.
      • There may be policies involved with mirroring, migrating, and archiving.
      • Some policies are internal to Lustre and some may be external (e.g. hsm restore is internal, lfs mirror resync delayed is external, lfs mirror resync immediate is internal), blurring the lines of where policy is managed.
      • It may be desirable to expand the idea of striping hints to also include 
      • There may be a desire to keep partial file components in the HSM, for limiting restore extents or for PFL layouts.
      • There may be other types of layouts beyond HSM where clients may not be able to access that layout's format/layout type directly (e.g. RAID6 parity), and would request an HSM-style restore to a more common layout type. 

      From LU-10092:

      This could potentially also integrate nicely with composite files and FLR if we enhanced the Lustre layout to include an "HSM layout" component (equivalent to LOV_MAGIC_V1).  The "LOV_MAGIC_HSM" component describes a file in an HSM archive, storing the HSM archive number, "UUID" of the file within the archive, and other parameters (e.g. archive timestamp) needed to identify the file.  The archive timestamp could be useful for storing multiple replicas of the file in HSM and using it for file versioning, along with the FLR mirror_io equivalent to open up a specific component to access an older version of the file.

       

      Implementation

      Every layout should get a set of common parameters:

      • stored extent range, offset
      • layout generation
      • timestamp
      • read priority (8b)
      • write priority (8b)
      • policy type (16b, see below)
      • flags:
        • writable (turn off to make immutable)
        • readable (maybe never want to read very slow devices)
        • data missing (dead OST, or missing HSM file; unreadable)
        • delay_sync (delayed resync only, not immediate)

      The HSM layout would roughly mirror the contents of today's HSM EA:

      • archive number (32b)
      • archive type (if the archive number might be a client ID for PCC, we might want another classifier for different types of archives)
      • archive file key?

      Adding an archive file key might be helpful where an HSM backend can't easily reference files by the Lustre FID. Problematically, this might be large - 1024 char string?
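
      One way to limit the cost of a large file key is to make it variable-length, so that files addressed by FID alone pay nothing for it. The C sketch below assumes that approach. The struct, its field names, the 16-bit widths for archive type and key length, and the helper functions are all hypothetical, and the 1024-byte cap is simply the figure floated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HSM_KEY_MAX 1024 /* upper bound taken from the text; an assumption */

/* Hypothetical HSM layout body; not an existing Lustre structure. */
struct hsm_layout_sketch {
	uint32_t hl_archive_num;  /* archive number, 32 bits */
	uint16_t hl_archive_type; /* classifier for the kind of archive */
	uint16_t hl_key_len;      /* bytes used in hl_key; 0 if the FID suffices */
	char     hl_key[];        /* backend-specific key, not NUL-terminated */
};

/* Bytes needed for a layout with a key of key_len bytes; 0 if too long. */
static inline size_t hsm_layout_size(size_t key_len)
{
	if (key_len > HSM_KEY_MAX)
		return 0;
	return sizeof(struct hsm_layout_sketch) + key_len;
}

/*
 * Fill buf (buf_size bytes, suitably aligned for the struct) with an
 * HSM layout. Returns the number of bytes used, or 0 on failure.
 */
static inline size_t hsm_layout_pack(void *buf, size_t buf_size,
				     uint32_t archive_num,
				     uint16_t archive_type,
				     const char *key, size_t key_len)
{
	struct hsm_layout_sketch *hl = buf;
	size_t need = hsm_layout_size(key_len);

	assert(buf != NULL);
	if (need == 0 || need > buf_size)
		return 0;

	hl->hl_archive_num = archive_num;
	hl->hl_archive_type = archive_type;
	hl->hl_key_len = (uint16_t)key_len;
	if (key_len > 0)
		memcpy(hl->hl_key, key, key_len);
	return need;
}
```

      With this shape a keyless entry takes 8 bytes, and the worst case is 8 + 1024 bytes per HSM component.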

      Layout-as-policy

      In general with FLR we are starting to have "implied policies" in the layout: the presence of an FLR layout implies that the file will be copied to the mirror. The layout also specifies a timeframe (delayed or immediate) and the number of mirrors requested. It might be good to embrace this a little and think about adding some more explicit policy details to the layouts:

      • Schedule delayed resync on close-after-write
      • Evacuate "primary" mirror after completing resync (for e.g. SSD to HDD tiering)
      • Redundancy goal
      • Restore target striping hint (lov_user_md?)
        Since it is difficult to predict all the use cases here, it may make sense to leave such a policy in a YAML or JSON extensible format.
        I understand that this opens a big can of worms; I think for starters we can just add a small integer "policy number" and leave further definition for the future.
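
      As a rough illustration of the "small integer policy number" starting point, the C sketch below assigns placeholder numbers to two of the policies listed above. The enum, its values, and the helper names are hypothetical and are not a proposed numbering; the only constraint taken from this ticket is that the number fits the 16-bit policy type field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy numbers; values are placeholders only. */
enum layout_policy {
	LPOL_NONE             = 0, /* no explicit policy */
	LPOL_RESYNC_ON_CLOSE  = 1, /* schedule delayed resync on close-after-write */
	LPOL_EVACUATE_PRIMARY = 2, /* evacuate "primary" mirror after resync */
	LPOL_MAX                   /* first unassigned number */
};

static_assert(LPOL_MAX <= UINT16_MAX, "policy numbers must fit in 16 bits");

/* True if this sketch defines a meaning for the given policy number. */
static inline bool layout_policy_is_known(uint16_t policy)
{
	return policy < LPOL_MAX;
}

/* Human-readable name; numbers defined later map to "unknown" here. */
static inline const char *layout_policy_name(uint16_t policy)
{
	switch (policy) {
	case LPOL_NONE:
		return "none";
	case LPOL_RESYNC_ON_CLOSE:
		return "resync_on_close";
	case LPOL_EVACUATE_PRIMARY:
		return "evacuate_primary";
	default:
		return "unknown";
	}
}
```

      Treating unrecognized numbers as "unknown" rather than as an error is a choice of this sketch, made so that policies defined later would not break older readers of the layout.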

              People

              • Assignee:
                wc-triage WC Triage
                Reporter:
                nrutman Nathan Rutman