[LU-10606] HSM info as part of LOV layout xattr Created: 06/Feb/18  Updated: 09/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor
Reporter: Nathan Rutman Assignee: Qian Yingjin
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Microsoft Word Combine_FLR+HSM+PCC-RW+PCC-RO_HLD.docx    
Issue Links:
Blocker
Duplicate
Related
is related to LU-10092 PCC: Lustre Persistent Client Cache Resolved
is related to LU-11376 Special file/dir to represent DAOS Co... Resolved
is related to LU-10499 Readonly Persistent Client Cache support Closed
is related to LU-6081 hsm: add file migrate support Open
is related to LU-13637 Combining RO-PCC/RW-PCC/HSM with FLR HLD Open
is related to LU-12359 Remote shared burst buffer PCC on a s... Open
is related to LU-16837 interop: client skip unknown componen... Resolved
is related to LU-16700 reserve flags and define data structu... Resolved
Rank (Obsolete): 9223372036854775807

 Description   

Motivation

As mentioned in LU-10092 and discussed in a concall, it seems that treating Lustre's HSM information as a first-class layout type could align several common code paths:

  • Conceptually, mirroring or migrating between Lustre pools is very similar to mirroring or migrating to an HSM
  • lfs migrate, FLR mirroring, and HSM data movement are all done through a userspace copytool. The same layout-aware copytool might be usable for all these cases.
  • LU-6081 provides an option to pipe lfs migrate requests through the MDS HSM coordinator queue. lfs mirror resync could be treated similarly.
  • There may be policies involved with mirroring, migrating, and archiving 
  • Some policies are internal to Lustre, some may be external (e.g. hsm restore is internal, lfs mirror resync delayed is external, lfs mirror resync immediate is internal), blurring the lines of where policy is managed.
  • It may be desirable to expand the idea of striping hints to also include 
  • There may be a desire to keep partial file components in the HSM, for limiting restore extents or for PFL layouts.
  • There may be other types of layouts beyond HSM where clients may not be able to access that layout's format/layout type directly (e.g. RAID6 parity), and would request an HSM-style restore to a more common layout type. 

From LU-10092:

This could potentially also integrate nicely with composite files and FLR if we enhanced the Lustre layout to include an "HSM layout" component (equivalent to LOV_MAGIC_V1).  The "LOV_MAGIC_HSM" component describes a file in an HSM archive, storing the HSM archive number, "UUID" of the file within the archive, and other parameters (e.g. archive timestamp) needed to identify the file.  The archive timestamp could be useful for storing multiple replicas of the file in HSM and using it for file versioning, along with the FLR mirror_io equivalent to open up a specific component to access an older version of the file.

 

Implementation

Every layout should get a set of common parameters:

  • stored extent range, offset
  • layout generation
  • timestamp
  • read priority (8b)
  • write priority (8b)
  • policy type (16b, see below)
  • flags:
    • writable (turn off to make immutable)
    • readable (maybe never want to read very slow devices)
    • data missing (dead OST, or missing HSM file; unreadable)
    • delay_sync (delayed resync only, not immediate)
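
As a rough illustration of the common parameters listed above, the sketch below packs them into a small fixed-size header. Only the 8b/8b/16b widths come from this ticket; the field order, byte order, widths of the other fields, flag bit values, and the names COMMON_FMT, LayoutFlags, and CommonLayoutParams are assumptions made for the example, and the extent is modelled as a simple start/end pair.

```python
import struct
from dataclasses import dataclass
from enum import IntFlag

# Assumed wire format (little-endian, no padding): u64 extent start,
# u64 extent end, u32 generation, u64 timestamp, u8 read priority,
# u8 write priority, u16 policy type, u32 flags.
COMMON_FMT = struct.Struct("<QQIQBBHI")


class LayoutFlags(IntFlag):
    # Bit values are hypothetical; the ticket only names the flags.
    WRITABLE = 0x1      # clear to make the layout immutable
    READABLE = 0x2      # clear for devices too slow to read from
    DATA_MISSING = 0x4  # dead OST or missing HSM file
    DELAY_SYNC = 0x8    # delayed resync only, not immediate


@dataclass
class CommonLayoutParams:
    extent_start: int
    extent_end: int
    generation: int
    timestamp: int
    read_priority: int   # 8 bits
    write_priority: int  # 8 bits
    policy_type: int     # 16 bits
    flags: LayoutFlags

    def pack(self) -> bytes:
        """Serialize all fields using the assumed COMMON_FMT layout."""
        return COMMON_FMT.pack(
            self.extent_start, self.extent_end, self.generation,
            self.timestamp, self.read_priority, self.write_priority,
            self.policy_type, int(self.flags))

    @classmethod
    def unpack(cls, buf: bytes) -> "CommonLayoutParams":
        """Rebuild the parameters from a buffer produced by pack()."""
        *fields, flags = COMMON_FMT.unpack(buf)
        return cls(*fields, LayoutFlags(flags))
```

Keeping the priorities and policy type at the stated widths means the three together occupy four bytes, so they could sit in a single 32-bit slot of whatever on-disk structure is eventually chosen.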

The HSM layout would roughly mirror the contents of today's HSM EA:

  • archive number (32b)
  • archive type (since the archive number might be a client ID for PCC, we might want another classifier for different types of archives)
  • archive file key?

Adding an archive file key might be helpful where an HSM backend can't easily reference files by the Lustre FID. Problematically, this might be large - 1024 char string?

Layout-as-policy

In general with FLR we are starting to have "implied policies" in the layout: the presence of an FLR layout implies that the file will be copied to the mirror. It also specifies a timeframe (delayed or immediate) and the number of mirrors requested. It might be good to embrace this a little bit and think about adding some more explicit policy details to the layouts:

  • Schedule delayed resync on close-after-write
  • Evacuate "primary" mirror after completing resync (for e.g. SSD to HDD tiering)
  • Redundancy goal
  • Restore target striping hint (lov_user_md?)

Since it is difficult to predict all the use cases here, it may make sense to leave such a policy in a YAML or JSON extensible format.
I understand that this opens a big can of worms; I think for starters we can just add a small integer "policy number" and leave further definition for the future.


 Comments   
Comment by Nathan Rutman [ 08/May/18 ]

The HSM layout should address current shortcomings as well, so:

  • archive number (32b)
  • archive type (32b)
  • archive flags (32b)
  • archive file key len (16b)
  • archive file key (max 1024B) (size of S3 keys)
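
To make the proposed HSM layout fields concrete, here is a minimal sketch that serializes them as a fixed header followed by a variable-length key. The field widths and the 1024-byte key limit are taken from the list above; the byte order, field order on the wire, and the names HsmLayout, HSM_HDR, and MAX_KEY_LEN are assumptions for illustration only and do not reflect any existing Lustre structure.

```python
import struct
from dataclasses import dataclass

MAX_KEY_LEN = 1024  # upper bound suggested above (size of S3 keys)

# Assumed header: u32 archive number, u32 archive type,
# u32 archive flags, u16 key length (little-endian, no padding).
HSM_HDR = struct.Struct("<IIIH")


@dataclass
class HsmLayout:
    archive_number: int
    archive_type: int
    archive_flags: int
    archive_key: bytes = b""

    def pack(self) -> bytes:
        """Serialize the header followed by the raw key bytes."""
        if len(self.archive_key) > MAX_KEY_LEN:
            raise ValueError("archive file key exceeds 1024 bytes")
        header = HSM_HDR.pack(self.archive_number, self.archive_type,
                              self.archive_flags, len(self.archive_key))
        return header + self.archive_key

    @classmethod
    def unpack(cls, buf: bytes) -> "HsmLayout":
        """Parse a buffer produced by pack(), validating the key length."""
        number, atype, flags, key_len = HSM_HDR.unpack_from(buf)
        if key_len > MAX_KEY_LEN or len(buf) < HSM_HDR.size + key_len:
            raise ValueError("invalid archive file key length")
        key = bytes(buf[HSM_HDR.size:HSM_HDR.size + key_len])
        return cls(number, atype, flags, key)
```

Carrying an explicit key length keeps the common case small: a backend that can reference files by FID would store a zero-length key and pay only for the 14-byte header.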

Common layout flags (for all layout types):

  • unavailable (dead OST, or missing HSM file; unreadable)
  • immutable (this mirror is write once)
  • purge_delay (data can be removed after replicating elsewhere)
  • purge_immed (remove data from this layout immediately after replicating)

write_priority would be used to determine an implied preferred layout. E.g. if mirror A and mirror B are both at wr_prio 1, then clients write to them both simultaneously. Mirror C at prio 2 is written only if A or B is unavailable. (The number of simultaneous mirrors to write should be determined by the count of prio 1 items.)
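
The write_priority rule above can be sketched as a small selection function. This is only one reading of the rule: it assumes a lower number means higher priority, that the target count is the number of mirrors at the best priority value present, and that ties keep their original order. The Mirror class and select_write_targets are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    name: str
    write_priority: int       # lower value = preferred (assumed)
    unavailable: bool = False  # the "unavailable" common layout flag


def select_write_targets(mirrors):
    """Pick the mirrors a client should write to simultaneously."""
    if not mirrors:
        return []
    # The number of simultaneous writes is set by how many mirrors
    # share the best priority, whether or not they are currently up.
    best = min(m.write_priority for m in mirrors)
    want = sum(1 for m in mirrors if m.write_priority == best)
    # Fill those slots from available mirrors in priority order;
    # sorted() is stable, so equal priorities keep their list order.
    usable = sorted((m for m in mirrors if not m.unavailable),
                    key=lambda m: m.write_priority)
    return usable[:want]
```

With A and B at priority 1 and C at priority 2, the function returns A and B; if A is marked unavailable it returns B and C, so the number of simultaneous writes stays at two.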

Comment by Nathan Rutman [ 08/May/18 ]

@john.hammond any input/opinion on this? I think I'm going to start pushing it at Cray.

Comment by Nathan Rutman [ 14/May/19 ]

LU-11376 adds "foreign" layout type. "HSM" can just be a different type of foreign layout.

Comment by Gerrit Updater [ 15/Jul/20 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/39387
Subject: LU-10606 hsm: store HSM xattr as a basic layout
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bc21703ce6a3fe3eb541bf3800cc2efaeed0af76

Comment by Gerrit Updater [ 07/Aug/20 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/39599
Subject: LU-10606 hsm: convert old HSM xattr into HSM layout
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0bf3098f389f03620e915a9e765378d83ac74b36

Comment by Nathan Rutman [ 10/May/22 ]

Seems like some work was started here and then abandoned? After the LUG22 HSM presentation today, I'd like to nudge this again with maybe a clearer list of benefits:

  • partial file restore. HSM layout as one element of a composite layout, maybe even using something like SEPFL.
  • keep file head on disk, archive long tail. Some file types have useful info embedded at the front, while the rest isn't regularly used (e.g. file icons, or HDF5 info).
  • multiple archives per file. Current code only lists a single archive, with no redundancy.
  • mirror to hsm. FLR layout with HSM as second mirror, allows for tracking archive info even when file is restored in Lustre. Allows for immediate punch of primary mirror on low space (if hsm mirror is synced).
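
To illustrate the partial-restore and head-on-disk cases, the sketch below models a non-mirrored composite layout as a list of components with byte extents and works out which archived ranges a read would need restored. The Component class, the "disk"/"hsm" kind strings, the EOF constant, and hsm_extents_to_restore are all hypothetical and are not an existing Lustre API.

```python
from dataclasses import dataclass

EOF = 2**64 - 1  # assumed stand-in for an open-ended extent


@dataclass
class Component:
    kind: str   # "disk" or "hsm" (illustrative labels)
    start: int  # first byte covered
    end: int    # one past the last byte covered


def hsm_extents_to_restore(components, offset, length):
    """Return the (start, end) ranges of a read that live only in HSM."""
    end = offset + length
    ranges = []
    for comp in components:
        # Keep the part of each HSM component that overlaps the read.
        if comp.kind == "hsm" and comp.start < end and offset < comp.end:
            ranges.append((max(comp.start, offset), min(comp.end, end)))
    return ranges
```

For a file with its first 1 MiB on disk and the remainder in an archive, a read confined to the head returns an empty list and needs no restore, while a read that crosses the 1 MiB boundary returns only the archived portion.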