[LU-11379] HSM Copytool Performance Improvements Created: 14/Sep/18  Updated: 14/Jul/21  Resolved: 14/Jul/21

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: John Hammond
Assignee: WC Triage
Resolution: Fixed
Votes: 0
Labels: HSM

Issue Links:
Related
is related to LU-11380 IOC_MDC_GETFILEINFO returns garbage s... Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
  1. Flatten archive hierarchy to 1 directory. /arc1/0001/0000/0401/0000/0002/0000/0x200000401:0x1:0x0 becomes /arc1/0001/0x200000401:0x1:0x0.
  2. Remove/deprecate shadow tree handling.
  3. Stop storing and loading _lov files. For restore, we are already getting file attributes+striping from the MDT (see ct_md_getattr()). But we only use the stat portion of the lmd.
  4. Improve thread handling.
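
The path change in item 1 can be sketched as below. This is a minimal illustration, not the copytool's code. The function names `parse_fid`, `legacy_archive_path` and `flat_archive_path` are hypothetical. The legacy layout (16-bit slices of the FID's OID and sequence) is inferred from the example path, and using the low 16 bits of the OID as the one remaining directory is an assumption consistent with that example.

```python
def parse_fid(fid: str):
    """Split a 'seq:oid:ver' FID string (hex fields) into three integers."""
    seq, oid, ver = (int(part, 16) for part in fid.split(":"))
    return seq, oid, ver


def legacy_archive_path(root: str, fid: str) -> str:
    """Six-level layout inferred from the example in item 1."""
    seq, oid, _ = parse_fid(fid)
    parts = [
        oid & 0xFFFF,
        (oid >> 16) & 0xFFFF,
        seq & 0xFFFF,
        (seq >> 16) & 0xFFFF,
        (seq >> 32) & 0xFFFF,
        (seq >> 48) & 0xFFFF,
    ]
    dirs = "/".join(f"{p:04x}" for p in parts)
    return f"{root}/{dirs}/{fid}"


def flat_archive_path(root: str, fid: str) -> str:
    """One-level layout; the choice of the low OID bits is an assumption."""
    _, oid, _ = parse_fid(fid)
    bucket = oid & 0xFFFF
    return f"{root}/{bucket:04x}/{fid}"


if __name__ == "__main__":
    fid = "0x200000401:0x1:0x0"
    print(legacy_archive_path("/arc1", fid))
    print(flat_archive_path("/arc1", fid))
```

For the FID in item 1, the two functions reproduce the "before" and "after" paths given there.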


 Comments   
Comment by Andreas Dilger [ 14/Sep/18 ]

1. Flatten archive hierarchy to 1 directory. /arc1/0001/0000/0401/0000/0002/0000/0x200000401:0x1:0x0 becomes /arc1/0001/0x200000401:0x1:0x0.

Presumably the "0001" is based on OID % 0xffff? Is there a desire to have some temporal locality with objects in the archive, rather than all 65k directories being used continually?

One drawback of using the same directories forever is that they get relatively large and fragmented in the hash space, and are updated totally randomly on disk. For MDS->OST object allocation, I've thought about using something like SEQ>>8/OID>>16 or just stick with SEQ/d(OID % 32) but limit OIDs-per-SEQ to 1M or so, which will put concurrently allocated objects relatively close together, but slowly move into new upper-level directories over time. The premise is that concurrently allocated objects are more likely to also be accessed and deleted together, so we can slowly drop those older directories from RAM, and eventually shrink them down as they become empty. Having a single huge directory with all ages of files means the whole directory needs to live in RAM, or be IOPS bound during modification since each insert/delete/lookup will modify a different leaf block.
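
As a rough sketch of the two directory schemes mentioned above, the following shows one possible reading of "SEQ>>8/OID>>16" and "SEQ/d(OID % 32)". The function names are hypothetical and the hexadecimal formatting is an assumption; the comment gives only the shorthand.

```python
def dir_seq_oid_shifted(seq: int, oid: int) -> str:
    """One reading of SEQ>>8/OID>>16: 256 sequences share an upper
    directory, and each run of 65536 OIDs shares a lower directory."""
    return f"{(seq >> 8):x}/{(oid >> 16):x}"


def dir_seq_mod32(seq: int, oid: int) -> str:
    """One reading of SEQ/d(OID % 32): one upper directory per sequence,
    with objects spread over 32 subdirectories."""
    return f"{seq:x}/d{oid % 32}"


if __name__ == "__main__":
    seq = 0x200000401
    for oid in (0x1, 0xFFFF, 0x10000):
        print(hex(oid), dir_seq_oid_shifted(seq, oid), dir_seq_mod32(seq, oid))
```

Under the first reading, OIDs allocated close together in the same sequence stay in the same directory, and allocation only moves to a new directory as the OID or sequence grows. That is the temporal locality the comment describes.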

3. Stop storing and loading _lov files. For restore, we are already getting file attributes+striping from the MDT (see ct_md_getattr()). But we only use the stat portion of the lmd.

One reason for storing the LOV EA in the archive is for disaster recovery/rehydrate. If we instead make full backups of the MDT(s) and/or subtrees, then we don't need the layouts or other xattrs in the archive. Conversely, storing xattrs (more than just lov) with the files makes them more "self contained" and usable even if the MDT backup is unavailable.

Comment by Andreas Dilger [ 14/Sep/18 ]

PS - you are unlikely to get even OID distribution across [0x0000-0xffff]. It is much more likely to get low-numbered OIDs, since clients always start with a new SEQ after mount and will allocate an OID=1 file, but won't necessarily allocate OID=65535 or OID=131071 ever before restart. That will put more pressure on the low-numbered directories and may cause problems in some archives.
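
The skew can be illustrated with a small count, assuming each sequence allocates OIDs starting at 1 and that files are bucketed by the low 16 bits of the OID. The function name `bucket_counts` and the per-sequence file counts are made up for illustration.

```python
from collections import Counter


def bucket_counts(files_per_seq):
    """Count files per directory bucket (OID & 0xFFFF), assuming every
    sequence allocates OIDs 1..n for its n files."""
    counts = Counter()
    for nfiles in files_per_seq:
        for oid in range(1, nfiles + 1):
            counts[oid & 0xFFFF] += 1
    return counts


if __name__ == "__main__":
    # Three hypothetical sequences that created 10, 100 and 1000 files.
    counts = bucket_counts([10, 100, 1000])
    for bucket in (0x0001, 0x0032, 0x01F4, 0xFFFF):
        print(f"{bucket:04x}: {counts[bucket]}")
```

Every sequence contributes to bucket 0001, while a high-numbered bucket is only populated by sequences that lived long enough to reach that OID.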

Comment by John Hammond [ 17/Sep/18 ]

> One reason for storing the LOV EA in the archive is for disaster recovery/rehydrate. If we instead make full backups of the MDT(s) and/or subtrees, then we don't need the layouts or other xattrs in the archive. Conversely, storing xattrs (more than just lov) with the files makes them more "self contained" and usable even if the MDT backup is unavailable.

HSM (alone) is not a disaster recovery solution. It's a storage tiering mechanism. If LOV is important then either keep full MDT images or store copies of LOV in the policy tool DB.

Comment by John Hammond [ 14/Jul/21 ]

A partial patch for this was pushed for review under LU-11380. See https://review.whamcloud.com/#/c/33215/.

LU-11380 hsm: streamline copytool restore handling

Partial patch.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I7435766f6f67ba60ac39bf02bc3232316a830387

Comment by John Hammond [ 14/Jul/21 ]

Archive flattening is in progress under LU-14359.

https://review.whamcloud.com/41312 LU-14359 hsm: support a flatter HSM archive format
https://review.whamcloud.com/41366 LU-14359 hsm: support shadow tree in archive upgrade

Comment by John Hammond [ 14/Jul/21 ]

Partially fixed. Partially abandoned.

Generated at Sat Feb 10 02:43:21 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.