Lustre / LU-11379

HSM Copytool Performance Improvements

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor

    Description

      1. Flatten the archive hierarchy to a single directory level: /arc1/0001/0000/0401/0000/0002/0000/0x200000401:0x1:0x0 becomes /arc1/0001/0x200000401:0x1:0x0.
      2. Remove/deprecate shadow tree handling.
      3. Stop storing and loading _lov files. For restore we already get file attributes and striping from the MDT (see ct_md_getattr()), but we only use the stat portion of the lmd.
      4. Improve thread handling.
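The path mapping in item 1 can be sketched as follows. This is a minimal model of the six-level ct_path_archive() layout in lhsmtool_posix.c, in which each directory component is a 16-bit slice of the FID's OID and SEQ; path_archive_deep() and path_archive_flat() are illustrative names for this sketch, not the actual copytool functions:

```c
#include <stdio.h>
#include <stdint.h>

/* Simplified stand-in for struct lu_fid: sequence, object ID, version. */
struct fid {
        uint64_t seq;
        uint32_t oid;
        uint32_t ver;
};

/* Old six-level layout, modeled on ct_path_archive() in lhsmtool_posix.c:
 * each path component is a 16-bit slice of the OID and SEQ, so every FID
 * gets its own deep directory chain. */
static int path_archive_deep(char *buf, size_t sz, const char *root,
                             const struct fid *f)
{
        return snprintf(buf, sz,
                        "%s/%04x/%04x/%04x/%04x/%04x/%04x/0x%jx:0x%x:0x%x",
                        root,
                        (unsigned)(f->oid & 0xffff),
                        (unsigned)((f->oid >> 16) & 0xffff),
                        (unsigned)(f->seq & 0xffff),
                        (unsigned)((f->seq >> 16) & 0xffff),
                        (unsigned)((f->seq >> 32) & 0xffff),
                        (unsigned)((f->seq >> 48) & 0xffff),
                        (uintmax_t)f->seq, f->oid, f->ver);
}

/* Proposed flat layout: keep only the first (OID & 0xffff) shard. */
static int path_archive_flat(char *buf, size_t sz, const char *root,
                             const struct fid *f)
{
        return snprintf(buf, sz, "%s/%04x/0x%jx:0x%x:0x%x",
                        root, (unsigned)(f->oid & 0xffff),
                        (uintmax_t)f->seq, f->oid, f->ver);
}
```

For the FID 0x200000401:0x1:0x0 from the description, path_archive_deep() yields /arc1/0001/0000/0401/0000/0002/0000/0x200000401:0x1:0x0 and path_archive_flat() yields /arc1/0001/0x200000401:0x1:0x0.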


          Activity


            jhammond John Hammond added a comment -

            Partially fixed. Partially abandoned.

            jhammond John Hammond added a comment -

            Archive flattening is in progress under LU-14359.

            https://review.whamcloud.com/41312 LU-14359 hsm: support a flatter HSM archive format
            https://review.whamcloud.com/41366 LU-14359 hsm: support shadow tree in archive upgrade

            jhammond John Hammond added a comment -

            A partial patch for this was pushed for review under LU-11380. See https://review.whamcloud.com/#/c/33215/.

            LU-11380 hsm: streamline copytool restore handling

            Partial patch.

            Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
            Change-Id: I7435766f6f67ba60ac39bf02bc3232316a830387
            jhammond John Hammond added a comment -

            > One reason for storing the LOV EA in the archive is for disaster recovery/rehydrate. If we instead make full backups of the MDT(s) and/or subtrees, then we don't need the layouts or other xattrs in the archive. Conversely, storing xattrs (more than just lov) with the files makes them more "self contained" and usable even if the MDT backup is unavailable.

            HSM (alone) is not a disaster recovery solution. It is a storage tiering mechanism. If the LOV is important, then either keep full MDT images or store copies of the LOV in the policy tool DB.

            adilger Andreas Dilger added a comment -

            PS - you are unlikely to get an even OID distribution across [0x0000-0xffff]. It is much more likely to get low-numbered OIDs, since clients always start with a new SEQ after mount and will allocate an OID=1 file, but won't necessarily allocate OID=65535 or OID=131071 before restart. That will put more pressure on the low-numbered directories and may cause problems in some archives.
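The skew described in this comment can be made concrete with a toy count (illustrative only, not Lustre code): if each remount starts a fresh SEQ and restarts OID allocation at 1, the low (OID & 0xffff) shards accumulate entries from every mount session, while high shards are touched only by long-lived sessions.

```c
#include <stdint.h>

#define NSHARDS 65536

/* Toy model: each client "mount session" gets a fresh SEQ and allocates
 * OIDs 1..n before restarting.  Count how many archived files land in
 * each (OID & 0xffff) shard of a flat archive. */
static void count_shards(const unsigned *alloc_per_session, int nsessions,
                         unsigned counts[NSHARDS])
{
        for (int s = 0; s < nsessions; s++)
                for (uint32_t oid = 1; oid <= alloc_per_session[s]; oid++)
                        counts[oid & 0xffff]++;
}
```

With sessions allocating 100, 5000, and 60000 objects, shard 1 is hit by all three sessions, shard 3000 by two, and shard 50000 by only one: exactly the pressure imbalance the comment predicts.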

            adilger Andreas Dilger added a comment -

            > 1. Flatten archive hierarchy to 1 directory. /arc1/0001/0000/0401/0000/0002/0000/0x200000401:0x1:0x0 becomes /arc1/0001/0x200000401:0x1:0x0.

            Presumably the "0001" is based on OID % 0xffff? Is there a desire to have some temporal locality with objects in the archive, rather than all 65k directories being used continually?

            One drawback of using the same directories forever is that they get relatively large and fragmented in the hash space, and are updated totally randomly on disk. For MDS->OST object allocation, I've thought about using something like SEQ>>8/OID>>16, or just sticking with SEQ/d(OID % 32) but limiting OIDs-per-SEQ to 1M or so, which would put concurrently allocated objects relatively close together while slowly moving into new upper-level directories over time. The premise is that concurrently allocated objects are more likely to also be accessed and deleted together, so we can slowly drop those older directories from RAM, and eventually shrink them down as they become empty. Having a single huge directory with all ages of files means the whole directory needs to live in RAM, or be IOPS-bound during modification, since each insert/delete/lookup will modify a different leaf block.

            > 3. Stop storing and loading _lov files. For restore, we are already getting file attributes+striping from the MDT (see ct_md_getattr()). But we only use the stat portion of the lmd.

            One reason for storing the LOV EA in the archive is for disaster recovery/rehydrate. If we instead make full backups of the MDT(s) and/or subtrees, then we don't need the layouts or other xattrs in the archive. Conversely, storing xattrs (more than just lov) with the files makes them more "self contained" and usable even if the MDT backup is unavailable.
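The temporal-bucketing idea in this comment (SEQ>>8/OID>>16) could be sketched as below. This is a hypothetical layout, not an implemented scheme, and path_archive_temporal() is an invented name: objects allocated around the same time have nearby SEQs and OIDs, so they share a bucket directory, and new buckets open as allocation moves on, letting old directories go cold.

```c
#include <stdio.h>
#include <stdint.h>

struct fid {
        uint64_t seq;
        uint32_t oid;
        uint32_t ver;
};

/* Hypothetical temporal bucketing: group by (SEQ >> 8) then (OID >> 16),
 * so concurrently allocated objects land in the same directory while the
 * active bucket slowly rolls forward over time. */
static int path_archive_temporal(char *buf, size_t sz, const char *root,
                                 const struct fid *f)
{
        return snprintf(buf, sz, "%s/%jx/%x/0x%jx:0x%x:0x%x",
                        root,
                        (uintmax_t)(f->seq >> 8), f->oid >> 16,
                        (uintmax_t)f->seq, f->oid, f->ver);
}
```

Compared with the single flat shard, each insert/delete touches only the small, recently active bucket, addressing the "whole directory must live in RAM" concern raised above.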

            People

              wc-triage WC Triage
              jhammond John Hammond
              Votes: 0
              Watchers: 3
