Lustre / LU-7207

HSM: Add Archive UUID to delete changelog records

Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      HSM tools currently store an external identifier (such as a UUID) in an extended attribute (EA) when a file is archived. The identifier is used to identify the file in the backend archive, and there may be more than one identifier if the file has been archived to multiple backends. Currently, different tools do this independently and do not coordinate their EA names or formats.

      When a file is deleted, the EA is no longer available, so it would be helpful to include the identifier(s) in the delete changelog record. I suggest we define a standard name and format for the HSM archive EA, and that this data be included as-is in the delete changelog record.

      One possible format would be to use JSON to encode a list of endpoint and archive ID pairs. Here is a strawman example to begin the discussion:

      {
        "replicas": [
          {
            "endpoint": "s3://my-bucket/archive",
            "id": "UUID"
          },
          {
            "endpoint": "wos://address",
            "id": "OID"
          }
        ]
      }
      

      Alternatively, to save space the endpoint could just be an index that refers to a specific endpoint in the local configuration.
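
As a rough illustration of the index-based alternative, the sketch below keeps endpoint URLs in a local table and stores only a small index with each archive identifier. The table contents, the struct, and every `sketch_` name are hypothetical and exist only for this example; nothing here is an existing Lustre or copytool interface.

```c
#include <stddef.h>

/* Hypothetical local configuration: endpoint index -> endpoint URL. */
static const char *const sketch_endpoints[] = {
    "s3://my-bucket/archive",
    "wos://address",
};

/* Compact replica record: an index into the table above plus the
 * archive-specific identifier (UUID, OID, ...) as an opaque string. */
struct sketch_replica {
    unsigned int endpoint_idx;
    const char *id;
};

/* Resolve a replica's endpoint URL; returns NULL for an unknown index. */
static const char *sketch_endpoint_for(const struct sketch_replica *r)
{
    size_t n = sizeof(sketch_endpoints) / sizeof(sketch_endpoints[0]);

    if (r == NULL || r->endpoint_idx >= n)
        return NULL;
    return sketch_endpoints[r->endpoint_idx];
}
```

The trade-off is that a record stored this way is only meaningful together with the configuration that defines the index, so a changelog consumer would need access to the same table.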

          Activity


            My initial thought wouldn't be to store multiple UUIDs per component, but rather to store each archive copy in a separate component, possibly expanding the lov_hsm_attrs_v2 to store an "archive date" so that this could be used for storing multiple versions of the file (in-filesystem versions would store the timestamps on the OST objects as they do now). That makes archive copies and in-filesystem copies more alike.

            The main difference, besides performance, would be that we can't randomly update the archive data copy, though we could do clever things like create new components for parts of the file being written, so long as they are block aligned.

            adilger Andreas Dilger added the comment above.
            rread Robert Read added a comment:

            Will it be possible to support multiple hsm sub_layouts per component?

            UUID has specific meaning and not all the identifiers will be UUIDs, so the field should be a bit more generic, such as hsm_identifier or hsm_data. (I know we've excessively abused "UUID" in Lustre since forever but no reason to continue doing that.)

            YAML output is great, but I'd expect copytools would be using the API to retrieve the layout data and update the identifiers.

            Note, we'll still need to use user xattrs to store UUIDs until this work is completed, so the original idea here would still be a useful interim solution.


            For composite layout access by userspace, "lfs getstripe" will be updated as part of the PFL project to print composite layouts in YAML, so the output can be consumed directly by user tools if desired, something like below (still open to suggestions on this):

            $ lfs getstripe -v /mnt/lustre/file
            "/mnt/lustre/file":
              fid: "[0x200000400:0x2c3:0x0]"
              composite_header:
                composite_magic: 0x0BDC0BD0
                composite_size:  536
                composite_gen:   6
                composite_flags: 0
                component_count: 3
              components:
                - component_id:     2
                  component_flags:  stale, version
                  component_start:  0
                  component_end:    18446744073709551615
                  component_offset: 152
                  component_size:   48
                  sub_layout:
                    hsm_magic:      0x45320BD0
                    hsm_flags:      [ exists, archived ]
                    hsm_arch_id:    1
                    hsm_arch_ver:   0xabcd1234
                    hsm_uuid_len:   16
                    hsm_uuid:       e60649ac-b4e3-453f-88c7-611e78c38d5a
                - component_id:     3
                  component_flags:  0
                  component_start:  20971520
                  component_end:    216777216
                  component_offset: 208
                  component_size:   144
                  sub_layout:
                    lmm_magic:        0x0BD30BD0
                    lmm_pattern:      1
                    lmm_stripe_size:  1048576
                    lmm_stripe_count: 4
                    lmm_stripe_index: 0
                    lmm_layout_gen:   0
                    lmm_pool:         flash
                    lmm_obj:
                      - 0: { lmm_ost: 0, lmm_fid: "[0x100000000:0x2:0x0]" }
                      - 1: { lmm_ost: 1, lmm_fid: "[0x100010000:0x3:0x0]" }
                      - 2: { lmm_ost: 2, lmm_fid: "[0x100020000:0x4:0x0]" }
                      - 3: { lmm_ost: 3, lmm_fid: "[0x100030000:0x4:0x0]" }
                - component_id:     4
                  component_flags:  0
                  component_start:  3355443200
                  component_end:    3367108864
                  component_offset: 352
                  component_size:   144
                  sub_layout:
                    lmm_magic:        0x0BD30BD0
                    lmm_pattern:      1
                    lmm_stripe_size:  4194304
                    lmm_stripe_count: 4
                    lmm_stripe_index: 5
                    lmm_pool:         flash
                    lmm_layout_gen:   0
                    lmm_obj:
                      - 0: { lmm_ost: 5, lmm_fid: "[0x100050000:0x2:0x0]" }
                      - 1: { lmm_ost: 6, lmm_fid: "[0x100060000:0x2:0x0]" }
                      - 2: { lmm_ost: 7, lmm_fid: "[0x100070000:0x3:0x0]" }
                      - 3: { lmm_ost: 0, lmm_fid: "[0x100000000:0x3:0x0]" }
            

            This describes a file that was originally written (as a normal RAID-0 file), then archived (creating component_id #2 on the same file), and then two disjoint parts of the file (offsets at 21MB and 3.3GB) were read back in from tape to create component_ids #3 and #4. The actual policy decisions of when to read in partial files are up to the policy engine and copytool, and are outside the scope of the on-disk format.
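
To make the overlapping extents concrete, here is a minimal sketch that reports which components of the example above cover a given file offset. The extents are copied from the example output; the struct, the function, and the assumption that `component_end` is exclusive are illustrative only and are not part of any proposed API.

```c
#include <stddef.h>
#include <stdint.h>

/* One component extent; the range is treated as [start, end)
 * (assumption: component_end is exclusive). */
struct sketch_comp {
    uint32_t id;
    uint64_t start;
    uint64_t end;
};

/* Extents copied from the example "lfs getstripe" output. */
static const struct sketch_comp sketch_example[] = {
    { 2, 0ULL,          UINT64_MAX    },  /* HSM archive copy, whole file */
    { 3, 20971520ULL,   216777216ULL  },  /* restored OST component */
    { 4, 3355443200ULL, 3367108864ULL },  /* restored OST component */
};

/* Store the IDs of all components covering 'offset' in 'out' (at most
 * 'max' entries) and return how many were stored. */
static size_t sketch_comps_at(const struct sketch_comp *comps, size_t count,
                              uint64_t offset, uint32_t *out, size_t max)
{
    size_t found = 0;

    for (size_t i = 0; i < count && found < max; i++) {
        if (offset >= comps[i].start && offset < comps[i].end)
            out[found++] = comps[i].id;
    }
    return found;
}
```

An offset inside a restored range is covered by both the archive component and an OST component, while any other offset is covered by the archive component alone. Which copy is actually used for I/O is a separate decision that this sketch does not model.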

            adilger Andreas Dilger added the comment above.
            adilger Andreas Dilger added a comment (edited):

            It is my goal that the HSM archive xattr also be usable as a component in a composite file (http://wiki.lustre.org/Layout_Enhancement#2.1._Composite_Layouts). That would mean there is no need to have an HSM structure that allows multiple archive IDs to be expressed directly, since this could be handled by the composite layouts rather than as a separate xattr.

            The main reason to put the HSM archive ID in a composite file is to allow partial file restore to be implemented. The HSM archived file would typically cover the whole file (though it could also cover a subset if that was really needed by specifying the extents of the component, and possibly allowing an "offset" of the archived file within the component). Partial restores from tape would get separate OST-based components that "mirror" the archive copy (i.e. overlapping extents) for the part of the file that is restored.

            Also, having a binary data structure along the lines of lov_mds_md would be easier to manage in the kernel, and in particular the structure needs to have a unique magic value at the start so that the component type can be identified (e.g. HSM archive component vs. RAID-0 on OST(s) vs. RAID-N parity).
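
A minimal sketch of how a leading magic value lets code tell component types apart. It uses the two magic values quoted in this ticket (0x0BD30BD0 for the striped sub-layout in the example output, 0x45320BD0 for the proposed HSM sub-layout); the enum, the macro names, and the function are hypothetical, and byte-order handling is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Magic values as quoted in this ticket; the macro names are made up. */
#define SKETCH_MAGIC_RAID0  0x0BD30BD0u  /* lmm_magic in the example output */
#define SKETCH_MAGIC_HSM_V2 0x45320BD0u  /* proposed hsm_magic */

enum sketch_comp_type {
    SKETCH_COMP_UNKNOWN,
    SKETCH_COMP_RAID0,
    SKETCH_COMP_HSM,
};

/* Classify a sub-layout blob by its first 32 bits.
 * Assumes the blob is already in host byte order. */
static enum sketch_comp_type sketch_comp_type_of(const void *blob, size_t len)
{
    uint32_t magic;

    if (blob == NULL || len < sizeof(magic))
        return SKETCH_COMP_UNKNOWN;
    memcpy(&magic, blob, sizeof(magic));  /* avoids unaligned access */

    switch (magic) {
    case SKETCH_MAGIC_RAID0:
        return SKETCH_COMP_RAID0;
    case SKETCH_MAGIC_HSM_V2:
        return SKETCH_COMP_HSM;
    default:
        return SKETCH_COMP_UNKNOWN;
    }
}
```

Because every sub-layout starts with its own magic, a reader can skip component types it does not understand instead of misinterpreting them.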

            My strawman would allow the direct use of older HSM xattrs as a sub-layout to allow converting over existing files, something like:

            struct lov_hsm_attrs_v1 {
                    __u32 hsm_magic;     /* LOV_MAGIC_HSM_V1, replaces hsm_compat */
                    __u32 hsm_flags;     /* HS_* states from enum hsm_states */
                    __u64 hsm_arch_id;   /* integer archive number the data is in */
                    __u64 hsm_arch_ver;  /* data version of file in archive */
            };
            

            The new HSM sub-layout that includes the archive UUID would look something like:

            struct lov_hsm_attrs_v2 {
                    __u32 hsm_magic;               /* LOV_MAGIC_HSM_V2, replaces hsm_compat */
                    __u32 hsm_flags;               /* HS_* states from enum hsm_states */
                    __u64 hsm_arch_id;             /* integer archive number the data is in */
                    __u64 hsm_arch_ver;            /* data version of file in archive */
                    __u16 hsm_file_id_len;         /* length of archive-unique identifier hsm_file_id */
                    __u16 hsm_padding2;
                    unsigned char hsm_file_id[0];  /* identifier for file data within "hsm_arch_id" archive */
            };
            
            • hsm_magic is LOV_MAGIC_HSM_V2 = 0x45320BD0 ("4532" => "HSM2")
            • hsm_flags is a bitmask of the HS_* flags from enum hsm_states*
            • hsm_arch_id would continue to be as it is today - an integer identifier for the archive in which the data exists. Normally this would be a small integer that is an index in a table to identify which copytool should be used, but might map directly to some other identifier (e.g. tape volume?) in some implementations.
            • hsm_arch_ver is a hash that identifies the version of data stored in the archived file. There is no relationship assumed between different hsm_arch_ver values, other than equality indicating that the data is identical.
            • hsm_file_id_len is the length of hsm_file_id in bytes.
            • hsm_file_id is an archive-specific identifier for the file in the archive identified by hsm_arch_id. (Open question - should this be ASCII? With a trailing NUL? Or is a binary form preferable, e.g. a 16-byte binary UUID instead of a 36-byte ASCII UUID, to save space inside the inode? A binary form would need a type flag, one of { HS_UUID | HS_U64 | HS_ASCII | HS_BIN }, so that identifiers could be formatted correctly for printing.)

            * enum hsm_states should be renamed enum hsm_flags to match the comment at struct hsm_attrs (or vice versa in the rest of the code), and enum hsm_flags should be used for all of the variables and functions that hold HS_* values.
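
The following is a userspace sketch of how the proposed v2 sub-layout might be built and printed, assuming the binary 16-byte UUID option from the open question. The struct mirrors the strawman with fixed-width types in place of `__u32`/`__u64`/`__u16`; every `sketch_` name is hypothetical, and nothing here is an existing Lustre interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_HSM_MAGIC_V2 0x45320BD0u

/* Userspace mirror of the strawman lov_hsm_attrs_v2. */
struct sketch_hsm_attrs_v2 {
    uint32_t hsm_magic;
    uint32_t hsm_flags;
    uint64_t hsm_arch_id;
    uint64_t hsm_arch_ver;
    uint16_t hsm_file_id_len;
    uint16_t hsm_padding2;
    unsigned char hsm_file_id[];  /* C99 flexible array member */
};

/* Bytes the record occupies when stored with id_len identifier bytes. */
static size_t sketch_hsm_v2_size(uint16_t id_len)
{
    return offsetof(struct sketch_hsm_attrs_v2, hsm_file_id) + id_len;
}

/* Allocate and fill a record; the caller frees it with free(). */
static struct sketch_hsm_attrs_v2 *
sketch_hsm_v2_new(uint32_t flags, uint64_t arch_id, uint64_t arch_ver,
                  const void *id, uint16_t id_len)
{
    struct sketch_hsm_attrs_v2 *ha = calloc(1, sizeof(*ha) + id_len);

    if (ha == NULL)
        return NULL;
    ha->hsm_magic = SKETCH_HSM_MAGIC_V2;
    ha->hsm_flags = flags;
    ha->hsm_arch_id = arch_id;
    ha->hsm_arch_ver = arch_ver;
    ha->hsm_file_id_len = id_len;
    if (id != NULL && id_len > 0)
        memcpy(ha->hsm_file_id, id, id_len);
    return ha;
}

/* Print a 16-byte binary identifier in the 36-character UUID text form.
 * 'out' must hold at least 37 bytes. Returns 0, or -1 if the identifier
 * is not 16 bytes long. */
static int sketch_hsm_v2_uuid_str(const struct sketch_hsm_attrs_v2 *ha,
                                  char *out)
{
    const unsigned char *u = ha->hsm_file_id;

    if (ha->hsm_file_id_len != 16)
        return -1;
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
             u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
    return 0;
}
```

With this layout the fixed fields take 28 bytes, so a record with a binary UUID occupies 44 bytes, compared with 64 bytes for the same UUID stored as 36 ASCII characters without a trailing NUL.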


            Is anybody working on this, or planning to? If not, I'd like to do at least a little research on it.

            lixi Li Xi (Inactive) added the comment above.

            This is really a good idea. Vote +1

            lixi Li Xi (Inactive) added the comment above.

            People

              wc-triage WC Triage
              rread Robert Read