  1. Lustre
  2. LU-10911 FLR2: Erasure coding
  3. LU-12187

FLR-EC: erasure coding layout handling

Details

    • Technical task
    • Resolution: Unresolved
    • Minor

    Description

      This ticket covers basic support for erasure-coded layouts. This begins with adding support for expressing erasure-coded layouts in the LLAPI, followed by support for creating them in the LOV and LOD code.

      An FLR-EC file will consist of one or more RAID-0 striped "data mirrors" (of nearly arbitrary stripe count), as is typical of current Lustre composite/PFL files, plus one or more RAID-0 "EC mirrors" that each hold the parity stripes for a single "data mirror". Each RAID-0 "data mirror" will be paired with exactly one RAID-0 "EC mirror" so that the data stripes and their corresponding EC stripes do not reside on overlapping OSTs, which would introduce a single point of failure. The data and EC stripes that make up a single consistent copy of the file data will consist of one or more "RAID Sets" (depending on the total number of data stripes). Each RAID Set groups specific data+EC stripes of the file with a "reasonable" RAID geometry (e.g. 4+2, 8+2, or 16+3), so that read inflation stays moderate when all of the data+EC stripes of a RAID Set must be read to reconstruct a missing data stripe.
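      The RAID Set grouping described above can be sketched as follows. This is only an illustration, not the LOV/LOD implementation: the names RaidSet, build_raid_sets, and max_stripes_read_to_rebuild are hypothetical, and the handling of a trailing partial set (it still receives P parity stripes) is an assumption, since the ticket does not say how remainders are treated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RaidSet:
    data_stripes: List[int]    # stripe indices within the data mirror
    parity_stripes: List[int]  # stripe indices within the paired EC mirror


def build_raid_sets(total_data: int, d: int, p: int) -> List[RaidSet]:
    """Group data stripes into RAID Sets of at most d data + p parity stripes.

    Assumption: a trailing set with fewer than d data stripes still gets
    p parity stripes.
    """
    if total_data < 1 or d < 1 or p < 1:
        raise ValueError("stripe counts must be positive")
    sets = []
    for set_idx, start in enumerate(range(0, total_data, d)):
        data = list(range(start, min(start + d, total_data)))
        parity = list(range(set_idx * p, (set_idx + 1) * p))
        sets.append(RaidSet(data, parity))
    return sets


def max_stripes_read_to_rebuild(raid_set: RaidSet) -> int:
    """Upper bound on stripes read to rebuild one missing data stripe:
    every other data and parity stripe in the same RAID Set."""
    return len(raid_set.data_stripes) + len(raid_set.parity_stripes) - 1


# A 16-stripe data mirror with 8+2 geometry yields two RAID Sets.
sets = build_raid_sets(16, 8, 2)
print(len(sets), max_stripes_read_to_rebuild(sets[0]))
```

      Under these assumptions, the read cost of a rebuild depends only on the RAID Set geometry, not on the total stripe count of the file, which is the reason for keeping the geometry "reasonable".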

      The FLR-EC layout should consist of two separate sets of components: one set containing only the RAID-0 "data mirror", which is laid out exactly like regular files created and written today (possibly in multiple PFL components with different stripe counts, pools, etc.), and a separate "EC mirror" that contains only the parity stripes for the corresponding "data mirror". In industry terminology this would be considered "RAID-4", which has static data and parity "devices" (though each "device" in this case is an OST object). The data and parity mirrors would not be combined or interleaved into a single component.

      Having separate data and EC components serves several important purposes:

      • it allows the EC component to easily be marked STALE if the corresponding data component is modified
      • older clients that do not understand EC can safely ignore the EC mirror and read/write only the "data mirror" (given appropriate MDS support, though such clients cannot reconstruct the data if one of the data OSTs becomes unavailable)
      • it allows an "EC mirror" to easily be added to an existing data mirror, which may have been created long before the FLR-EC feature existed
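      The three points above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the on-disk layout format: Component, FlrEcLayout, and all of their methods and fields are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    mirror_id: int
    stripe_count: int
    is_parity: bool = False  # True for an "EC mirror" component
    stale: bool = False


@dataclass
class FlrEcLayout:
    components: List[Component] = field(default_factory=list)
    # Hypothetical pairing table: data mirror id -> EC mirror id.
    ec_for: Dict[int, int] = field(default_factory=dict)

    def add_data_mirror(self, mirror_id: int, stripe_count: int) -> None:
        self.components.append(Component(mirror_id, stripe_count))

    def add_ec_mirror(self, data_mirror_id: int, ec_mirror_id: int,
                      parity_stripes: int) -> None:
        """Attach an EC mirror to an existing data mirror (one per mirror)."""
        if data_mirror_id in self.ec_for:
            raise ValueError("data mirror already has an EC mirror")
        if not any(c.mirror_id == data_mirror_id and not c.is_parity
                   for c in self.components):
            raise ValueError("no such data mirror")
        self.components.append(
            Component(ec_mirror_id, parity_stripes, is_parity=True))
        self.ec_for[data_mirror_id] = ec_mirror_id

    def on_data_write(self, data_mirror_id: int) -> None:
        """Mark the paired EC mirror STALE; data components are untouched."""
        ec_id = self.ec_for.get(data_mirror_id)
        for c in self.components:
            if c.is_parity and c.mirror_id == ec_id:
                c.stale = True

    def legacy_view(self) -> List[Component]:
        """What a client without EC support would use: data components only."""
        return [c for c in self.components if not c.is_parity]


layout = FlrEcLayout()
layout.add_data_mirror(1, stripe_count=8)  # pre-existing plain file layout
layout.add_ec_mirror(1, 2, parity_stripes=2)  # EC added after the fact
layout.on_data_write(1)
```

      Because the parity stripes live in their own components, staleness is a per-component flag and the data components never need to be rewritten when an EC mirror is added or invalidated.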

      This ticket also includes adding EC support to setstripe via a new "-L ec[:D+P]" layout pattern (and possibly a "--ec[:D+P]" shortcut). If the "[:D+P]" part of the EC layout is not specified, then the MDS should select a reasonable default such as 8+2 based on the requested/available OST stripe count for the file (see LU-19480 for details). The default EC geometry (e.g. 8+2) should be a tunable on the MDS, something like lod.*.ec_default_data_stripe and lod.*.ec_default_parity_stripe or similar. These tunables should not enable EC for all files; they only set the default geometry for files that have "LOV_PATTERN_EC" or "LCME_FL_PARITY" set but no lcme_cstripe or lcme_dstripe specified.
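      A minimal sketch of how the proposed "ec[:D+P]" argument might be parsed is shown below. It is not the lfs implementation: parse_ec_pattern and resolve_geometry are hypothetical names, and the 8+2 fallback simply mirrors the example default mentioned above. In the design described here the MDS, not the client tool, would pick the default, so resolve_geometry only stands in for that step.

```python
import re
from typing import Optional, Tuple

# Accepts "ec" or "ec:D+P" where D and P are decimal integers.
_EC_PATTERN = re.compile(r"ec(?::([0-9]+)\+([0-9]+))?")


def parse_ec_pattern(arg: str) -> Optional[Tuple[int, int]]:
    """Return (D, P) for "ec:D+P", or None for a bare "ec".

    Raises ValueError for any other input.
    """
    m = _EC_PATTERN.fullmatch(arg)
    if m is None:
        raise ValueError(f"invalid EC layout pattern: {arg!r}")
    if m.group(1) is None:
        return None  # geometry omitted: leave the choice to the default
    return int(m.group(1)), int(m.group(2))


def resolve_geometry(requested: Optional[Tuple[int, int]],
                     default_data: int = 8,
                     default_parity: int = 2) -> Tuple[int, int]:
    """Apply the (assumed) default geometry when none was requested."""
    if requested is not None:
        return requested
    return default_data, default_parity


print(resolve_geometry(parse_ec_pattern("ec:4+2")))
print(resolve_geometry(parse_ec_pattern("ec")))
```

      Returning None for a bare "ec" keeps "no geometry requested" distinct from an explicit request, which matches the requirement that the default applies only when no geometry was specified.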

      There should be reasonable guardrails in the userspace tools to prevent users from specifying bad RAID geometries (e.g. P > D, D > 21, or P > 4). We might consider adding an (undocumented?) "--ec-expert" option that bypasses these limits and allows RAID geometries up to the limits of the configuration (e.g. 255+32 or something absurd), either for testing or for extreme configurations that we are not aware of at the time of writing.
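      The guardrails could look roughly like the sketch below. The normal limits (P <= D, D <= 21, P <= 4) come from the examples above; the function name validate_ec_geometry, the expert flag, and the use of 255 and 32 as hard upper bounds are assumptions made for illustration, since the ticket gives 255+32 only as an example.

```python
# Limits taken from the examples in this ticket.
MAX_DATA = 21
MAX_PARITY = 4
# Hypothetical hard limits for expert mode ("255+32 or something absurd").
EXPERT_MAX_DATA = 255
EXPERT_MAX_PARITY = 32


def validate_ec_geometry(d: int, p: int, expert: bool = False) -> None:
    """Raise ValueError if the D+P geometry is not acceptable.

    Assumption: expert mode skips the P > D, D > 21, and P > 4 checks
    but still enforces positive counts and the hard upper bounds.
    """
    if d < 1 or p < 1:
        raise ValueError("data and parity stripe counts must be positive")
    if d > EXPERT_MAX_DATA or p > EXPERT_MAX_PARITY:
        raise ValueError(
            f"{d}+{p} exceeds hard limits "
            f"{EXPERT_MAX_DATA}+{EXPERT_MAX_PARITY}")
    if expert:
        return
    if p > d:
        raise ValueError(f"{d}+{p}: more parity than data stripes")
    if d > MAX_DATA:
        raise ValueError(f"{d}+{p}: more than {MAX_DATA} data stripes")
    if p > MAX_PARITY:
        raise ValueError(f"{d}+{p}: more than {MAX_PARITY} parity stripes")


validate_ec_geometry(8, 2)                # accepted
validate_ec_geometry(64, 8, expert=True)  # accepted only in expert mode
```

      Keeping the hard limits separate from the default guardrails means expert mode widens the accepted range without removing validation entirely.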

      This task will be considered complete once erasure-coded layouts can be created, but it does not include support for performing I/O to these layouts, which will be handled in other tasks.

            People

              bobijam Zhenyu Xu