Lustre / LU-20258

lfs setstripe accepts EC k+m > 256, which Cauchy/GF(2^8) cannot support

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major

    Description

      The Lustre EC layer uses a Cauchy matrix construction over GF(2^8) (see `gf_gen_cauchy1_matrix` in `lustre/ec/ec_base.c:128`). A Cauchy matrix with entries 1/(x_i + y_j) needs its m row labels x_i and its k column labels y_j to be k+m distinct field elements, and GF(2^8) has exactly 256 elements, so the construction is mathematically bounded by k + m ≤ 256.

      However, the user-space validation layers enforce k and m independently, without checking their sum:

      • `lustre/utils/lfs.c:4319` `parse_ec_stripe_count()` validates `data <= 255`, `parity <= 255`, `parity <= data`, and (without `--ec-expert`) recommended limits — but never `data + parity <= 256`.
      • `lustre/utils/liblustreapi_layout.c:2585-2594` validates `dstripe_count <= LOV_EC_MAX_DATA_STRIPES (255)` and `cstripe_count <= LOV_EC_MAX_CODING_STRIPES (15)` — but never their sum.
      • `lustre/doc/lfs-setstripe.1` documents the per-axis limits but not the joint bound.
      • MDS-side validation (if any) needs to be located and similarly tightened.

      As a result, a user can today run e.g.:

      lfs setstripe --ec-expert --ec 255+15 ...

      and have the layout accepted, but the underlying EC encoder cannot build a valid MDS matrix (k+m = 270 > 256). This will fail at runtime in `gf_gen_cauchy1_matrix` (out-of-range field-element indexing) or yield a non-MDS code if the implementation silently truncates.

      Reproduction

      # On a Lustre client with EC support:
      lfs setstripe --ec-expert --ec 255+15 -E -1 -c 270 /mnt/testfs/badfile
      dd if=/dev/urandom of=/mnt/testfs/badfile bs=1M count=1
      # Expected: setstripe should fail with EINVAL (k+m > 256 invalid)
      # Observed: setstripe succeeds; the dd then fails or produces garbage parity

      Proposed fix

      1. parse_ec_stripe_count() in lustre/utils/lfs.c — add after the existing parity > data check:

         if (data + parity > 256) {
                 fprintf(stderr,
                         "error: data+parity (%lu+%lu=%lu) exceeds 256 "
                         "(Cauchy-matrix-over-GF(2^8) limit)\n",
                         data, parity, data + parity);
                 return -EINVAL;
         }

       

      2. llapi_layout_add_ec_component() in lustre/utils/liblustreapi_layout.c — after the existing `cstripe_count > LOV_EC_MAX_CODING_STRIPES` check, add:

         if (dstripe_count + cstripe_count > 256) {
                 errno = EINVAL;
                 return -1;
         }

      3. Introduce a `LOV_EC_MAX_TOTAL_STRIPES 256` macro in `lustre/include/uapi/linux/lustre/lustre_user.h` so the bound is named once and reused.

      4. MDS-side validation — investigate `lod`/`mdt` layout-acceptance paths and add the same check if not implicit via llapi.

      5. Man page (`lustre/doc/lfs-setstripe.1`) — document the k+m ≤ 256 constraint alongside the existing per-axis limits, with a brief note on the Cauchy/GF(2^8) origin.

      6. Test coverage — add a test (e.g. in `lustre/tests/sanity-ec.sh` or a new `sanity-ec-validate.sh`) that confirms `lfs setstripe --ec 255+15` is rejected with a clear error.

      Background

      This surfaced during review of LU-20238 (GFNI EC acceleration) PS6 on Gerrit 65798, where Andreas Dilger asked whether k+m ≤ 256 was a GFNI, ISA-L, or generic Lustre limit (review comment, line 29 of `gfni_ec_test.c`). The answer is that it is a theoretical Cauchy-matrix-over-GF(2^8) bound, independent of GFNI or any specific implementation, and lifting it would require switching to a larger field (e.g. GF(2^16)), which the current EC code does not support. Since the math caps at 256, user-space validation should match.

      This ticket is scoped to tightening validation only and does not change the EC encoder itself.

      Related

      • LU-20238 — GFNI-accelerated EC primitives (where the question surfaced)
      • LU-19905 — broader EC consolidation umbrella
      • LU-20016 — ISA-L EC SIMD work

            People

              hnishida Hiroshi Nishida
              Votes: 0
              Watchers: 2
