Details
-
Bug
-
Resolution: Unresolved
-
Major
Description
The Lustre EC layer uses a Cauchy matrix construction over GF(2^8) (see `gf_gen_cauchy1_matrix` in `lustre/ec/ec_base.c:128`). For k data stripes and m parity stripes, a Cauchy matrix requires k+m distinct field elements, and GF(2^8) has exactly 256, so the construction is mathematically bounded by k + m ≤ 256.
However, the user-space validation layers check k and m independently and never check their sum:
- `lustre/utils/lfs.c:4319` `parse_ec_stripe_count()` validates `data <= 255`, `parity <= 255`, `parity <= data`, and (without `--ec-expert`) recommended limits — but never `data + parity <= 256`.
- `lustre/utils/liblustreapi_layout.c:2585-2594` validates `dstripe_count <= LOV_EC_MAX_DATA_STRIPES (255)` and `cstripe_count <= LOV_EC_MAX_CODING_STRIPES (15)` — but never their sum.
- `lustre/doc/lfs-setstripe.1` documents the per-axis limits but not the joint bound.
- Validation on the metadata server (MDS) side, if any, still needs to be located and similarly tightened.
As a result, a user can today run e.g.:
lfs setstripe --ec-expert --ec 255+15 ...
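and have the layout accepted, but the underlying EC encoder cannot build a valid maximum-distance-separable matrix (here "MDS" in the coding-theory sense, not the metadata server), because k+m = 270 > 256. This will fail at runtime in `gf_gen_cauchy1_matrix` (out-of-range field-element indexing) or, if the implementation silently truncates, yield a code that is no longer maximum distance separable.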
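To illustrate why a failure is expected, the sketch below counts the zero denominators that appear once k+m exceeds 256. It assumes the generator labels data column j as j (0 <= j < k) and parity row i as i (k <= i < k+m), and fills each parity entry with 1/(i XOR j) reduced to 8 bits. That labelling is an assumption and has not been checked against `ec_base.c`; `ec_count_label_collisions` is a hypothetical helper, not a Lustre function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count the (parity row, data column) pairs whose 8-bit labels coincide.
 * Assumed labelling (not verified against the Lustre source):
 *   data column j   -> label j, 0 <= j < k
 *   parity row i    -> label i, k <= i < k + m
 *   matrix entry    -> 1 / (i XOR j), with the XOR truncated to 8 bits
 */
static int ec_count_label_collisions(int k, int m)
{
	int collisions = 0;

	assert(k >= 0 && m >= 0);
	for (int i = k; i < k + m; i++) {
		for (int j = 0; j < k; j++) {
			uint8_t denom = (uint8_t)(i ^ j);

			/* Zero has no multiplicative inverse in GF(2^8). */
			if (denom == 0)
				collisions++;
		}
	}
	return collisions;
}
```

Under that labelling, `ec_count_label_collisions(255, 15)` returns 14 (parity rows 256 to 269 wrap onto data labels 0 to 13), while any pair with k + m <= 256 returns 0.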
Reproduction
# On a Lustre client with EC support:
lfs setstripe --ec-expert --ec 255+15 -E -1 -c 270 /mnt/testfs/badfile
dd if=/dev/urandom of=/mnt/testfs/badfile bs=1M count=1
# Expected: setstripe should fail with EINVAL (k+m > 256 invalid)
# Observed: setstripe succeeds; the dd then fails or produces garbage parity
Proposed fix
1. parse_ec_stripe_count() in lustre/utils/lfs.c — add after the existing parity > data check:
    if (data + parity > 256) {
            fprintf(stderr,
                    "error: data+parity (%lu+%lu=%lu) exceeds 256 "
                    "(Cauchy-matrix-over-GF(2^8) limit)\n",
                    data, parity, data + parity);
            return -EINVAL;
    }
2. llapi_layout_add_ec_component() in lustre/utils/liblustreapi_layout.c — after the existing `cstripe_count > LOV_EC_MAX_CODING_STRIPES` check, add:
    if (dstripe_count + cstripe_count > 256) {
            errno = EINVAL;
            return -1;
    }
3. Introduce a `LOV_EC_MAX_TOTAL_STRIPES 256` macro in `lustre/include/uapi/linux/lustre/lustre_user.h` so the bound is named once and reused.
4. MDS-side validation — investigate `lod`/`mdt` layout-acceptance paths and add the same check if not implicit via llapi.
5. Man page (`lustre/doc/lfs-setstripe.1`) — document the k+m ≤ 256 constraint alongside the existing per-axis limits, with a brief note on the Cauchy/GF(2^8) origin.
6. Test coverage — add a test (e.g. in `lustre/tests/sanity-ec.sh` or a new `sanity-ec-validate.sh`) that confirms `lfs setstripe --ec 255+15` is rejected with a clear error.
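Items 1 to 3 could share a single check. A minimal sketch follows, assuming the liblustreapi per-axis limits quoted in this ticket (255 data, 15 coding) and the proposed `LOV_EC_MAX_TOTAL_STRIPES`. The helper name `ec_stripe_counts_valid` and its non-zero checks are hypothetical, and the macros are redefined locally only so the example compiles on its own; in the tree they would come from `lustre_user.h`.

```c
#include <errno.h>

/* Per-axis limits as quoted in this ticket for liblustreapi_layout.c. */
#define LOV_EC_MAX_DATA_STRIPES   255
#define LOV_EC_MAX_CODING_STRIPES 15
/* Proposed joint bound: GF(2^8) has exactly 256 elements. */
#define LOV_EC_MAX_TOTAL_STRIPES  256

/*
 * Hypothetical shared helper: returns 0 when the data+parity pair is
 * acceptable, -EINVAL otherwise. The zero checks are an assumption.
 */
static int ec_stripe_counts_valid(unsigned int data, unsigned int parity)
{
	if (data == 0 || data > LOV_EC_MAX_DATA_STRIPES)
		return -EINVAL;
	if (parity == 0 || parity > LOV_EC_MAX_CODING_STRIPES)
		return -EINVAL;
	if (parity > data)
		return -EINVAL;
	/* Joint Cauchy-matrix bound; the sum cannot overflow after the checks above. */
	if (data + parity > LOV_EC_MAX_TOTAL_STRIPES)
		return -EINVAL;
	return 0;
}
```

With these limits the joint check only changes the outcome for 242 or more data stripes combined with enough parity to push the sum past 256: `ec_stripe_counts_valid(241, 15)` still returns 0, while `ec_stripe_counts_valid(255, 15)` returns `-EINVAL`. These boundary pairs are natural cases for the test in item 6.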
Background
This was surfaced during review of LU-20238 (GFNI EC acceleration) PS6 on Gerrit 65798, where Andreas Dilger asked whether k+m ≤ 256 was a GFNI, ISA-L, or generic Lustre limit (review comment, line 29 of `gfni_ec_test.c`). The answer is that it is a theoretical Cauchy-matrix-over-GF(2^8) bound — independent of GFNI or any specific implementation — and lifting it would require switching to a larger field (e.g. GF(2^16)), which the current EC code does not support. Since the math caps at 256, user-space validation should match.
This ticket is scoped to tightening validation only and does not change the EC encoder itself.