LU-20238

ec: x86 GFNI/AVX-{2,512} accelerated EC primitives

Details

    • Improvement
    • Resolution: Unresolved
    • Medium
    • RHEL9.7/10.1
    • 3

    Description

      Background

      LU-19465 landed a plain-C ISA-L erasure-code library (libec.a, kernel module ec.ko) as the foundation for the EC feature work in flight.
      LU-19929 added a RISC-V optimised path on top of that base.  No acceleration is currently shipped for x86_64.

      This ticket tracks adding an x86_64 fast path using the GFNI / AVX2 / AVX-512 instruction set extensions, byte-compatible with the existing ec_*_base() implementation, runtime-dispatched, and patent-clean.

      Changes

       Add a runtime-dispatched GFNI fast path inside libec.a:

      • AVX-512 + GFNI: ec_encode_data(), ec_encode_data_update() and ec_init_tables() route to AVX-512 GFNI kernels. Encode peels parity rows off in groups of up to 6 per FPU pass; any m is supported via repeated passes.
      • AVX2 + GFNI: same dispatcher, AVX2 GFNI kernels. Encode peels in groups of up to 3 per pass.
      • Everywhere else (and when GFNI is not compiled in): the existing ec_*_base() path is used, unchanged.
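
      The selection order above can be sketched roughly as follows. This is a minimal illustration, not the code in ec_base_aliases.c or gfni_glue.c: the names ec_path, ec_pick_path() and ec_num_passes() are hypothetical, and the probe combines the CPUID leaf 7 GFNI bit with the compiler's __builtin_cpu_supports() checks, which is one possible way to detect the features. Without -DLUSTRE_EC_GFNI the probe is compiled out and the function always returns the base path.

```c
#include <assert.h>
#if defined(LUSTRE_EC_GFNI) && defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical names, for illustration only. */
enum ec_path { EC_PATH_BASE, EC_PATH_AVX2_GFNI, EC_PATH_AVX512_GFNI };

/* Pick the widest usable path; fall back to the base path otherwise. */
static enum ec_path ec_pick_path(void)
{
#if defined(LUSTRE_EC_GFNI) && defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	__builtin_cpu_init();
	/* CPUID.(EAX=7,ECX=0):ECX bit 8 reports GFNI. */
	if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) &&
	    (ecx & (1u << 8))) {
		if (__builtin_cpu_supports("avx512f") &&
		    __builtin_cpu_supports("avx512bw"))
			return EC_PATH_AVX512_GFNI;
		if (__builtin_cpu_supports("avx2"))
			return EC_PATH_AVX2_GFNI;
	}
#endif
	return EC_PATH_BASE;
}

/*
 * Number of passes needed to cover m parity rows when each pass handles
 * at most group_max rows (6 for AVX-512 GFNI, 3 for AVX2 GFNI).
 */
static int ec_num_passes(int m, int group_max)
{
	assert(m >= 0 && group_max > 0);
	return (m + group_max - 1) / group_max;
}
```

      For example, m = 7 parity rows would take two passes on the AVX-512 path (6 + 1) and three on the AVX2 path (3 + 3 + 1).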

      The kernels are written in C using compiler intrinsics (_mm{256,512}_gf2p8affine_epi64_epi8), gated per function with __attribute__((target("avx2,gfni"))) or __attribute__((target("avx512f,avx512bw,gfni"))). The build requires GCC >= 8 / clang >= 8 and does NOT depend on NASM.
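
      To illustrate the intrinsic pattern, here is a minimal sketch of a per-function-gated AVX2 GFNI multiply-accumulate, together with a scalar model of the affine instruction that can be checked on any machine. It is not the code in the patch: all function names are hypothetical, the field polynomial 0x11d is assumed (the one ISA-L uses), and the matrix is built on the fly rather than taken from precomputed tables. Multiplying by a constant c is linear over GF(2), so it can be written as an 8x8 bit matrix; the instruction reads the row for output bit i from byte 7 - i of the 64-bit matrix operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

/* Bitwise GF(2^8) multiply; polynomial 0x11d is an assumption. */
static uint8_t gf_mul_ref(uint8_t a, uint8_t b)
{
	uint8_t r = 0;

	while (b) {
		if (b & 1)
			r ^= a;
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return r;
}

/* Build the "multiply by c" matrix: byte [7 - i] is the row for bit i. */
static uint64_t gf_affine_matrix(uint8_t c)
{
	uint64_t m = 0;

	for (int i = 0; i < 8; i++) {
		uint8_t row = 0;

		for (int j = 0; j < 8; j++)
			if ((gf_mul_ref(c, (uint8_t)(1u << j)) >> i) & 1)
				row |= (uint8_t)(1u << j);
		m |= (uint64_t)row << (8 * (7 - i));
	}
	return m;
}

/* Scalar model of one byte lane of vgf2p8affineqb with imm8 = 0. */
static uint8_t gf_affine_byte(uint64_t m, uint8_t x)
{
	uint8_t r = 0;

	for (int i = 0; i < 8; i++) {
		uint8_t row = (uint8_t)(m >> (8 * (7 - i)));

		r |= (uint8_t)(__builtin_parity(row & x) << i);
	}
	return r;
}

#if defined(__x86_64__)
/*
 * dst[i] ^= c * src[i], 32 bytes per iteration; len must be a multiple
 * of 32. Call only after a runtime check for AVX2 and GFNI.
 */
__attribute__((target("avx2,gfni")))
void gf_mad_avx2_gfni_sketch(uint64_t m, size_t len,
			     const uint8_t *src, uint8_t *dst)
{
	const __m256i mat = _mm256_set1_epi64x((long long)m);

	assert(len % 32 == 0);
	for (size_t i = 0; i < len; i += 32) {
		__m256i s = _mm256_loadu_si256((const __m256i *)(src + i));
		__m256i d = _mm256_loadu_si256((const __m256i *)(dst + i));

		d = _mm256_xor_si256(d, _mm256_gf2p8affine_epi64_epi8(s, mat, 0));
		_mm256_storeu_si256((__m256i *)(dst + i), d);
	}
}
#endif
```

      Because the target attribute is applied per function, the file compiles without -mavx2 or -mgfni, and the vector function is safe to link into a binary that also runs on CPUs without those extensions as long as it is never called there.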

      A new autoconf macro EC_GFNI_SUPPORT in config/lustre-erasurecode.m4 compile-tests the affine intrinsic. On success, it defines LUSTRE_EC_GFNI for libec.a and sets the EC_GFNI automake conditional that pulls in the new sources. On failure (non-x86_64, older toolchain) no GFNI objects are built and ec_base_aliases.c collapses to the existing base path.

      A small standalone smoke test (gfni_ec_test, noinst_PROGRAMS) is built inside the tree and verifies that the dispatcher and base path produce byte-identical output on randomised input.
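
      The comparison such a smoke test performs can be sketched as below. This is not gfni_ec_test.c: the encode_fn signature is a simplified, hypothetical one rather than the libec API, the field polynomial 0x11d is assumed, and a table-driven encoder stands in for the path under test. The harness fills k=4 data blocks and a 2x4 coefficient matrix with pseudo-random bytes, runs both encoders, and compares the parity with memcmp().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Bitwise GF(2^8) multiply; polynomial 0x11d is an assumption. */
static uint8_t gf_mul_bitwise(uint8_t a, uint8_t b)
{
	uint8_t r = 0;

	while (b) {
		if (b & 1)
			r ^= a;
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return r;
}

/* Hypothetical simplified signature, not the libec API. */
typedef void (*encode_fn)(int len, int k, int m, const uint8_t *coef,
			  const uint8_t *data, uint8_t *parity);

/* Reference: parity[r][i] = XOR over j of coef[r][j] * data[j][i]. */
static void encode_ref(int len, int k, int m, const uint8_t *coef,
		       const uint8_t *data, uint8_t *parity)
{
	for (int r = 0; r < m; r++)
		for (int i = 0; i < len; i++) {
			uint8_t acc = 0;

			for (int j = 0; j < k; j++)
				acc ^= gf_mul_bitwise(coef[r * k + j],
						      data[j * len + i]);
			parity[r * len + i] = acc;
		}
}

/* Stand-in for the path under test: same maths via a 256-entry table. */
static void encode_tbl(int len, int k, int m, const uint8_t *coef,
		       const uint8_t *data, uint8_t *parity)
{
	uint8_t tbl[256];

	memset(parity, 0, (size_t)m * len);
	for (int r = 0; r < m; r++)
		for (int j = 0; j < k; j++) {
			for (int x = 0; x < 256; x++)
				tbl[x] = gf_mul_bitwise(coef[r * k + j],
							(uint8_t)x);
			for (int i = 0; i < len; i++)
				parity[r * len + i] ^= tbl[data[j * len + i]];
		}
}

/* Returns 1 when both encoders agree byte for byte on random input. */
int ec_smoke_compare(encode_fn a, encode_fn b, unsigned int seed)
{
	enum { K = 4, M = 2, LEN = 1024 };
	static uint8_t coef[M * K], data[K * LEN], pa[M * LEN], pb[M * LEN];

	assert(a && b);
	srand(seed);
	for (size_t i = 0; i < sizeof(coef); i++)
		coef[i] = (uint8_t)rand();
	for (size_t i = 0; i < sizeof(data); i++)
		data[i] = (uint8_t)rand();
	a(LEN, K, M, coef, data, pa);
	b(LEN, K, M, coef, data, pb);
	return memcmp(pa, pb, sizeof(pa)) == 0;
}
```

      Passing the seed in keeps a failing run reproducible, which matters more for a byte-compatibility check than the quality of the random source.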

      Scope

      This change is strictly userspace-scoped:

      • libec.a (userspace static archive) gains the GFNI fast path.
      • ec.ko (kernel module) is unchanged at runtime — its Makefile does not define LUSTRE_EC_GFNI, so the dispatcher in ec_base_aliases.c collapses to the base path for the kernel build. A future ticket can wire kernel-mode GFNI (with kernel_fpu_begin/end() bracketing, which is already prepared in gfni_glue.c) if/when that is wanted.

      Patent considerations

      The encode path uses only the GF(2^8) affine intrinsic (vgf2p8affineqb) plus standard XOR / load / store.  No PSHUFB-style nibble-table lookup is used in any of the GFNI kernels. The fallback in ec_base.c is the plain byte-indexed table lookup (gf_mul_table_base[256*256]) already present in master, which is itself patent-clean.

      Validation

      Built and runtime-tested at HEAD against current Lustre master:

      • Non-GFNI x86_64 client (RHEL 10.1, kernel 6.12, GCC 14.3.1, KVM): builds cleanly; configure reports "checking whether to build GFNI EC primitives... yes" (the check is a compile test of the toolchain, not of the host CPU); gfni_ec_test correctly detects no runtime GFNI and routes through the base path; parity output matches.
      • AVX2-GFNI x86_64 client (Raptor Lake i5-1340P, RHEL 10.1, kernel 6.12, GCC 14.3.1): builds cleanly, gfni_ec_test detects AVX2-GFNI and routes through ec_encode_data_avx2_gfni; parity output is byte-identical to ec_encode_data_base for k=4 m=2 on randomised input.

      AVX-512-GFNI kernels are compile-tested but not runtime-tested (no AVX-512 hardware available locally). They follow the same intrinsic pattern as the AVX2-GFNI kernels and the same code path in the dispatcher, so this is a hardware coverage gap rather than a correctness gap.

      Files added

          lustre/utils/erasurecode/gf_vect_dot_prod_avx2_gfni.c
          lustre/utils/erasurecode/gf_2vect_dot_prod_avx2_gfni.c
          lustre/utils/erasurecode/gf_3vect_dot_prod_avx2_gfni.c
          lustre/utils/erasurecode/gf_vect_dot_prod_avx512_gfni.c
          lustre/utils/erasurecode/gf_2vect_dot_prod_avx512_gfni.c
          lustre/utils/erasurecode/gf_3vect_dot_prod_avx512_gfni.c
          lustre/utils/erasurecode/gf_4vect_dot_prod_avx512_gfni.c
          lustre/utils/erasurecode/gf_5vect_dot_prod_avx512_gfni.c
          lustre/utils/erasurecode/gf_6vect_dot_prod_avx512_gfni.c
          lustre/utils/erasurecode/gf_vect_mad_avx2_gfni.c
          lustre/utils/erasurecode/gf_2vect_mad_avx2_gfni.c
          lustre/utils/erasurecode/gf_2vect_mad_avx512_gfni.c
          lustre/utils/erasurecode/gfni_glue.c
          lustre/utils/erasurecode/gfni_dispatch.h
          lustre/utils/erasurecode/gfni_ec_test.c
      

      Files modified

          lustre/ec/ec_base_aliases.c          (+#ifdef LUSTRE_EC_GFNI dispatch)
          lustre/utils/erasurecode/autoMakefile.am   (+if EC_GFNI block)
          config/lustre-erasurecode.m4         (+EC_GFNI_SUPPORT macro)
      

      Diffstat: 18 files, +1863 / -2.

      Related

          LU-19465 — adds the base libec.a / ec.ko this patch extends.
          LU-19929 — adds the RISC-V optimised path (parallel architecture).

            People

              hnishida Hiroshi Nishida