[LU-14673] panic: crc32-table: crc32 alg self test failed in fips mode! Created: 06/May/21 Updated: 13/Jul/21 Resolved: 27/May/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.6 |
| Fix Version/s: | Lustre 2.12.7, Lustre 2.15.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Olaf Faaland | Assignee: | Sebastien Buisson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl | ||
| Environment: |
lustre-2.12.6_4.llnl-2.t4.x86_64 |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
Upon loading LNet, the node panics with this:
libcfs: loading out-of-tree module taints kernel.
libcfs: module verification failed: signature and/or required key missing - tainting kernel
LNet: HW NUMA nodes: 2, HW CPU cores: 64, npartitions: 2
alg: No test for adler32 (adler32-zlib)
alg: hash: digest failed on test 1 for crc32-table: ret=126
Kernel panic - not syncing: crc32-table: crc32 alg self test failed in fips mode!
CPU: 11 PID: 70553 Comm: cryptomgr_test Tainted: G OE --------- - - 4.18.0-240.22.1.1toss.t4.x86_64 #1
Hardware name: HPE ProLiant DL385 Gen10 Plus/ProLiant DL385 Gen10 Plus, BIOS A42 10/30/2020
Call Trace:
 dump_stack+0x5c/0x80
 panic+0xe7/0x2a9
 ? __alg_test_hash+0x55/0x80
 alg_test.cold.21+0x13/0x44
 ? __switch_to_asm+0x41/0x70
 ? __switch_to_asm+0x35/0x70
 ? __switch_to_asm+0x41/0x70
 ? __switch_to+0x7a/0x400
 ? __schedule+0x2cf/0x720
 ? crypto_acomp_scomp_free_ctx+0x30/0x30
 cryptomgr_test+0x27/0x50
 kthread+0x11d/0x140
 ? kthread_flush_work_fn+0x10/0x10
 ret_from_fork+0x22/0x40
Kernel Offset: 0x18c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Loading 2.14.0 LNet under fips=1 does not cause the panic. |
| Comments |
| Comment by Olaf Faaland [ 06/May/21 ] |
|
For my records, my internal ticket is TOSS5190 |
| Comment by Andreas Dilger [ 06/May/21 ] |
|
This looks related to Possibly something has changed in how FIPS is being checked in the 4.18 kernel? |
| Comment by Andreas Dilger [ 06/May/21 ] |
|
Olaf, since it looks like you are building your own patched "1toss" client kernel, you could potentially disable this check until the problem is understood and fixed. |
| Comment by Sebastien Buisson [ 07/May/21 ] |
|
The fact that it does not crash with 2.14 is due to patch https://review.whamcloud.com/35342, which only landed on master in early January 2020. This patch is a (very) big one, whose objective was to simplify the Lustre code by removing obsolete config checks. Among the things removed were cfs_crypto_crc32_register() and the whole Lustre-specific crc32 implementation in libcfs/libcfs/linux/linux-crypto-crc32.c. No crc32, no crash.
I can try to see why the digest test is failing (the failure itself is not due to FIPS: it always fails, but with FIPS enabled it triggers a panic). But maybe the most obvious move would be to remove the call to cfs_crypto_crc32_register(). Any suggestion adilger? |
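The distinction drawn above (the digest self-test fails either way, but only FIPS mode turns the failure into a panic) can be sketched as a small userspace model. This is an illustration under stated assumptions, not the kernel's code: the enum, the function name `model_selftest_outcome`, and its parameters are hypothetical and exist only for this sketch.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a crypto self-test; names are illustrative only. */
enum model_outcome {
    MODEL_OK,             /* self-test passed */
    MODEL_LOGGED_FAILURE, /* self-test failed, failure is only logged */
    MODEL_PANIC           /* self-test failed with FIPS mode on: node panics */
};

/*
 * Simplified model of the policy described in the comment above:
 * a non-zero self-test result is fatal only when FIPS mode is enabled.
 */
enum model_outcome model_selftest_outcome(int rc, bool fips_enabled)
{
    if (rc == 0)
        return MODEL_OK;
    return fips_enabled ? MODEL_PANIC : MODEL_LOGGED_FAILURE;
}
```

With `rc` set to the -126 from the log, the model yields a logged failure when `fips_enabled` is false and a panic when it is true, which matches the behavior reported in this ticket.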
| Comment by Olaf Faaland [ 07/May/21 ] |
|
Hi Sebastien,
When you say "it [the digest test] always fails", do you mean it always fails under the RHEL 8.3 kernel, but succeeds under the RHEL 7 kernel?
thanks |
| Comment by Sebastien Buisson [ 07/May/21 ] |
|
Hi Olaf, Sorry for the confusion, I meant:
|
| Comment by James A Simmons [ 07/May/21 ] |
|
The special crc32 handling is left over from the RHEL6 days, which is why it was removed in newer Lustre versions. All the special crc32 handling Lustre did is now a part of the supported kernels. |
| Comment by Sebastien Buisson [ 10/May/21 ] |
|
Maybe the most straightforward way to get rid of this problem is to not call the LIBCFS_HAVE_CRC32 config check in libcfs/autoconf/lustre-libcfs.m4, so that the built-in crc32 is not used. I tested this quick solution and it works.
A cleaner approach is to backport patch https://review.whamcloud.com/35342 to b2_12; I just pushed a patch for this: |
| Comment by Sebastien Buisson [ 11/May/21 ] |
|
Patch https://review.whamcloud.com/35342 has been abandoned as we cannot drop RHEL6 support in 2.12.
An even cleaner approach is to fix the root cause of the error:
alg: hash: digest failed on test 1 for crc32-table: ret=126
Errno 126 is:
#define ENOKEY 126 /* Required key not available */
And it appears that crc32 needs to set the CRYPTO_ALG_OPTIONAL_KEY flag to work properly. I will push a patch to fix this. |
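To make the root cause above more concrete, here is a minimal userspace model of why a hash that offers a setkey operation can return ENOKEY on an unkeyed digest unless it is marked as having an optional key. It is a sketch, not the kernel's implementation: every `model_`/`MODEL_` name and the flag bit values are hypothetical, and only the errno value 126 and the flag name CRYPTO_ALG_OPTIONAL_KEY come from the ticket.

```c
#include <stdbool.h>

/* Illustrative bit values for this model; not taken from kernel headers. */
#define MODEL_ALG_OPTIONAL_KEY 0x1u /* stands in for CRYPTO_ALG_OPTIONAL_KEY */
#define MODEL_TFM_NEED_KEY     0x1u /* "a key must be set before use" */
#define MODEL_ENOKEY           126  /* errno value quoted in the ticket */

/* Hypothetical algorithm description. */
struct model_alg {
    unsigned int flags; /* algorithm-level flags */
    bool has_setkey;    /* algorithm provides a setkey operation */
};

/* Hypothetical per-use transform instance. */
struct model_tfm {
    const struct model_alg *alg;
    unsigned int flags;
};

/* A transform starts out needing a key only if the key is not optional. */
void model_tfm_init(struct model_tfm *tfm, const struct model_alg *alg)
{
    tfm->alg = alg;
    tfm->flags = 0;
    if (alg->has_setkey && !(alg->flags & MODEL_ALG_OPTIONAL_KEY))
        tfm->flags |= MODEL_TFM_NEED_KEY;
}

/* Setting a key (for crc32, the seed) clears the requirement. */
void model_setkey(struct model_tfm *tfm)
{
    tfm->flags &= ~MODEL_TFM_NEED_KEY;
}

/* Digest refuses to run while a required key is missing. */
int model_digest(const struct model_tfm *tfm)
{
    if (tfm->flags & MODEL_TFM_NEED_KEY)
        return -MODEL_ENOKEY;
    return 0;
}

/* An unkeyed self-test vector: init, then digest without calling setkey. */
int model_selftest_unkeyed(unsigned int alg_flags)
{
    struct model_alg alg = { .flags = alg_flags, .has_setkey = true };
    struct model_tfm tfm;

    model_tfm_init(&tfm, &alg);
    return model_digest(&tfm);
}
```

In this model, `model_selftest_unkeyed(0)` returns -126, mirroring the `ret=126` in the log, while `model_selftest_unkeyed(MODEL_ALG_OPTIONAL_KEY)` returns 0, which is the effect the proposed fix aims for.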
| Comment by Gerrit Updater [ 11/May/21 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/43653 |
| Comment by Gerrit Updater [ 11/May/21 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/43656 |
| Comment by Gerrit Updater [ 27/May/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43656/ |
| Comment by Peter Jones [ 27/May/21 ] |
|
Landed for 2.15 |
| Comment by Olaf Faaland [ 01/Jun/21 ] |
|
Keeping the topllnl label until the patch lands to b2_12 |
| Comment by Gerrit Updater [ 27/Jun/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43653/ |