[LU-14673] panic: crc32-table: crc32 alg self test failed in fips mode! Created: 06/May/21  Updated: 13/Jul/21  Resolved: 27/May/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.6
Fix Version/s: Lustre 2.12.7, Lustre 2.15.0

Type: Bug Priority: Major
Reporter: Olaf Faaland Assignee: Sebastien Buisson
Resolution: Fixed Votes: 0
Labels: llnl
Environment:

lustre-2.12.6_4.llnl-2.t4.x86_64
4.18.0-240.22.1.1toss.t4.x86_64
fips=1


Issue Links:
Related
is related to LU-13355 adler32 wrapper in libcfs Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Upon loading LNet, the node panics with this:

libcfs: loading out-of-tree module taints kernel.
libcfs: module verification failed: signature and/or required key missing - tainting kernel
LNet: HW NUMA nodes: 2, HW CPU cores: 64, npartitions: 2
alg: No test for adler32 (adler32-zlib)
alg: hash: digest failed on test 1 for crc32-table: ret=126
Kernel panic - not syncing: crc32-table: crc32 alg self test failed in fips mode!

CPU: 11 PID: 70553 Comm: cryptomgr_test Tainted: G           OE    --------- -  - 4.18.0-240.22.1.1toss.t4.x86_64 #1
Hardware name: HPE ProLiant DL385 Gen10 Plus/ProLiant DL385 Gen10 Plus, BIOS A42 10/30/2020
Call Trace:
 dump_stack+0x5c/0x80
 panic+0xe7/0x2a9
 ? __alg_test_hash+0x55/0x80
 alg_test.cold.21+0x13/0x44
 ? __switch_to_asm+0x41/0x70
 ? __switch_to_asm+0x35/0x70
 ? __switch_to_asm+0x41/0x70
 ? __switch_to+0x7a/0x400
 ? __schedule+0x2cf/0x720
 ? crypto_acomp_scomp_free_ctx+0x30/0x30
 cryptomgr_test+0x27/0x50
 kthread+0x11d/0x140
 ? kthread_flush_work_fn+0x10/0x10
 ret_from_fork+0x22/0x40
Kernel Offset: 0x18c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Loading 2.14.0 LNet under fips=1 does not cause the panic.
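
For anyone reproducing this, it can help to confirm that the node really booted in FIPS mode before loading LNet. The following is a minimal user-space sketch, assuming the kernel exposes the mode at /proc/sys/crypto/fips_enabled; the helper name read_fips_flag() is hypothetical and is not part of Lustre or the kernel.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns the integer stored in the given file
 * (expected to be 1 when FIPS mode is on, 0 when it is off), or -1 if
 * the file cannot be opened or does not start with an integer.
 */
static int read_fips_flag(const char *path)
{
	FILE *f = fopen(path, "r");
	int value;

	if (f == NULL)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

Calling read_fips_flag("/proc/sys/crypto/fips_enabled") on a node booted with fips=1 would be expected to return 1.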



 Comments   
Comment by Olaf Faaland [ 06/May/21 ]

For my records, my internal ticket is TOSS5190

Comment by Andreas Dilger [ 06/May/21 ]

This looks related to LU-13355, but according to patch https://review.whamcloud.com/38205 "LU-13355 crypto: crypto engine wrappers in libcfs" the crc32 crypto wrapper should be fixed since 2.12.5.

Possibly something has changed in how FIPS is being checked in the 4.18 kernel?

Comment by Andreas Dilger [ 06/May/21 ]

Olaf, since it looks like you are building your own patched "1toss" client kernel, you could potentially disable this check until the problem is understood and fixed.

Comment by Sebastien Buisson [ 07/May/21 ]

The fact that it does not crash with 2.14 is due to patch https://review.whamcloud.com/35342, which landed to master only in early January 2020. This (very) big patch aimed to simplify the Lustre code by removing obsolete config checks. Among the things it removed were cfs_crypto_crc32_register() and the whole Lustre-specific crc32 implementation in libcfs/libcfs/linux/linux-crypto-crc32.c.

No crc32, no crash

I can try to see why the digest test is failing (the failure itself is not due to FIPS: it always fails, but with FIPS enabled it triggers a panic). But maybe the most obvious move would be to remove the call to cfs_crypto_crc32_register(). Any suggestions, adilger?

Comment by Olaf Faaland [ 07/May/21 ]

Hi Sebastien,

When you say "it [the digest test] always fails", do you mean it always fails under the RHEL 8.3 kernel, but succeeds under the RHEL 7 kernel?

thanks

Comment by Sebastien Buisson [ 07/May/21 ]

Hi Olaf,

Sorry for the confusion, I meant:

  • it fails on RHEL8.3 whether or not FIPS is enabled, but it only panics when FIPS is enabled;
  • it does not fail on RHEL7.
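
The two cases above can be summarized with a small user-space model. This is only a sketch of the behaviour described in this ticket, not kernel code: the enum and the function model_selftest_outcome() are hypothetical names introduced for illustration.

```c
#include <stdbool.h>

/* Hypothetical outcomes of an algorithm self-test in this simplified model. */
enum selftest_outcome {
	SELFTEST_OK,             /* test passed, nothing to report */
	SELFTEST_LOGGED_FAILURE, /* test failed, only a log message is emitted */
	SELFTEST_PANIC           /* test failed while FIPS mode is enabled */
};

/*
 * Model of the behaviour reported here: a failing self-test is merely
 * logged on a non-FIPS node, but is fatal when FIPS mode is enabled.
 */
static enum selftest_outcome model_selftest_outcome(int test_rc, bool fips_enabled)
{
	if (test_rc == 0)
		return SELFTEST_OK;
	return fips_enabled ? SELFTEST_PANIC : SELFTEST_LOGGED_FAILURE;
}
```

In this model, the ret=126 failure from the log maps to SELFTEST_LOGGED_FAILURE on a regular RHEL8.3 node and to SELFTEST_PANIC on one booted with fips=1.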

Comment by James A Simmons [ 07/May/21 ]

The special crc32 handling is a leftover from the RHEL6 days, which is why it was removed in newer Lustre versions. All the special crc32 handling Lustre did is now a part of the supported kernels.

Comment by Sebastien Buisson [ 10/May/21 ]

Maybe the most straightforward way to get rid of this problem is to not call the LIBCFS_HAVE_CRC32 config check in libcfs/autoconf/lustre-libcfs.m4, so that the built-in crc32 is not used. I tested this quick solution, and it works.

A cleaner approach is to backport patch https://review.whamcloud.com/35342 to b2_12. I just pushed a patch for this:
https://review.whamcloud.com/43623

Comment by Sebastien Buisson [ 11/May/21 ]

The backport of patch https://review.whamcloud.com/35342 to b2_12 (https://review.whamcloud.com/43623) has been abandoned as we cannot drop RHEL6 support in 2.12.

An even cleaner approach is to fix the root cause of the error:

alg: hash: digest failed on test 1 for crc32-table: ret=126

Errno 126 is:

#define ENOKEY          126     /* Required key not available */

And it appears that crc32 needs to set the CRYPTO_ALG_OPTIONAL_KEY flag to work properly. I will push a patch to fix this.
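
To illustrate why the flag matters, here is a simplified user-space model, not the kernel implementation. It assumes, as the ENOKEY result suggests, that the kernel refuses to compute a digest for a hash that exposes a setkey hook unless the algorithm is marked as taking an optional key. All model_* names and MODEL_ALG_OPTIONAL_KEY are hypothetical stand-ins; the flag value is not the real value of CRYPTO_ALG_OPTIONAL_KEY.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef ENOKEY
#define ENOKEY 126 /* Required key not available (Linux value, as quoted above) */
#endif

/* Stand-in for CRYPTO_ALG_OPTIONAL_KEY; the value is illustrative only. */
#define MODEL_ALG_OPTIONAL_KEY 0x1u

/* Hypothetical description of a hash algorithm. */
struct model_hash_alg {
	bool has_setkey; /* crc32 accepts a key, which it uses as a seed */
	uint32_t flags;
};

/* Hypothetical per-use transform state. */
struct model_hash_tfm {
	bool need_key;
};

/* A key is considered mandatory unless the algorithm says it is optional. */
static void model_tfm_init(struct model_hash_tfm *tfm,
			   const struct model_hash_alg *alg)
{
	tfm->need_key = alg->has_setkey &&
			!(alg->flags & MODEL_ALG_OPTIONAL_KEY);
}

/* Digest without a prior setkey: rejected if a key is still required. */
static int model_digest(const struct model_hash_tfm *tfm)
{
	return tfm->need_key ? -ENOKEY : 0;
}
```

In this model, an algorithm registered with flags = 0 fails an unkeyed digest with -ENOKEY (the ret=126 seen in the log), while the same algorithm registered with MODEL_ALG_OPTIONAL_KEY succeeds.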

Comment by Gerrit Updater [ 11/May/21 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/43653
Subject: LU-14673 sec: annotate algorithms taking optional key
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 2
Commit: cc066c79a26d927e11dcfdef96eb5ab77dc7025a

Comment by Gerrit Updater [ 11/May/21 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/43656
Subject: LU-14673 sec: annotate algorithms taking optional key
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9b32694f26424030024a16c91a7e4575d5b281c2

Comment by Gerrit Updater [ 27/May/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43656/
Subject: LU-14673 sec: annotate algorithms taking optional key
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b161e7b777e63bb4328aeab9e50560f919fedc31

Comment by Peter Jones [ 27/May/21 ]

Landed for 2.15

Comment by Olaf Faaland [ 01/Jun/21 ]

Keeping the topllnl label until the patch lands to b2_12

Comment by Gerrit Updater [ 27/Jun/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43653/
Subject: LU-14673 sec: annotate algorithms taking optional key
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 0aefae53fb03eaca7229d9d4b3e48c4a8c33de1a
