[LU-17418] Lustre crashed immediately after load with debug kernel Created: 11/Jan/24  Updated: 25/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Alexey Lyashkov Assignee: James A Simmons
Resolution: Unresolved Votes: 0
Labels: None
Environment:

RHEL 8.5 debug kernel


Issue Links:
Related
is related to LU-14428 Convert tracefile to use ring_buffer ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The LU-9859 change "libcfs: refactor libcfs initialization." removed the libcfs debug init from the libcfs module init itself.
This causes a panic when debug settings are applied after the module is loaded.

crash7> bt
PID: 11558  TASK: ffff888113890000  CPU: 1   COMMAND: "lt-lctl"
 #0 [ffff88810ab5f848] __show_regs at ffffffff8186a0c5
 #1 [ffff88810ab5f8d0] general_protection at ffffffff83a0115e
    [exception RIP: cfs_trace_lock_tcd+37]
    RIP: ffffffffc1495a95  RSP: ffff88810ab5f988  RFLAGS: 00010296
    RAX: dffffc0000000000  RBX: 00000000000000c0  RCX: ffffffffc14d3580
    RDX: 0000000000000029  RSI: 0000000000000000  RDI: 000000000000014c
    RBP: ffff88810ab5fbb0   R8: 0000000000000001   R9: 00000000130e0580
    R10: ffff88810ab5fbc8  R11: ffffed102270b633  R12: 00000000000000c0
    R13: ffffffffc14902a0  R14: 0000000000000001  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #2 [ffff88810ab5f998] libcfs_debug_msg at ffffffffc1496bec [libcfs]
 #3 [ffff88810ab5fbc0] cfs_str2mask at ffffffffc149e1b5 [libcfs]
 #4 [ffff88810ab5fc30] libcfs_debug_str2mask at ffffffffc149117b [libcfs]
 #5 [ffff88810ab5fd20] proc_dobitmasks at ffffffffc1492c90 [libcfs]
 #6 [ffff88810ab5fd70] lnet_debugfs_write at ffffffffc1491d44 [libcfs]
 #7 [ffff88810ab5fe00] full_proxy_write at ffffffff8242f903
 #8 [ffff88810ab5fe48] vfs_write at ffffffff8220ca57
 #9 [ffff88810ab5fe88] ksys_write at ffffffff8220d218
#10 [ffff88810ab5ff28] do_syscall_64 at ffffffff81809865
#11 [ffff88810ab5ff50] entry_SYSCALL_64_after_hwframe at ffffffff83a000b2
    RIP: 00007fed5c5e5915  RSP: 00007ffdaaa3f9c8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007ffdaaa43422  RCX: 00007fed5c5e5915
    RDX: 000000000000001c  RSI: 00007ffdaaa43422  RDI: 0000000000000003
    RBP: 0000000000000003   R8: 0000000000000800   R9: 0000000000000003
    R10: 000000000000000b  R11: 0000000000000246  R12: 00007ffdaaa41bb8
    R13: 00000000017252a0  R14: 00000000017252e0  R15: 0000000000000000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
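
One possible reading of the exception registers, assuming the debug kernel is built with generic KASAN and uses the usual x86_64 shadow offset with 4-level paging (none of this is stated in the ticket): RAX holds the shadow offset, RDI holds a small address (0x14c, which looks like a field offset from a NULL base pointer), and RDX is RDI shifted right by 3. The shadow byte for such an address is non-canonical, which would explain a general protection fault rather than a page fault. The helper names below are hypothetical and only illustrate the arithmetic.

```c
#include <stdint.h>

/* Assumed constants for x86_64 generic KASAN; not taken from this ticket. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Address of the shadow byte that an instrumented access to addr checks first. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/*
 * With 48-bit virtual addresses (4-level paging, assumed here), an address
 * is canonical only if bits 63..47 are all zeros or all ones.
 */
static int is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}
```

Under these assumptions, RDI = 0x14c gives RDX = 0x29 and a shadow address of 0xdffffc0000000029, which matches the RAX and RDX values in the trace and is consistent with an uninitialized (NULL) trace data pointer in cfs_trace_lock_tcd.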

Let's restore the old behavior.
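
The ordering problem can be modeled in user space as a minimal sketch. All names here (trace_data, debug_init, debug_msg, set_debug_mask, module_init_with_debug) are hypothetical stand-ins, not the libcfs functions, and this is not the content of the actual patch. It assumes only what the backtrace shows: writing a debug mask ends up logging a message, and logging touches trace state that exists only after debug init has run.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the trace state used by the debug log. */
struct trace_data {
	int locked;
	unsigned long messages;
};

static struct trace_data *trace_data;	/* NULL until debug_init() has run */

/* Models debug init; returns 0 on success, -1 on allocation failure. */
static int debug_init(void)
{
	if (trace_data != NULL)
		return 0;
	trace_data = calloc(1, sizeof(*trace_data));
	return trace_data ? 0 : -1;
}

static void debug_cleanup(void)
{
	free(trace_data);
	trace_data = NULL;
}

/* Models logging one message; needs the trace state to exist. */
static int debug_msg(const char *msg)
{
	(void)msg;
	if (trace_data == NULL)
		return -1;	/* kernel code without this check would fault here */
	trace_data->locked = 1;
	trace_data->messages++;
	trace_data->locked = 0;
	return 0;
}

/* Models a write to the debug mask: parsing the mask logs a message. */
static int set_debug_mask(const char *mask)
{
	return debug_msg(mask);
}

/* Old behavior in this model: module init brings up debugging itself. */
static int module_init_with_debug(void)
{
	return debug_init();
}
```

In this model, calling set_debug_mask() before any init returns an error where the kernel code would crash, while calling it after module_init_with_debug() succeeds. That is the property the old behavior provided: the debug state is ready as soon as the module has loaded.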



 Comments   
Comment by Peter Jones [ 11/Jan/24 ]

James

I think that this was your change being referenced. Would this same crash still happen on a more current kernel? RHEL 9.3 is the primary kernel targeted for the 2.16 release...

Peter

Comment by Alexey Lyashkov [ 11/Jan/24 ]

Peter,

I have a patch, and it is being posted today.

From: "Mr. NeilBrown" <neilb@suse.de>
Date: Wed, 8 Nov 2023 21:15:09 -0500
Subject: [PATCH] LU-9859 libcfs: refactor libcfs initialization.
Linux-commit: 64bf0b1a079d61e9e059b9dc7a58e064c7d994ae

Change-Id: I6b5ecdba0defc6e033f78d8fc2b9be9e26c7f720
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52700
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

Comment by Peter Jones [ 11/Jan/24 ]

Sorry I am not following - can you please post a link to the patch in Gerrit? Nothing has been auto-commented in Jira and I cannot find anything searching on the LU-17418 reference...

Comment by Gerrit Updater [ 11/Jan/24 ]

"Alexey Lyashkov <alexey.lyashkov@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53650
Subject: LU-17418 libcfs: fix libcfs init.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 49837ac0c06e559aa81917cc92add6382aa3b78d

Comment by Gerrit Updater [ 25/Jan/24 ]

"Alexey Lyashkov <alexey.lyashkov@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53821
Subject: LU-17418 libcfs: fix libcfs init.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1174e98b1d7ac4047ab3e45bb4b3f1af9414ac4d

Comment by James A Simmons [ 25/Jan/24 ]

Patch https://review.whamcloud.com/c/fs/lustre-release/+/41664 does resolve this issue.

Comment by Gerrit Updater [ 25/Jan/24 ]

"jsimmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53825
Subject: LU-17418 tests: test lctl set_param when only libcfs.ko loaded
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: dc82487403c6de947237764ae40a16f10f9d064e

Generated at Sat Feb 10 03:35:17 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.