[LU-17418] Lustre crashed immediately after load with debug kernel Created: 11/Jan/24 Updated: 25/Jan/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Alexey Lyashkov | Assignee: | James A Simmons |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
RHEL 8.5 debug kernel |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Description |
|
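The patch for LU-9859 ("libcfs: refactor libcfs initialization") removed the libcfs debug initialization from the libcfs module init itself. As a result, a write to a libcfs debugfs bitmask file right after the module is loaded reaches libcfs_debug_msg and crashes in cfs_trace_lock_tcd:
crash7> bt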
PID: 11558 TASK: ffff888113890000 CPU: 1 COMMAND: "lt-lctl"
#0 [ffff88810ab5f848] __show_regs at ffffffff8186a0c5
#1 [ffff88810ab5f8d0] general_protection at ffffffff83a0115e
[exception RIP: cfs_trace_lock_tcd+37]
RIP: ffffffffc1495a95 RSP: ffff88810ab5f988 RFLAGS: 00010296
RAX: dffffc0000000000 RBX: 00000000000000c0 RCX: ffffffffc14d3580
RDX: 0000000000000029 RSI: 0000000000000000 RDI: 000000000000014c
RBP: ffff88810ab5fbb0 R8: 0000000000000001 R9: 00000000130e0580
R10: ffff88810ab5fbc8 R11: ffffed102270b633 R12: 00000000000000c0
R13: ffffffffc14902a0 R14: 0000000000000001 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#2 [ffff88810ab5f998] libcfs_debug_msg at ffffffffc1496bec [libcfs]
#3 [ffff88810ab5fbc0] cfs_str2mask at ffffffffc149e1b5 [libcfs]
#4 [ffff88810ab5fc30] libcfs_debug_str2mask at ffffffffc149117b [libcfs]
#5 [ffff88810ab5fd20] proc_dobitmasks at ffffffffc1492c90 [libcfs]
#6 [ffff88810ab5fd70] lnet_debugfs_write at ffffffffc1491d44 [libcfs]
#7 [ffff88810ab5fe00] full_proxy_write at ffffffff8242f903
#8 [ffff88810ab5fe48] vfs_write at ffffffff8220ca57
#9 [ffff88810ab5fe88] ksys_write at ffffffff8220d218
#10 [ffff88810ab5ff28] do_syscall_64 at ffffffff81809865
#11 [ffff88810ab5ff50] entry_SYSCALL_64_after_hwframe at ffffffff83a000b2
RIP: 00007fed5c5e5915 RSP: 00007ffdaaa3f9c8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007ffdaaa43422 RCX: 00007fed5c5e5915
RDX: 000000000000001c RSI: 00007ffdaaa43422 RDI: 0000000000000003
RBP: 0000000000000003 R8: 0000000000000800 R9: 0000000000000003
R10: 000000000000000b R11: 0000000000000246 R12: 00007ffdaaa41bb8
R13: 00000000017252a0 R14: 00000000017252e0 R15: 0000000000000000
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
Let's restore the old behavior. |
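A note on reading the register dump above, offered as an interpretation rather than a confirmed diagnosis. The values look consistent with a KASAN shadow-memory check on a near-NULL pointer: RAX holds 0xdffffc0000000000, which is the generic KASAN shadow offset on x86_64, and RDX (0x29) equals RDI (0x14c) shifted right by 3. If that reading is right, the code was about to touch a field at offset 0x14c of a NULL structure pointer, and the shadow lookup itself faulted because the computed shadow address is non-canonical. The sketch below only reproduces that arithmetic. The helper names are illustrative, and it assumes 4-level paging (48-bit virtual addresses) and a KASAN-enabled debug kernel.

```c
#include <stdint.h>
#include <stdio.h>

/* Generic KASAN constants for x86_64 (assumed to apply to this debug kernel). */
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL
#define KASAN_SHADOW_SCALE_SHIFT 3

/* Shadow byte address that KASAN consults before an access to 'addr'. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/*
 * With 48-bit virtual addresses, an address is canonical only when
 * bits 63..47 are all zeros or all ones. Touching a non-canonical
 * address raises a general protection fault instead of a page fault.
 */
static int is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffffULL;
}

/* Illustrative helper: print the shadow lookup for one address. */
static void describe_access(uint64_t addr)
{
	uint64_t shadow = kasan_shadow_addr(addr);

	printf("addr=0x%llx shift=0x%llx shadow=0x%llx canonical=%d\n",
	       (unsigned long long)addr,
	       (unsigned long long)(addr >> KASAN_SHADOW_SCALE_SHIFT),
	       (unsigned long long)shadow,
	       is_canonical_48(shadow));
}
```

Calling describe_access(0x14c) with the RDI value from the dump gives a shift result of 0x29, matching RDX, and a shadow address that is not canonical, which would explain why the trace shows general_protection rather than a page fault. This fits the description (trace state not yet set up when libcfs_debug_msg runs), but it does not by itself identify which pointer was NULL.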
| Comments |
| Comment by Peter Jones [ 11/Jan/24 ] |
|
James, I think that this was your change being referenced. Would this same crash still happen on a more current kernel? RHEL 9.3 is the primary kernel targeted for the 2.16 release... Peter |
| Comment by Alexey Lyashkov [ 11/Jan/24 ] |
|
Peter, I have a patch, and it was posted today. From: "Mr. NeilBrown" <neilb@suse.de> Change-Id: I6b5ecdba0defc6e033f78d8fc2b9be9e26c7f720 |
| Comment by Peter Jones [ 11/Jan/24 ] |
|
Sorry, I am not following. Can you please post a link to the patch in Gerrit? Nothing has been auto-commented in Jira and I cannot find anything searching on the LU-17418 reference... |
| Comment by Gerrit Updater [ 11/Jan/24 ] |
|
"Alexey Lyashkov <alexey.lyashkov@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53650 |
| Comment by Gerrit Updater [ 25/Jan/24 ] |
|
"Alexey Lyashkov <alexey.lyashkov@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53821 |
| Comment by James A Simmons [ 25/Jan/24 ] |
|
Patch https://review.whamcloud.com/c/fs/lustre-release/+/41664 does resolve this issue. |
| Comment by Gerrit Updater [ 25/Jan/24 ] |
|
"jsimmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53825 |