Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Description
BUG: unable to handle kernel NULL pointer dereference at 0000000000000160 in conf-sanity / 76c
Trace:
PID: 317366 TASK: ffffa2bfa2c219c0 CPU: 0 COMMAND: "lctl"
#0 [ffffa2bfa2c27c68] panic at ffffffffaf0b9786
/tmp/kernel/kernel/panic.c: 299
#1 [ffffa2bfa2c27d00] no_context at ffffffffaf0a9563
/tmp/kernel/arch/x86/mm/fault.c: 799
#2 [ffffa2bfa2c27d50] page_fault at ffffffffaf600f0e
/tmp/kernel/arch/x86/entry/entry_64.S: 1220
[exception RIP: mdd_changelog_recalc_mask+212]
RIP: ffffffffc0ce40a4 RSP: ffffa2bfa2c27e00 RFLAGS: 00010286
RAX: ffffffffffffffff RBX: ffffa2bf73145e00 RCX: 0000000000000000
RDX: 0000000000000a2f RSI: 0000000000000000 RDI: ffffa2bf765ec950
RBP: ffffa2bf6a67ec00 R8: ffffa2bfa2c27c28 R9: 0000000000000a6c
R10: 0000000000000000 R11: ffffa2bf8cd05a6b R12: ffffa2bf765ec950
R13: ffffa2bfa2c27e40 R14: ffffa2bf8d508000 R15: ffffa2bfa2c27f10
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
/home/lustre/master-mine/lustre/mdd/mdd_device.c: 1889
#3 [ffffa2bfa2c27e30] mdd_changelog_mask_seq_write at ffffffffc0d098e2 [mdd]
/home/lustre/master-mine/lustre/mdd/mdd_lproc.c: 174
#4 [ffffa2bfa2c27ea0] full_proxy_write at ffffffffaf2e6a7b
/tmp/kernel/fs/debugfs/file.c: 230
#5 [ffffa2bfa2c27ed8] vfs_write at ffffffffaf1cffc9
/tmp/kernel/fs/read_write.c: 550
#6 [ffffa2bfa2c27f08] ksys_write at ffffffffaf1d021d
/tmp/kernel/fs/read_write.c: 599
#7 [ffffa2bfa2c27f38] do_syscall_64 at ffffffffaf001893
/tmp/kernel/arch/x86/entry/common.c: 302
#8 [ffffa2bfa2c27f50] entry_SYSCALL_64_after_hwframe at ffffffffaf600099
/tmp/kernel/arch/x86/entry/entry_64.S: 151
RIP: 00007f8dd57e5915 RSP: 00007ffd59725268 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00000000006822f0 RCX: 00007f8dd57e5915
RDX: 0000000000000006 RSI: 00007ffd59728f32 RDI: 0000000000000003
RBP: 0000000000000001 R8: 000000000000fc00 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 000000000068a301 R14: 0000000000682440 R15: 00007ffd5972740c
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
/home/lustre/master-mine/lustre/mdd/mdd_device.c: 1889
0xffffffffc0ce40a0 <mdd_changelog_recalc_mask+208>:  mov 0x30(%rbx),%rsi
0xffffffffc0ce40a4 <mdd_changelog_recalc_mask+212>:  mov 0x160(%rsi),%rax
crash> p *(struct llog_ctxt *)0xffffa2bf73145e00
$1 = {
loc_idx = 14,
loc_obd = 0xffffa2bf765ec188,
loc_olg = 0xffffa2bf765ec868,
loc_exp = 0xffffa2bf6308f800,
loc_imp = 0x0,
loc_logops = 0xffffffffc0d470c0 <changelog_orig_logops>,
loc_handle = 0x0,
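With RBX pointing at the llog_ctxt above, the instruction at +208 loads loc_handle (offset 0x30) into RSI, which is 0, and the instruction at +212 then faults reading 0x160(%rsi), which is the reported NULL pointer dereference at 0000000000000160. Just before the panic, another process (mount, PID 317339) had started to initialize the changelog: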
00000004:00000080:1.0:1677245239.833125:0:317339:0:(mdd_device.c:548:mdd_changelog_llog_init()) changelog starting index=0
00000040:00000001:1.0:1677245239.833125:0:317339:0:(llog_obd.c:150:llog_setup()) Process entered
00000040:00000010:1.0:1677245239.833127:0:317339:0:(llog_obd.c:44:llog_new_ctxt()) kmalloced '(ctxt)': 264 at 00000000ccbe3063.
00000020:00000040:1.0:1677245239.833128:0:317339:0:(genops.c:970:class_export_get()) GET export 00000000c0fe929f refcount=3
00000040:00000001:1.0:1677245239.833128:0:317339:0:(llog_osd.c:1919:llog_osd_setup()) Process entered
00000040:00000040:1.0:1677245239.833129:0:317339:0:(lustre_log.h:395:llog_ctxt_get()) GETting ctxt 00000000ccbe3063 : new refcount 2
00000020:00000001:1.0:1677245239.833129:0:317339:0:(local_storage.c:842:local_oid_storage_init()) Process entered
00000020:00000001:1.0:1677245239.833130:0:317339:0:(local_storage.c:147:ls_device_get()) Process entered
00000020:00000001:1.0:1677245239.833130:0:317339:0:(local_storage.c:152:ls_device_get()) Process leaving via out_ls (rc=18446641542280932352 : -102531428619264 : 0xffffa2bf8a9e6c00)
So this is obviously a race, though I don't quite understand how lctl was able to start before the MDS mount had completed.