Details
- Type: Bug
- Resolution: Fixed
- Priority: Minor
- Severity: 3
Description
While trying to reproduce the crash from LU-18788 by running the following script/loop against a recent master version:
while true ; do
	mount <Lustre FS with KRB flavor>
	timeout 20 dd if=<Lustre big file wide striped> of=/dev/null
	umount
	lustre_rmmod
done
I triggered the following, different crash (a GPF):
[4391638.976039] general protection fault, probably for non-canonical address 0x14f4d7cf9ff71fc7: 0000 [#1] PREEMPT SMP NOPTI
[4391638.988979] CPU: 108 PID: 1586 Comm: kworker/108:1 Kdump: loaded Tainted: G OE 6.8.0-51-generic #52-Ubuntu
[4391639.002029] Hardware name: <Platform Name>, BIOS 1.3.6 01/10/2024
[4391639.010269] Workqueue: events sec_gc_main [ptlrpc]
[4391639.016265] RIP: 0010:lprocfs_stats_collect+0x90/0x150 [obdclass]
[4391639.023757] Code: b7 6f 14 45 85 ed 74 67 48 63 f6 45 89 ed 4c 89 e2 48 8d 34 b6 4f 8d 04 ec 48 c1 e6 03 48 8b 42 38 48 85 c0 74 38 48 8d 0c 30 <48> 8b 39 48 01 3b 48 8b 79 18 48 01 7b 18 48 8b 49 08 48 3b 4b 08
[4391639.045441] RSP: 0018:ff5e71981e2efbc8 EFLAGS: 00010202
[4391639.051790] RAX: 14f4d7cf9ff71fc7 RBX: ff5e71981e2efc28 RCX: 14f4d7cf9ff71fc7
[4391639.060252] RDX: ff3cc127be0b83c8 RSI: 0000000000000000 RDI: 0000000000000000
[4391639.068714] RBP: ff5e71981e2efbe8 R08: ff3cc127be0b83e0 R09: 0000000000000000
[4391639.077170] R10: 0000000000000000 R11: 0000000000000000 R12: ff3cc127be0b8000
[4391639.085625] R13: 000000000000007c R14: ff3cc231dfd89058 R15: ff3cc231dfd89040
[4391639.094072] FS: 0000000000000000(0000) GS:ff3cc3213d200000(0000) knlGS:0000000000000000
[4391639.103593] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4391639.110474] CR2: 00005ed672f4e038 CR3: 0000010b05f12004 CR4: 0000000000f71ef0
[4391639.118932] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4391639.127395] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[4391639.135851] PKRU: 55555554
[4391639.139335] Call Trace:
[4391639.142534] <TASK>
[4391639.145332] ? show_regs+0x6d/0x80
[4391639.149602] ? die_addr+0x37/0xa0
[4391639.153774] ? exc_general_protection+0x1db/0x480
[4391639.159511] ? kiblnd_send+0x2aa/0xa70 [ko2iblnd]
[4391639.165263] ? asm_exc_general_protection+0x27/0x30
[4391639.171197] ? lprocfs_stats_collect+0x90/0x150 [obdclass]
[4391639.177882] ? ptl_send_buf+0x138/0x5b0 [ptlrpc]
[4391639.183594] kick_cpu_latency+0xca/0x6c0 [ptlrpc]
[4391639.189397] ? __pfx_ptlrpc_master_callback+0x10/0x10 [ptlrpc]
[4391639.196468] ptl_send_rpc+0x529/0x14a0 [ptlrpc]
[4391639.202068] gss_do_ctx_fini_rpc+0x230/0x510 [ptlrpc_gss]
[4391639.208609] gss_cli_ctx_fini_common+0x5d/0x2f0 [ptlrpc_gss]
[4391639.215438] ctx_destroy_kr+0x7a/0x2f0 [ptlrpc_gss]
[4391639.221391] gss_sec_release_ctx_kr+0x2d/0xd0 [ptlrpc_gss]
[4391639.228026] sptlrpc_cli_ctx_put+0x41/0xd0 [ptlrpc]
[4391639.234029] sec_process_ctx_list+0xbf/0x1b0 [ptlrpc]
[4391639.240226] sec_gc_main+0x1d/0x290 [ptlrpc]
[4391639.245543] process_one_work+0x175/0x350
[4391639.250526] worker_thread+0x306/0x440
[4391639.255213] ? _raw_spin_lock_irqsave+0xe/0x20
[4391639.260670] ? __pfx_worker_thread+0x10/0x10
[4391639.265936] kthread+0xef/0x120
[4391639.269923] ? __pfx_kthread+0x10/0x10
[4391639.274606] ret_from_fork+0x44/0x70
[4391639.279094] ? __pfx_kthread+0x10/0x10
[4391639.283771] ret_from_fork_asm+0x1b/0x30
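For context, the RIP above points into the per-CPU summation loop of lprocfs_stats_collect(), and RCX holds the non-canonical address being dereferenced. A minimal sketch of that access pattern (simplified stand-in types, not the exact Lustre definitions) shows why a freed-and-reused stats block faults there:

/* Simplified stand-ins for the structures involved; the real
 * definitions live in lprocfs_status.h and carry more fields. */
struct lprocfs_counter {
	long lc_count;
	long lc_sum;
};

struct lprocfs_percpu {
	struct lprocfs_counter lp_cntr[1];
};

struct lprocfs_stats {
	unsigned int		 ls_num;	/* counters per CPU slot */
	unsigned int		 ls_cpus;	/* per-CPU slots allocated */
	struct lprocfs_percpu	*ls_percpu[];
};

/* Sketch of the summation loop: every iteration dereferences
 * stats->ls_percpu[cpu].  If 'stats' points at freed and reused
 * memory, that slot holds garbage (here 0x14f4d7cf9ff71fc7) and
 * the first load through it is the GPF captured above. */
static void stats_collect_sketch(struct lprocfs_stats *stats, int idx,
				 struct lprocfs_counter *cnt)
{
	unsigned int cpu;

	cnt->lc_count = 0;
	cnt->lc_sum = 0;
	for (cpu = 0; cpu < stats->ls_cpus; cpu++) {
		struct lprocfs_counter *c;

		if (stats->ls_percpu[cpu] == NULL)
			continue;
		c = &stats->ls_percpu[cpu]->lp_cntr[idx];
		cnt->lc_count += c->lc_count;	/* faulting load */
		cnt->lc_sum += c->lc_sum;
	}
}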
Crash dump analysis has determined that this occurs in the recent PM-QoS-related code from LU-18446, where OBD device stats are used to evaluate the best time period during which to keep CPU latency low. But by the time a late KRB-related kthread wants to send an RPC to the server/target, obd->obd_svc_stats has already been freed (and its memory reused, hence the crash on garbage data); the pointer was even found to be NULL at the time of the crash dump, very likely as a consequence of this umount thread, also present in the crash dump:
PID: 3715482 TASK: ff3cc22e6001af80 CPU: 220 COMMAND: "umount"
#0 [ff5e7198732e7a28] __schedule at ffffffff8504095c
#1 [ff5e7198732e7ad0] schedule at ffffffff85040dd3
#2 [ff5e7198732e7af0] synchronize_rcu_expedited at ffffffff83fdd03c
#3 [ff5e7198732e7ba0] bdi_unregister at ffffffff841f8770
#4 [ff5e7198732e7bf8] generic_shutdown_super at ffffffff842e7c0d
#5 [ff5e7198732e7c20] kill_anon_super at ffffffff842e82d8
#6 [ff5e7198732e7c40] lustre_kill_super at ffffffffc32a8cf4 [lustre]
#7 [ff5e7198732e7c58] deactivate_locked_super at ffffffff842e91d2
#8 [ff5e7198732e7c78] deactivate_super at ffffffff842e92b6
#9 [ff5e7198732e7c90] cleanup_mnt at ffffffff843197d3
#10 [ff5e7198732e7cc0] __cleanup_mnt at ffffffff84319902
#11 [ff5e7198732e7cd0] task_work_run at ffffffff83f33cde
#12 [ff5e7198732e7cf8] syscall_exit_to_user_mode at ffffffff85030d27
#13 [ff5e7198732e7d20] do_syscall_64 at ffffffff85028fec
RIP: 00007b2b3352a9fb RSP: 00007fffc31c3248 RFLAGS: 00000246
RAX: 0000000000000000 RBX: 00005ed672f464c0 RCX: 00007b2b3352a9fb
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005ed672f4d990
RBP: 00007fffc31c3320 R8: 00007b2b33603b20 R9: 0000000000000020
R10: 0000000000000001 R11: 0000000000000246 R12: 00005ed672f465c0
R13: 0000000000000000 R14: 00005ed672f4d990 R15: 00005ed672f4e040
ORIG_RAX: 00000000000000a6 CS: 0033 SS: 002b
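Putting the two stacks side by side gives the suspected interleaving (a schematic reconstruction from the traces above, not literal source):

umount thread                          sec_gc_main worker (kworker)
-------------                          ----------------------------
lustre_kill_super()
  client/obd teardown frees
  obd->obd_svc_stats
                                       sec_process_ctx_list()
                                        gss_do_ctx_fini_rpc()
                                         ptl_send_rpc()
                                          kick_cpu_latency()
                                           lprocfs_stats_collect(
                                               obd->obd_svc_stats, ...)
                                           stale/garbage pointer -> GPF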
After looking further into the Lustre source code to understand how access to obd->obd_svc_stats is usually protected, I found that before the PM-QoS changes it could only be accessed through external tunables under /sys or /debug, so protection against concurrent access was achieved simply by removing those tunables first. Now that there is an internal consumer, some internal protection mechanism must be put in place to prevent such a race...
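One possible shape for such a mechanism, purely as a sketch (the lock and helper names below are invented for illustration and do not correspond to the actual patch): readers take a dedicated lock and re-check the pointer, and the cleanup path detaches the stats under the same lock before freeing them.

#include <linux/spinlock.h>
#include <linux/types.h>

/* Declarations of the real Lustre helpers used by this sketch */
struct lprocfs_counter;
struct lprocfs_stats;
void lprocfs_stats_collect(struct lprocfs_stats *stats, int idx,
			   struct lprocfs_counter *cnt);
void lprocfs_stats_free(struct lprocfs_stats **statsh);

/* Hypothetical: a dedicated spinlock guarding obd_svc_stats */
struct obd_device_sketch {
	spinlock_t		 obd_svc_stats_lock;
	struct lprocfs_stats	*obd_svc_stats;
};

/* Reader side (e.g. the PM-QoS helper): only touch the stats while
 * holding the lock, and after re-checking the pointer. */
static bool obd_svc_stats_try_collect(struct obd_device_sketch *obd,
				      int idx, struct lprocfs_counter *cnt)
{
	bool found = false;

	spin_lock(&obd->obd_svc_stats_lock);
	if (obd->obd_svc_stats != NULL) {
		lprocfs_stats_collect(obd->obd_svc_stats, idx, cnt);
		found = true;
	}
	spin_unlock(&obd->obd_svc_stats_lock);
	return found;
}

/* Cleanup side (e.g. umount): detach under the lock first, so no new
 * reader can pick up the pointer, then free outside the lock. */
static void obd_svc_stats_detach(struct obd_device_sketch *obd)
{
	struct lprocfs_stats *stats;

	spin_lock(&obd->obd_svc_stats_lock);
	stats = obd->obd_svc_stats;
	obd->obd_svc_stats = NULL;
	spin_unlock(&obd->obd_svc_stats_lock);

	if (stats != NULL)
		lprocfs_stats_free(&stats);
}

A reference count or RCU would work just as well if taking a spinlock on every RPC send is too costly; the point is only that the internal consumer added by LU-18446 needs the same lifetime guarantee that the external tunables used to get from being removed before the stats were freed.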