Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.17.0
RHEL8 running with debug kernel.
3
Description
The function distribute_txn_commit_batchid_update() is called in a context where sleeping is not safe, but its memory allocation uses flags that allow it to sleep. A debug kernel reports this with a __might_sleep warning during sanity test 60g:
https://testing.whamcloud.com/gerrit-janitor/54439/testresults/sanity3-ldiskfs-DNE-rocky8.10_x86_64-rocky8.10_x86_64/
Lustre: DEBUG MARKER: == sanity test 60g: transaction abort won't cause MDT hung === 01:29:59 (1737268199)
Lustre: *** cfs_fail_loc=19a, val=0***
------------[ cut here ]------------
do not call blocking ops when !TASK_RUNNING; state=402 set at [<00000000878da86f>] distribute_txn_commit_thread+0x98/0x1020 [ptlrpc]
WARNING: CPU: 0 PID: 11190 at kernel/sched/core.c:7471 __might_sleep+0x9d/0xc0
CPU: 0 PID: 11190 Comm: dist_txn-1 4.18.0rh8.10-debug #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
[ 438.807736] RIP: 0010:__might_sleep+0x9d/0xc0
Call Trace:
? __might_sleep+0x9d/0xc0
? report_bug+0x113/0x140
? do_error_trap+0xb6/0x130
? do_invalid_op+0x46/0x60
? __might_sleep+0x9d/0xc0
? invalid_op+0x14/0x20
? distribute_txn_commit_batchid_update+0x68/0xa90 [ptlrpc]
? __might_sleep+0x9d/0xc0
? __might_sleep+0x95/0xc0
slab_pre_alloc_hook.constprop.59+0x13d/0x1f0
kmem_cache_alloc_trace+0x5b/0x380
distribute_txn_commit_batchid_update+0x68/0xa90 [ptlrpc]
distribute_txn_commit_thread+0xab5/0x1020 [ptlrpc]
kthread+0x1d7/0x210
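
For anyone reading the trace, a rough user-space sketch of what the warning seems to be saying. It is not Lustre or kernel code. The task state bits are copied from upstream Linux 4.18 include/linux/sched.h and are assumed to match the RHEL8 debug kernel; describe_state(), would_warn() and the SIM_GFP_* flags are made-up names for illustration. Under that assumption, state=402 (hex) decodes to TASK_UNINTERRUPTIBLE | TASK_NOLOAD, i.e. TASK_IDLE, which would be consistent with the thread having set its state for an idle wait in distribute_txn_commit_thread() before the allocation ran, though the report does not confirm that.

```c
/* User-space model only: mirrors the idea of the debug check, not its code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Task state bits as in upstream Linux 4.18 include/linux/sched.h
 * (assumption: the RHEL8 debug kernel uses the same values). */
#define TASK_RUNNING         0x0000
#define TASK_INTERRUPTIBLE   0x0001
#define TASK_UNINTERRUPTIBLE 0x0002
#define TASK_NOLOAD          0x0400
#define TASK_IDLE            (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* Hypothetical stand-ins for allocation flags; only "may block" matters. */
#define SIM_GFP_MAY_BLOCK 0x1u
#define SIM_GFP_NO_BLOCK  0x0u

/* Hypothetical helper: name the few task states this sketch knows about. */
static const char *describe_state(unsigned long state, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (state == TASK_RUNNING)
		snprintf(buf, len, "TASK_RUNNING");
	else if (state == TASK_IDLE)
		snprintf(buf, len, "TASK_IDLE");
	else if (state == TASK_INTERRUPTIBLE)
		snprintf(buf, len, "TASK_INTERRUPTIBLE");
	else if (state == TASK_UNINTERRUPTIBLE)
		snprintf(buf, len, "TASK_UNINTERRUPTIBLE");
	else
		snprintf(buf, len, "other (0x%lx)", state);
	return buf;
}

/* Hypothetical model of the check: warn when an allocation that may block
 * is requested while the task state is anything other than TASK_RUNNING. */
static bool would_warn(unsigned long task_state, unsigned int sim_gfp)
{
	return (sim_gfp & SIM_GFP_MAY_BLOCK) && task_state != TASK_RUNNING;
}
```

In this model, would_warn(0x402, SIM_GFP_MAY_BLOCK) is true and would_warn(0x402, SIM_GFP_NO_BLOCK) is false, which matches the description's point that the allocation flags are the problem. Whether the right fix is a non-blocking flag or moving the allocation out of the wait condition is not something this sketch can answer.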