Details
- Bug
- Resolution: Fixed
- Minor
Description
A recent patch to update_trans.c changed how distribute_txn_commit_thread() waits for more work to do.
It previously had an explicit "wait_event()" which listed all the conditions to wait for. It would then recheck each condition and possibly perform an appropriate action.
It was changed to check each condition only once per loop iteration. If a condition was true, the action would be performed and a flag set. If no conditions were true (indicated by the flag), it would wait; otherwise it would loop and recheck all conditions.
One of the "if (condition) { do work }" stanzas in the loop tested a condition that should not wake the loop: "batchid" was not tested at all in the wait_event(). The flag mentioned above was nevertheless set whenever that condition tested true.
This can cause the loop to spin indefinitely.
The "__set_current_state(TASK_RUNNING);" in the batchid stanza should be removed so that the value of batchid cannot stop the loop from sleeping (calling "schedule()").
Issue Links
- is related to LU-12780 Avoid using ptlrpc_thread where it isn't needed (Resolved)
I guess the following warning is related:
[ 62.515602] ------------[ cut here ]------------
[ 62.519116] do not call blocking ops when !TASK_RUNNING; state=402 set at [<00000000a9f5c02a>] distribute_txn_commit_thread+0x5c/0x12f0 [ptlrpc]
[ 62.519405] WARNING: CPU: 0 PID: 5426 at kernel/sched/core.c:7438 __might_sleep+0x5d/0x70
[ 62.519509] Modules linked in: zfs(O) zunicode(O) zzstd(O) zlua(O) zcommon(O) znvpair(O) zavl(O) icp(O) spl(O) lustre(O) osp(O) ofd(O) lod(O) mdt(O) mdd(O) mgs(O) osd_ldiskfs(O) ldiskfs(O) lquota(O) lfsck(O) obdecho(O) mgc(O) mdc(O) lov(O) osc(O) lmv(O) fid(O) fld(O) ptlrpc(O) obdclass(O) ksocklnd(O) lnet(O) libcfs(O)
[ 62.519850] CPU: 0 PID: 5426 Comm: dist_txn-0 Tainted: G W O --------- - - 4.18.0 #11
[ 62.519959] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
[ 62.520075] RIP: 0010:__might_sleep+0x5d/0x70
[ 62.520141] Code: ee 48 89 df 5b 5d 41 5c e9 70 fe ff ff 48 8b 90 d0 1a 00 00 48 c7 c7 00 28 e2 a8 c6 05 2f a4 46 01 01 48 89 d1 e8 49 58 fd ff <0f> 0b eb ce 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 65 48
[ 62.520362] RSP: 0018:ffff914fbe7c3e10 EFLAGS: 00010282
[ 62.520426] RAX: 0000000000000084 RBX: ffffffffa8e38be3 RCX: 0000000000000001
[ 62.520518] RDX: 0000000080000001 RSI: ffffffffa8e39811 RDI: 00000000ffffffff
[ 62.520610] RBP: 00000000000000e2 R08: 0000000000000000 R09: 0000000000000000
[ 62.520701] R10: ffff914fbe7c3c58 R11: ffff914fbe7c3c50 R12: 0000000000000000
[ 62.520793] R13: 0000000000608040 R14: 0000000000000058 R15: ffff914fe377ad00
[ 62.520885] FS: 0000000000000000(0000) GS:ffff915073800000(0000) knlGS:0000000000000000
[ 62.520977] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 62.521061] CR2: 00007fa2d06618a0 CR3: 0000000061011002 CR4: 0000000000370eb0
[ 62.521154] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 62.521247] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 62.521338] Call Trace:
[ 62.521374] kmem_cache_alloc_trace+0x1cd/0x2a0
[ 62.521443] ? sub_updates_write+0x1360/0x1360 [ptlrpc]
[ 62.521600] distribute_txn_commit_thread+0x64e/0x12f0 [ptlrpc]
[ 62.521757] ? rcu_read_lock_sched_held+0xe/0x60
[ 62.521824] ? lock_release+0x20c/0x2d0
[ 62.521874] ? trace_hardirqs_on+0x1c/0xe0
[ 62.521926] ? sub_updates_write+0x1360/0x1360 [ptlrpc]
[ 62.522074] kthread+0x16e/0x1a0
[ 62.522127] ? set_kthread_struct+0x40/0x40
[ 62.522177] ret_from_fork+0x24/0x30