[LU-12780] Avoid using ptlrpc_thread where is in't needed Created: 18/Sep/19 Updated: 26/Feb/21 Resolved: 26/Feb/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Neil Brown | Assignee: | Neil Brown |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
Several places in lustre use 'struct ptlrpc_thread' to "help" manage a kthread. These places are not directly related to ptlrpc, so the naming is not help, and in general the usage of ptlrpc_thread does not improve the code. By removing the use of ptlrpc_thread - particularly by avoiding startup synchornization and by using kthread_stop/ kthread_should_stop(), I can reduce code size by a couple of hundred lines. Using the seemly uniform approach of ptlrpc_thread can lead to an impression that these threads are all managed much the same way, but this is not the case. There is substantial variety on how the threads work and exposing that clear will benefit maintainability.
|
| Comments |
| Comment by Gerrit Updater [ 18/Sep/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36225 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36257 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36265 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36268 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36267 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36258 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36259 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36262 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36260 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36261 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36263 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36264 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36266 |
| Comment by Gerrit Updater [ 23/Oct/19 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36556 |
| Comment by Gerrit Updater [ 06/Dec/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36257/ |
| Comment by Alex Zhuravlev [ 24/Jan/20 ] |
|
this seem to be related? BUG: unable to handle kernel paging request at ffff88012e9ac5e8 PGD 368b067 P4D 368b067 PUD 170b75067 PMD 170a00067 PTE 800ffffed1653060 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 0 PID: 8522 Comm: osp-syn-1-1 Tainted: G W O --------- --- 4.18.0 #32 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 RIP: 0010:native_queued_spin_lock_slowpath+0xa/0x1b0 Code: 52 b8 01 00 00 00 31 d2 f0 0f b0 17 3c 01 75 02 5a c3 56 0f b6 f0 e8 c5 ff ff ff 5e 5a c3 66 90 0f 1f 44 00 00 ba 01 00 00 00 <8b> 07 85 c0 75 09 f0 0f b1 17 85 c0 75 f2 c3 f3 90 eb ed 81 fe 00 RSP: 0018:ffff880153abfdd8 EFLAGS: 00010002 RAX: 000000006b6b6b6b RBX: ffff88012e9ac5e8 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88012e9ac5e8 RBP: ffff88012e9ac5e8 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88012e9ac600 R13: 0000000000000282 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88016ae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88012e9ac5e8 CR3: 000000012342b000 CR4: 00000000000006b0 Call Trace: do_raw_spin_lock+0x85/0x90 _raw_spin_lock_irqsave+0x7c/0x80 __wake_up_common_lock+0x4f/0x90 osp_sync_thread+0x432/0xa00 [osp] kthread+0x100/0x140 ? osp_sync_process_committed+0x980/0x980 [osp] ? kthread_flush_work_fn+0x10/0x10 ret_from_fork+0x24/0x30 this patch has been applied long ago:
LU-12780 osp: use native kthreads for opd_pre_thread
rather than ptlrpc_thread, use native kthreads functionality.
|
| Comment by Alex Zhuravlev [ 29/Jan/20 ] |
|
another report seems to be related here: https://testing.whamcloud.com/test_sessions/ec5e5eb5-7924-45be-b30f-f61c4fa789e0 [11688.746654] Lustre: DEBUG MARKER: == sanity test 154g: various llapi FID tests ========================================================= 13:00:19 (1580043619) [11728.167540] LustreError: 15355:0:(llog_cat.c:411:llog_cat_id2handle()) lustre-OST0000-osc-MDT0000: error opening log id [0x14a10:0x1:0x0]:0: rc = -2 [11728.170363] LustreError: 15355:0:(osp_sync.c:1317:osp_sync_thread()) ASSERTION( thread->t_flags != SVC_RUNNING ) failed: 1 changes, 0 in progress, 0 in flight [11728.172819] LustreError: 15355:0:(osp_sync.c:1317:osp_sync_thread()) LBUG [11728.174009] Pid: 15355, comm: osp-syn-0-0 3.10.0-1062.9.1.el7_lustre.x86_64 #1 SMP Sun Jan 26 08:12:08 UTC 2020 [11728.175778] Call Trace: [11728.176263] [<ffffffffc09868ac>] libcfs_call_trace+0x8c/0xc0 [libcfs] [11728.177527] [<ffffffffc098695c>] lbug_with_loc+0x4c/0xa0 [libcfs] [11728.178659] [<ffffffffc18a99e7>] osp_sync_thread+0xb17/0xb20 [osp] [11728.179823] [<ffffffff844c61f1>] kthread+0xd1/0xe0 [11728.180739] [<ffffffff84b8dd37>] ret_from_fork_nospec_end+0x0/0x39 [11728.181961] [<ffffffffffffffff>] 0xffffffffffffffff [11728.183118] Kernel panic - not syncing: LBUG [11728.183892] CPU: 1 PID: 15355 Comm: osp-syn-0-0 Kdump: loaded Tainted: P OE ------------ 3.10.0-1062.9.1.el7_lustre.x86_64 #1 [11728.186387] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [11728.187518] Call Trace: [11728.188077] [<ffffffff84b7ac23>] dump_stack+0x19/0x1b [11728.189229] [<ffffffff84b74967>] panic+0xe8/0x21f [11728.190230] [<ffffffffc09869ab>] lbug_with_loc+0x9b/0xa0 [libcfs] [11728.191518] [<ffffffffc18a99e7>] osp_sync_thread+0xb17/0xb20 [osp] [11728.192783] [<ffffffff844d4ffe>] ? finish_task_switch+0x4e/0x1c0 [11728.194071] [<ffffffff84b805a2>] ? __schedule+0x402/0x840 [11728.195164] [<ffffffffc18a8ed0>] ? osp_sync_process_committed+0xa90/0xa90 [osp] [11728.196746] [<ffffffff844c61f1>] kthread+0xd1/0xe0 [11728.197750] [<ffffffff844c6120>] ? insert_kthread_work+0x40/0x40 [11728.198987] [<ffffffff84b8dd37>] ret_from_fork_nospec_begin+0x21/0x21 [11728.200278] [<ffffffff844c6120>] ? insert_kthread_work+0x40/0x40 |
| Comment by James A Simmons [ 29/Jan/20 ] |
|
https://review.whamcloud.com/#/c/36265/ has not landed to master. Only patch in this series that landed is 2bf7e6baa4947f96b7fee6ee625d695e2f34c064 which remove the unused flag SVC_EVENT. |
| Comment by Neil Brown [ 29/Jan/20 ] |
|
The patch is more likely to fix these bugs than cause them. I think I can see how the second might happen. Maybe the first too, though one task would have to get quite a lot done while another task does almost nothing.
|
| Comment by Gerrit Updater [ 25/Feb/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36258/ |
| Comment by Gerrit Updater [ 25/Feb/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36259/ |
| Comment by Gerrit Updater [ 01/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36263/ |
| Comment by Gerrit Updater [ 05/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36264/ |
| Comment by Gerrit Updater [ 17/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36261/ |
| Comment by Gerrit Updater [ 24/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36266/ |
| Comment by Gerrit Updater [ 24/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36268/ |
| Comment by Gerrit Updater [ 31/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36267/ |
| Comment by Gerrit Updater [ 07/Apr/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36556/ |
| Comment by Gerrit Updater [ 14/Apr/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36265/ |
| Comment by Gerrit Updater [ 07/May/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36260/ |
| Comment by Gerrit Updater [ 15/May/20 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/38612 |
| Comment by Gerrit Updater [ 04/Jun/20 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/38824 |
| Comment by Gerrit Updater [ 06/Jun/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38612/ |
| Comment by Gerrit Updater [ 18/Jun/20 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/38974 |
| Comment by Gerrit Updater [ 23/Jun/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38974/ |
| Comment by Cory Spitz [ 24/Jun/20 ] |
|
From https://review.whamcloud.com/38974: If two of these updates can race, corruption can occurs Are there any known bugs with scrub that are know to be fixed or might be fixed by the change? |
| Comment by Neil Brown [ 01/Jul/20 ] |
|
(oops, I meant to reply to the above, but inadvertently editted the comment instead. I didn't know I could do that, but think I've put it back properly).
> Are there any known bugs with scrub that are know to be fixed or might be fixed by the change? I didn't find any clear races, but I didn't look very hard. It is probably that the flags are changed very rarely, so any races that are possible are extremely unlikely. My main reason for the patch was that I was going to add another flag which could change asynchronously, and I didn't think that would be safe without proper locking.
|
| Comment by Gerrit Updater [ 01/Sep/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38824/ |
| Comment by Gerrit Updater [ 26/Feb/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36262/ |
| Comment by Peter Jones [ 26/Feb/21 ] |
|
Landed for 2.15 |