[LU-12780] Avoid using ptlrpc_thread where is in't needed Created: 18/Sep/19  Updated: 26/Feb/21  Resolved: 26/Feb/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Improvement Priority: Minor
Reporter: Neil Brown Assignee: Neil Brown
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-13986 livelock is possible in distribute_tx... Resolved
is related to LU-12511 Prepare lustre for adoption into the ... Open
Rank (Obsolete): 9223372036854775807

 Description   

Several places in lustre use 'struct ptlrpc_thread' to "help" manage a kthread.

These places are not directly related to ptlrpc, so the naming is not help, and in general the usage of ptlrpc_thread does not improve the code.

By removing the use of ptlrpc_thread - particularly by avoiding startup synchornization and by using kthread_stop/ kthread_should_stop(), I can reduce code size by a couple of hundred lines.

Using the seemly uniform approach of ptlrpc_thread can lead to an impression that these threads are all managed much the same way, but this is not the case.  There is substantial variety on how the threads work and exposing that clear will benefit maintainability.

 



 Comments   
Comment by Gerrit Updater [ 18/Sep/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36225
Subject: LU-12780 lustre: combined patch - testing only.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5ee58cedcfec59bdb41d0cf965d250d6353095e2

Comment by Gerrit Updater [ 23/Sep/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36257
Subject: LU-12780 lustre: remove SVC_EVENT flag
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 32e077dfd1256e31940179a0d06bb7205db52943

Comment by Gerrit Updater [ 23/Sep/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36265
Subject: LU-12780 osp: don't use ptlrpc_thread for osp_sync_thread()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 92c57c2f0405b6ae4b5df2f383b850f482e61013

Comment by Gerrit Updater [ 23/Sep/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36268
Subject: LU-12780 quota: don't use ptlrpc_thread for qsd_reint_main()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: dd9ba27b53730fd064a7080f333130ac83537aad

Comment by Gerrit Updater [ 23/Sep/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36267
Subject: LU-12780 quota: don't use ptlrpc_thread for qsd_upd_thread()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 31c8ba57c066726f81cfc0f36709d74f8d103e3e

Comment by Gerrit Updater [ 23/Sep/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36258
Subject: LU-12780 llite: don't use ptlrpc_thread for sai_agl_thread
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a7108c244dd91e36b233c7afc34e69f413768957

Comment by Gerrit Updater [ 23/Sep/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36259
Subject: LU-12780 llite: avoid ptlrpc_thread for ll_statahead_thread
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9ce2b04b62cfa9e2037c87f572bcc00fe4478d96

Comment by Gerrit Updater [ 23/Sep/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36262
Subject: LU-12780 ofd: don't use ptlrpc_thread
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0b8588b501423bb4eb25cd2cef0c396dc4516f5e

Comment by Gerrit Updater [ 23/Sep/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36260
Subject: LU-12780 target: don't use ptlrpc_thread for txn_commit_thread
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e7f15fceb5ff7dda65fbd52d8500f2bfacced7a1

Comment by Gerrit Updater [ 23/Sep/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36261
Subject: LU-12780 lod: don't use ptlrpc_thread for recovery thread
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b72bd96a2980417ffd39782fadd1b5761a7b86a1

Comment by Gerrit Updater [ 23/Sep/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36263
Subject: LU-12780 osp: use native kthreads for opd_pre_thread
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a67a5ede47eda9674cb44cb80f28d5e07512d35e

Comment by Gerrit Updater [ 23/Sep/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36264
Subject: LU-12780 osp: don't use ptlrpc_thread for opd_update_thread
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f1434a35c7b9e4acaa8a74a6cd192d940b1e43a7

Comment by Gerrit Updater [ 23/Sep/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36266
Subject: LU-12780 osp: don't use ptlrpc_thread for lwp_notify_main()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 545cecf91b6e79ea0d5cb16eb3b23b0162535476

Comment by Gerrit Updater [ 23/Oct/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36556
Subject: LU-12780 quota: don't use ptlrpc_thread for qmt_reba_thread
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6516b4ab67b71dea68ad403ee2158dc109b13292

Comment by Gerrit Updater [ 06/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36257/
Subject: LU-12780 lustre: remove SVC_EVENT flag
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2bf7e6baa4947f96b7fee6ee625d695e2f34c064

Comment by Alex Zhuravlev [ 24/Jan/20 ]

this seem to be related?

BUG: unable to handle kernel paging request at ffff88012e9ac5e8
PGD 368b067 P4D 368b067 PUD 170b75067 PMD 170a00067 PTE 800ffffed1653060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU: 0 PID: 8522 Comm: osp-syn-1-1 Tainted: G        W  O     --------- ---  4.18.0 #32
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
RIP: 0010:native_queued_spin_lock_slowpath+0xa/0x1b0
Code: 52 b8 01 00 00 00 31 d2 f0 0f b0 17 3c 01 75 02 5a c3 56 0f b6 f0 e8 c5 ff ff ff 5e 5a c3 66 90 0f 1f 44 00 00 ba 01 00 00 00 <8b> 07 85 c0 75 09 f0 0f b1 17 85 c0 75 f2 c3 f3 90 eb ed 81 fe 00
RSP: 0018:ffff880153abfdd8 EFLAGS: 00010002
RAX: 000000006b6b6b6b RBX: ffff88012e9ac5e8 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88012e9ac5e8
RBP: ffff88012e9ac5e8 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88012e9ac600
R13: 0000000000000282 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88016ae00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88012e9ac5e8 CR3: 000000012342b000 CR4: 00000000000006b0
Call Trace:
 do_raw_spin_lock+0x85/0x90
 _raw_spin_lock_irqsave+0x7c/0x80
 __wake_up_common_lock+0x4f/0x90
 osp_sync_thread+0x432/0xa00 [osp]
 kthread+0x100/0x140
 ? osp_sync_process_committed+0x980/0x980 [osp]
 ? kthread_flush_work_fn+0x10/0x10
 ret_from_fork+0x24/0x30

this patch has been applied long ago:

    LU-12780 osp: use native kthreads for opd_pre_thread
    
    rather than ptlrpc_thread, use native kthreads functionality.
Comment by Alex Zhuravlev [ 29/Jan/20 ]

another report seems to be related here: https://testing.whamcloud.com/test_sessions/ec5e5eb5-7924-45be-b30f-f61c4fa789e0

[11688.746654] Lustre: DEBUG MARKER: == sanity test 154g: various llapi FID tests ========================================================= 13:00:19 (1580043619)
[11728.167540] LustreError: 15355:0:(llog_cat.c:411:llog_cat_id2handle()) lustre-OST0000-osc-MDT0000: error opening log id [0x14a10:0x1:0x0]:0: rc = -2
[11728.170363] LustreError: 15355:0:(osp_sync.c:1317:osp_sync_thread()) ASSERTION( thread->t_flags != SVC_RUNNING ) failed: 1 changes, 0 in progress, 0 in flight
[11728.172819] LustreError: 15355:0:(osp_sync.c:1317:osp_sync_thread()) LBUG
[11728.174009] Pid: 15355, comm: osp-syn-0-0 3.10.0-1062.9.1.el7_lustre.x86_64 #1 SMP Sun Jan 26 08:12:08 UTC 2020
[11728.175778] Call Trace:
[11728.176263]  [<ffffffffc09868ac>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[11728.177527]  [<ffffffffc098695c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[11728.178659]  [<ffffffffc18a99e7>] osp_sync_thread+0xb17/0xb20 [osp]
[11728.179823]  [<ffffffff844c61f1>] kthread+0xd1/0xe0
[11728.180739]  [<ffffffff84b8dd37>] ret_from_fork_nospec_end+0x0/0x39
[11728.181961]  [<ffffffffffffffff>] 0xffffffffffffffff
[11728.183118] Kernel panic - not syncing: LBUG
[11728.183892] CPU: 1 PID: 15355 Comm: osp-syn-0-0 Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.9.1.el7_lustre.x86_64 #1
[11728.186387] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[11728.187518] Call Trace:
[11728.188077]  [<ffffffff84b7ac23>] dump_stack+0x19/0x1b
[11728.189229]  [<ffffffff84b74967>] panic+0xe8/0x21f
[11728.190230]  [<ffffffffc09869ab>] lbug_with_loc+0x9b/0xa0 [libcfs]
[11728.191518]  [<ffffffffc18a99e7>] osp_sync_thread+0xb17/0xb20 [osp]
[11728.192783]  [<ffffffff844d4ffe>] ? finish_task_switch+0x4e/0x1c0
[11728.194071]  [<ffffffff84b805a2>] ? __schedule+0x402/0x840
[11728.195164]  [<ffffffffc18a8ed0>] ? osp_sync_process_committed+0xa90/0xa90 [osp]
[11728.196746]  [<ffffffff844c61f1>] kthread+0xd1/0xe0
[11728.197750]  [<ffffffff844c6120>] ? insert_kthread_work+0x40/0x40
[11728.198987]  [<ffffffff84b8dd37>] ret_from_fork_nospec_begin+0x21/0x21
[11728.200278]  [<ffffffff844c6120>] ? insert_kthread_work+0x40/0x40
Comment by James A Simmons [ 29/Jan/20 ]

https://review.whamcloud.com/#/c/36265/ has not landed to master. Only patch in this series that landed is 

2bf7e6baa4947f96b7fee6ee625d695e2f34c064 which remove the unused flag SVC_EVENT.

Comment by Neil Brown [ 29/Jan/20 ]

The patch is more likely to fix these bugs than cause them.

I think I can see how the second might happen.  Maybe the first too, though one task would have to get quite a lot done while another task does almost nothing.

 

Comment by Gerrit Updater [ 25/Feb/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36258/
Subject: LU-12780 llite: don't use ptlrpc_thread for sai_agl_thread
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5b630935452d5d8d7529d9ed267885f1ae11b450

Comment by Gerrit Updater [ 25/Feb/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36259/
Subject: LU-12780 llite: avoid ptlrpc_thread for ll_statahead_thread
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6bf49037b3a1345877692a6ba6d92f579a8e2442

Comment by Gerrit Updater [ 01/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36263/
Subject: LU-12780 osp: use native kthreads for opd_pre_thread
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cce9699b1ae128ac2874516b8f9ee85b5ac3fbc6

Comment by Gerrit Updater [ 05/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36264/
Subject: LU-12780 osp: don't use ptlrpc_thread for opd_update_thread
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7c8cd8ad075d731e4bd6ed1d623e1c43c2cd5107

Comment by Gerrit Updater [ 17/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36261/
Subject: LU-12780 lod: don't use ptlrpc_thread for recovery thread
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3b0094779b58c162e5dc553f6d7cdff1910d0c3b

Comment by Gerrit Updater [ 24/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36266/
Subject: LU-12780 osp: don't use ptlrpc_thread for lwp_notify_main()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 051f0adeb0615da83c819ff74da467bb4117ae63

Comment by Gerrit Updater [ 24/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36268/
Subject: LU-12780 quota: don't use ptlrpc_thread for qsd_reint_main()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 24eeeeef5d56f05b19c94fdd69a5a22ab7384616

Comment by Gerrit Updater [ 31/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36267/
Subject: LU-12780 quota: don't use ptlrpc_thread for qsd_upd_thread()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2ddb1b57670ce239b755e8a7fb4f5cc9d6319786

Comment by Gerrit Updater [ 07/Apr/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36556/
Subject: LU-12780 quota: don't use ptlrpc_thread for qmt_reba_thread
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 43046d2732a33f81390901acc87fb28920d3864f

Comment by Gerrit Updater [ 14/Apr/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36265/
Subject: LU-12780 osp: don't use ptlrpc_thread for osp_sync_thread()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8afc9590a78f6b20c338c027e17043bb22012817

Comment by Gerrit Updater [ 07/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36260/
Subject: LU-12780 target: don't use ptlrpc_thread for txn_commit_thread
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8bbcfedcba1d5e4401bcfd8e60936efea262d562

Comment by Gerrit Updater [ 15/May/20 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/38612
Subject: LU-12780 quota: don't use ptlrpc_thead of qmt_pool_recalc
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6fa73578014258098e0f86e8b448d8fbb2290a8d

Comment by Gerrit Updater [ 04/Jun/20 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/38824
Subject: LU-12780 osd: use native kthreads for scrub.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 562bf522f156df53d4d298fd9b144398301b50d8

Comment by Gerrit Updater [ 06/Jun/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38612/
Subject: LU-12780 quota: don't use ptlrpc_thead of qmt_pool_recalc
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d9094cf66b54c953416969e8581e1f3a6df461b0

Comment by Gerrit Updater [ 18/Jun/20 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/38974
Subject: LU-12780 scrub: all update to bitfields must be protected.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f8ff777659f2152528085c77789c08fcca81301d

Comment by Gerrit Updater [ 23/Jun/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38974/
Subject: LU-12780 scrub: all update to bitfields must be protected.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 23a6e1ed8eec2c07653ed07c35bb109ecb87a5b7

Comment by Cory Spitz [ 24/Jun/20 ]

From https://review.whamcloud.com/38974:

If two of these updates can race, corruption can occurs

Are there any known bugs with scrub that are know to be fixed or might be fixed by the change?

Comment by Neil Brown [ 01/Jul/20 ]

(oops, I meant to reply to the above, but inadvertently editted the comment instead.  I didn't know I could do that, but think I've put it back properly).

 

> Are there any known bugs with scrub that are know to be fixed or might be fixed by the change?

I didn't find any clear races, but I didn't look very hard.  It is probably that the flags are changed very rarely, so any races that are possible are extremely unlikely.

My main reason for the patch was that I was going to add another flag which could change asynchronously, and I didn't think that would be safe without proper locking.

 

Comment by Gerrit Updater [ 01/Sep/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38824/
Subject: LU-12780 osd: use native kthreads for scrub.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b76df292c241a6db8dfad76d0f5f4cb8babe22bf

Comment by Gerrit Updater [ 26/Feb/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36262/
Subject: LU-12780 ofd: don't use ptlrpc_thread for consistency verification
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cb3c65d4a10b2ad0a6c70d5e38719bca3d501efb

Comment by Peter Jones [ 26/Feb/21 ]

Landed for 2.15

Generated at Sat Feb 10 02:55:35 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.