[LU-12193] possible scheduling with spinlocks held in the quota paths Created: 18/Apr/19 Updated: 14/Dec/19 Resolved: 14/Dec/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alex Zhuravlev | Assignee: | Sergey Cheremencev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This looks very dangerous:
[<ffffffff815f11f9>] _cond_resched+0x29/0x40 |
| Comments |
| Comment by Alex Zhuravlev [ 18/Apr/19 ] |
|
I guess ideally we want to collect all affected lqe's on a local list using cfs_hash_for_each_safe(), then handle all the items on the list with no spinlock held. Currently this scanning is serialized (to some extent) by cfs_hash_lock, but it should be fine to introduce a mutex wrapping cfs_hash_for_each_safe() and the subsequent handling. |
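A minimal sketch of that collect-then-process idea, assuming the refcounting helpers lqe_getref()/lqe_putref() and the lqe_hash hlist linkage from the lquota code; qmt_iter_mutex, lqe_link, and qmt_handle_one_lqe() are hypothetical names invented for illustration:

static DEFINE_MUTEX(qmt_iter_mutex);	/* hypothetical: serializes whole scans */

/* Runs with the hash bucket lock held, so it must not sleep: it only
 * takes a reference and links the entry onto a private list. */
static int qmt_lqe_collect_cb(struct cfs_hash *hs, struct cfs_hash_bd *bd,
			      struct hlist_node *hnode, void *data)
{
	struct list_head *head = data;
	struct lquota_entry *lqe;

	lqe = hlist_entry(hnode, struct lquota_entry, lqe_hash);
	lqe_getref(lqe);			/* keep it alive off-hash */
	list_add_tail(&lqe->lqe_link, head);	/* lqe_link: hypothetical field */
	return 0;
}

static void qmt_handle_all_lqes(struct cfs_hash *hash)
{
	struct lquota_entry *lqe, *tmp;
	LIST_HEAD(head);

	mutex_lock(&qmt_iter_mutex);
	cfs_hash_for_each_safe(hash, qmt_lqe_collect_cb, &head);
	/* no spinlock held past this point: handlers are free to sleep */
	list_for_each_entry_safe(lqe, tmp, &head, lqe_link) {
		list_del_init(&lqe->lqe_link);
		qmt_handle_one_lqe(lqe);	/* hypothetical per-entry handler */
		lqe_putref(lqe);
	}
	mutex_unlock(&qmt_iter_mutex);
}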
| Comment by James A Simmons [ 17/Jun/19 ] |
|
Does patch https://review.whamcloud.com/#/c/34389/ resolve this? It removes the use of cfs_hash(). |
| Comment by Oleg Drokin [ 22/Jul/19 ] |
|
The 34389 patch does not help with this issue; I see no visible changes. Compare http://testing.linuxhacker.ru:3333/lustre-reports/1480/testresults/sanity-quota-ldiskfs-DNE-centos7_x86_64-centos7_x86_64/oleg9-server-console.txt (search for "sleeping") and |
| Comment by Oleg Drokin [ 22/Jul/19 ] |
|
Also it looks like this is a 100% crash on rhel8:
[ 497.202928] Lustre: DEBUG MARKER: == sanity-quota test 3: Block soft limit (start timer, timer goes off, stop timer) =================== 03:02:23 (1563778943)
[ 503.400751] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:69
[ 503.403144] in_atomic(): 1, irqs_disabled(): 0, pid: 14758, name: mdt00_004
[ 503.404213] INFO: lockdep is turned off.
[ 503.404655] CPU: 3 PID: 14758 Comm: mdt00_004 Kdump: loaded Tainted: G W O --------- - - 4.18.0-debug #8
[ 503.405815] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 503.406408] Call Trace:
[ 503.406681] dump_stack+0x106/0x175
[ 503.407129] ___might_sleep.cold.50+0xfc/0x12a
[ 503.407675] __might_sleep+0x5b/0xc0
[ 503.408164] down_write+0x35/0x120
[ 503.408694] qmt_set_with_lqe+0x140/0xb10 [lquota]
[ 503.409216] ? qmt_set_with_lqe+0xb10/0xb10 [lquota]
[ 503.409845] qmt_entry_iter_cb+0x4c/0xb0 [lquota]
[ 503.410369] cfs_hash_for_each_tight+0x15c/0x430 [libcfs]
[ 503.411151] cfs_hash_for_each_safe+0x17/0x20 [libcfs]
[ 503.411893] qmt_set_with_lqe+0x53d/0xb10 [lquota]
[ 503.412858] qmt_set.constprop.8+0x180/0x390 [lquota]
[ 503.413399] qmt_quotactl+0x35f/0x690 [lquota]
[ 503.414242] mdt_quotactl+0x366/0x9a0 [mdt]
[ 503.415143] tgt_handle_request0+0xdf/0x890 [ptlrpc]
[ 503.416130] tgt_request_handle+0x3c6/0x1ae0 [ptlrpc]
[ 503.417108] ptlrpc_server_handle_request+0x634/0x11c0 [ptlrpc]
[ 503.418138] ptlrpc_main+0xd7f/0x1470 [ptlrpc]
[ 503.418968] ? ptlrpc_register_service+0x14d0/0x14d0 [ptlrpc]
[ 503.419662] kthread+0x190/0x1c0
[ 503.420027] ? kthread_create_worker+0x90/0x90
[ 503.420879] ret_from_fork+0x24/0x50
[ 503.421939] BUG: scheduling while atomic: mdt00_004/14758/0x00000003
[ 503.423025] INFO: lockdep is turned off.
[ 503.423735] Modules linked in: zfs(O) zunicode(O) zlua(O) zcommon(O) znvpair(O) zavl(O) icp(O) spl(O) lustre(O) ofd(O) osp(O) lod(O) ost(O) mdt(O) mdd(O) mgs(O) osd_ldiskfs(O) ldiskfs(O) lquota(O) lfsck(O) obdecho(O) mgc(O) lov(O) mdc(O) osc(O) lmv(O) fid(O) fld(O) ptlrpc_gss(O) ptlrpc(O) obdclass(O) ksocklnd(O) lnet(O) libcfs(O) dm_flakey rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver i2c_piix4 pcspkr squashfs ip_tables ata_generic serio_raw ata_piix libata dm_mirror dm_region_hash dm_log dm_mod
[ 503.430905] CPU: 3 PID: 14758 Comm: mdt00_004 Kdump: loaded Tainted: G W O --------- - - 4.18.0-debug #8
[ 503.432722] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 503.433736] Call Trace:
[ 503.434079] dump_stack+0x106/0x175
[ 503.434558] __schedule_bug.cold.48+0x90/0xc5
[ 503.435288] __schedule+0xa14/0xfc0
[ 503.436010] ? _raw_spin_lock_irqsave+0xd2/0x130
[ 503.436853] schedule+0x5d/0x100
[ 503.437500] schedule_timeout+0x2db/0x8f0
[ 503.438216] ? __next_timer_interrupt+0x130/0x130
[ 503.439366] ? trace_hardirqs_on+0x19/0x30
[ 503.440565] ? ptlrpc_set_wait+0x60b/0xab0 [ptlrpc]
[ 503.441599] ptlrpc_set_wait+0x678/0xab0 [ptlrpc]
[ 503.442359] ? try_to_wake_up+0x790/0x790
[ 503.443201] ldlm_run_ast_work+0x17a/0x4e0 [ptlrpc]
[ 503.444196] ldlm_glimpse_locks+0x46/0x130 [ptlrpc]
[ 503.444999] qmt_glimpse_lock.isra.15.constprop.17+0x2d6/0x830 [lquota]
[ 503.446253] qmt_glb_lock_notify+0x27d/0x480 [lquota]
[ 503.447275] qmt_set_with_lqe+0x4d3/0xb10 [lquota]
[ 503.448283] ? qmt_set_with_lqe+0xb10/0xb10 [lquota]
[ 503.449006] qmt_entry_iter_cb+0x4c/0xb0 [lquota]
[ 503.449466] cfs_hash_for_each_tight+0x15c/0x430 [libcfs]
[ 503.450036] cfs_hash_for_each_safe+0x17/0x20 [libcfs]
[ 503.451217] qmt_set_with_lqe+0x53d/0xb10 [lquota]
[ 503.452016] qmt_set.constprop.8+0x180/0x390 [lquota]
[ 503.452623] qmt_quotactl+0x35f/0x690 [lquota]
[ 503.453576] mdt_quotactl+0x366/0x9a0 [mdt]
[ 503.454672] tgt_handle_request0+0xdf/0x890 [ptlrpc]
[ 503.455935] tgt_request_handle+0x3c6/0x1ae0 [ptlrpc]
[ 503.457314] ptlrpc_server_handle_request+0x634/0x11c0 [ptlrpc]
[ 503.458831] ptlrpc_main+0xd7f/0x1470 [ptlrpc]
[ 503.459930] ? ptlrpc_register_service+0x14d0/0x14d0 [ptlrpc]
[ 503.460906] kthread+0x190/0x1c0
[ 503.461495] ? kthread_create_worker+0x90/0x90
[ 503.462400] ret_from_fork+0x24/0x50
[ 503.463289] LNetError: 14758:0:(lib-move.c:764:lnet_ni_send()) ASSERTION( !((preempt_count() & ((((1UL << (4))-1) << ((0 + 8) + 8)) | (((1UL << (8))-1) << (0 + 8)) | (((1UL << (1))-1) << (((0 + 8) + 8) + 4))))) ) failed:
[ 503.467427] LNetError: 14758:0:(lib-move.c:764:lnet_ni_send()) LBUG
[ 503.468616] Kernel panic - not syncing: LBUG in interrupt.
[ 503.469746] CPU: 3 PID: 14758 Comm: mdt00_004 Kdump: loaded Tainted: G W O --------- - - 4.18.0-debug #8
[ 503.471150] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 503.471972] Call Trace:
[ 503.472235] dump_stack+0x106/0x175
[ 503.472697] panic+0x147/0x3af
[ 503.473160] ? cfs_trace_unlock_tcd+0x5c/0xe0 [libcfs]
[ 503.473934] ? cfs_trace_unlock_tcd+0x5c/0xe0 [libcfs]
[ 503.474694] lbug_with_loc.cold.0+0x14/0x28 [libcfs]
[ 503.475427] lnet_ni_send+0xb9/0x110 [lnet]
[ 503.476086] lnet_send+0xb6/0x260 [lnet]
[ 503.476708] LNetPut+0x513/0xef0 [lnet]
[ 503.477427] ptl_send_buf+0x265/0x6a0 [ptlrpc]
[ 503.478227] ptlrpc_send_reply+0x3a7/0xb70 [ptlrpc]
[ 503.479006] target_send_reply_msg+0x192/0x350 [ptlrpc]
[ 503.479822] target_send_reply+0x492/0xa00 [ptlrpc]
[ 503.480626] tgt_handle_request0+0x164/0x890 [ptlrpc]
[ 503.481506] tgt_request_handle+0x3c6/0x1ae0 [ptlrpc]
[ 503.482362] ptlrpc_server_handle_request+0x634/0x11c0 [ptlrpc]
[ 503.483303] ptlrpc_main+0xd7f/0x1470 [ptlrpc]
[ 503.483963] ? ptlrpc_register_service+0x14d0/0x14d0 [ptlrpc]
[ 503.484532] kthread+0x190/0x1c0
[ 503.484877] ? kthread_create_worker+0x90/0x90
[ 503.485352] ret_from_fork+0x24/0x50 |
| Comment by Gerrit Updater [ 19/Nov/19 ] |
|
Sergey Cheremencev (c17829@cray.com) uploaded a new patch: https://review.whamcloud.com/36795 |
| Comment by James A Simmons [ 19/Nov/19 ] |
|
Let's move the code to rhashtable instead. We get the benefit of lightweight RCU locking, so it will scale far better than the libcfs hash can. |
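For reference, a minimal rhashtable sketch of what that could look like; the lqe_entry layout and the u64 quota-id key are illustrative, not the actual Lustre structures:

#include <linux/rhashtable.h>

/* illustrative entry: one quota id per entry */
struct lqe_entry {
	u64			qe_id;		/* the lookup key */
	struct rhash_head	qe_hash;	/* rhashtable linkage */
};

static const struct rhashtable_params lqe_hash_params = {
	.key_len	     = sizeof(u64),
	.key_offset	     = offsetof(struct lqe_entry, qe_id),
	.head_offset	     = offsetof(struct lqe_entry, qe_hash),
	.automatic_shrinking = true,
};

/* Lookups run under rcu_read_lock() only: nothing spins and readers
 * scale with CPU count.  A real version would take a reference on the
 * entry before dropping the RCU read lock. */
static struct lqe_entry *lqe_find(struct rhashtable *ht, u64 id)
{
	struct lqe_entry *qe;

	rcu_read_lock();
	qe = rhashtable_lookup_fast(ht, &id, lqe_hash_params);
	rcu_read_unlock();
	return qe;
}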
| Comment by James A Simmons [ 19/Nov/19 ] |
|
Another idea I had is using xarray as a potential alternative to rhashtable. Which one fits depends on the data arrangement: xarrays are optimized for densely packed indices, and for quotas we key by UID, GID, and PROJID, which tend to be sequential. The other benefit is that you can 'mark' entries. An entry at index 1000, for example, could be tagged as any mix of UID, GID, or PROJID, which could reduce all the [LL_MAXQUOTAS] arrays down to one data structure. If the data is not densely packed, then rhashtable is the way to go. |
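A minimal sketch of the mark idea, with XA_MARK_0/1/2 standing in for UID/GID/PROJID; quota_entries and the helper names are purely illustrative:

#include <linux/xarray.h>

/* one xarray indexed by the numeric quota id; marks record which
 * id spaces (UID/GID/PROJID) the slot participates in */
static DEFINE_XARRAY(quota_entries);

#define QUOTA_MARK_USR	XA_MARK_0
#define QUOTA_MARK_GRP	XA_MARK_1
#define QUOTA_MARK_PRJ	XA_MARK_2

static int quota_entry_store(unsigned long id, void *entry, xa_mark_t mark)
{
	int rc = xa_err(xa_store(&quota_entries, id, entry, GFP_KERNEL));

	if (rc == 0)
		xa_set_mark(&quota_entries, id, mark);
	return rc;
}

static void quota_walk_projids(void)
{
	unsigned long id;
	void *entry;

	/* visit only the entries marked as project ids */
	xa_for_each_marked(&quota_entries, id, entry, QUOTA_MARK_PRJ)
		pr_info("projid %lu -> %p\n", id, entry);
}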
| Comment by Sergey Cheremencev [ 19/Nov/19 ] |
To fix the current problem I would prefer a small and clear fix that adds rw_sem locking around the cfs_hash iteration and only changes the flags used to create lqs_hash. |
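A sketch of that shape, not the actual 36795 patch: lqs_rwsem is a hypothetical external lock, and creating the hash with the libcfs CFS_HASH_NO_LOCK flag is an assumption here.

static DECLARE_RWSEM(lqs_rwsem);	/* hypothetical external lock */

static void qmt_iterate_lqes(struct cfs_hash *lqs_hash)
{
	down_write(&lqs_rwsem);
	/* assuming lqs_hash was created with CFS_HASH_NO_LOCK, no
	 * spinlock is taken inside the walk, so the iteration callback
	 * (qmt_entry_iter_cb in the traces above) is free to sleep */
	cfs_hash_for_each_safe(lqs_hash, qmt_entry_iter_cb, NULL);
	up_write(&lqs_rwsem);
}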
| Comment by Gerrit Updater [ 14/Dec/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36795/ |
| Comment by Peter Jones [ 14/Dec/19 ] |
|
Landed for 2.14 |