[LU-12193] possible scheduling with spinlocks held in the quota paths Created: 18/Apr/19  Updated: 14/Dec/19  Resolved: 14/Dec/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Major
Reporter: Alex Zhuravlev Assignee: Sergey Cheremencev
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This looks very dangerous: ptlrpc_set_wait() can schedule, yet it is reached from inside cfs_hash_for_each_tight(), which runs its callback with a spinlock held:

[<ffffffff815f11f9>] _cond_resched+0x29/0x40
[<ffffffffa04f2c9b>] ptlrpc_check_set+0x16b/0x30a0 [ptlrpc]
[<ffffffffa04f5d2a>] ptlrpc_set_wait+0x15a/0x7b0 [ptlrpc]
[<ffffffff8137da0d>] ? __raw_spin_lock_init+0x2d/0x50
[<ffffffffa04ad520>] ? __ldlm_handle2lock+0x3f0/0x3f0 [ptlrpc]
[<ffffffffa04ec1d0>] ? ptlrpc_prep_set+0x180/0x2b0 [ptlrpc]
[<ffffffffa04b3197>] ldlm_run_ast_work+0xd7/0x3d0 [ptlrpc]
[<ffffffffa04d6006>] ldlm_glimpse_locks+0x36/0xf0 [ptlrpc]
[<ffffffffa0880d8a>] qmt_glimpse_lock.isra.2.constprop.4+0x52a/0xa70 [lquota]
[<ffffffffa0884445>] qmt_glb_lock_notify+0x1e5/0x390 [lquota]
[<ffffffffa087de2f>] qmt_set_with_lqe+0x35f/0x800 [lquota]
[<ffffffffa087e2d0>] ? qmt_set_with_lqe+0x800/0x800 [lquota]
[<ffffffffa087e326>] qmt_entry_iter_cb+0x56/0xa0 [lquota]
[<ffffffffa000e53b>] cfs_hash_for_each_tight+0x10b/0x2e0 [libcfs]
[<ffffffffa000e75e>] cfs_hash_for_each_safe+0xe/0x10 [libcfs]
[<ffffffffa087de7f>] qmt_set_with_lqe+0x3af/0x800 [lquota]
[<ffffffffa087e4b8>] qmt_set.constprop.2+0x148/0x2b0 [lquota]
[<ffffffffa0556dce>] ? barrier_entry+0x3e/0x180 [ptlrpc]
[<ffffffffa087ec03>] qmt_quotactl+0x5e3/0x600 [lquota]
[<ffffffffa0a93400>] mdt_quotactl+0x290/0x770 [mdt]



 Comments   
Comment by Alex Zhuravlev [ 18/Apr/19 ]

I guess ideally we want to collect all affected lqes on a local list using cfs_hash_for_each_safe(), then handle the items on that list with no spinlock held. Currently this scanning is serialized (to some extent) by the cfs_hash lock, but it should be fine to introduce a mutex wrapping cfs_hash_for_each_safe() and the subsequent handling.

Comment by James A Simmons [ 17/Jun/19 ]

Does patch https://review.whamcloud.com/#/c/34389/ resolve this? It removes the use of cfs_hash.

Comment by Oleg Drokin [ 22/Jul/19 ]

The 34389 patch does not help with this issue; I see no visible changes.

Compare http://testing.linuxhacker.ru:3333/lustre-reports/1480/testresults/sanity-quota-ldiskfs-DNE-centos7_x86_64-centos7_x86_64/oleg9-server-console.txt (search for "sleeping") with
http://testing.linuxhacker.ru:3333/lustre-reports/685/testresults/sanity-quota-ldiskfs-DNE-centos7_x86_64-centos7_x86_64/oleg88-server-console.txt (search for "sleeping").

Comment by Oleg Drokin [ 22/Jul/19 ]

Also, it looks like this is a 100% reproducible crash on RHEL 8:

[  497.202928] Lustre: DEBUG MARKER: == sanity-quota test 3: Block soft limit (start timer, timer goes off, stop timer) =================== 03:02:23 (1563778943)
[  503.400751] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:69
[  503.403144] in_atomic(): 1, irqs_disabled(): 0, pid: 14758, name: mdt00_004
[  503.404213] INFO: lockdep is turned off.
[  503.404655] CPU: 3 PID: 14758 Comm: mdt00_004 Kdump: loaded Tainted: G        W  O     --------- -  - 4.18.0-debug #8
[  503.405815] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  503.406408] Call Trace:
[  503.406681]  dump_stack+0x106/0x175
[  503.407129]  ___might_sleep.cold.50+0xfc/0x12a
[  503.407675]  __might_sleep+0x5b/0xc0
[  503.408164]  down_write+0x35/0x120
[  503.408694]  qmt_set_with_lqe+0x140/0xb10 [lquota]
[  503.409216]  ? qmt_set_with_lqe+0xb10/0xb10 [lquota]
[  503.409845]  qmt_entry_iter_cb+0x4c/0xb0 [lquota]
[  503.410369]  cfs_hash_for_each_tight+0x15c/0x430 [libcfs]
[  503.411151]  cfs_hash_for_each_safe+0x17/0x20 [libcfs]
[  503.411893]  qmt_set_with_lqe+0x53d/0xb10 [lquota]
[  503.412858]  qmt_set.constprop.8+0x180/0x390 [lquota]
[  503.413399]  qmt_quotactl+0x35f/0x690 [lquota]
[  503.414242]  mdt_quotactl+0x366/0x9a0 [mdt]
[  503.415143]  tgt_handle_request0+0xdf/0x890 [ptlrpc]
[  503.416130]  tgt_request_handle+0x3c6/0x1ae0 [ptlrpc]
[  503.417108]  ptlrpc_server_handle_request+0x634/0x11c0 [ptlrpc]
[  503.418138]  ptlrpc_main+0xd7f/0x1470 [ptlrpc]
[  503.418968]  ? ptlrpc_register_service+0x14d0/0x14d0 [ptlrpc]
[  503.419662]  kthread+0x190/0x1c0
[  503.420027]  ? kthread_create_worker+0x90/0x90
[  503.420879]  ret_from_fork+0x24/0x50
[  503.421939] BUG: scheduling while atomic: mdt00_004/14758/0x00000003
[  503.423025] INFO: lockdep is turned off.
[  503.423735] Modules linked in: zfs(O) zunicode(O) zlua(O) zcommon(O) znvpair(O) zavl(O) icp(O) spl(O) lustre(O) ofd(O) osp(O) lod(O) ost(O) mdt(O) mdd(O) mgs(O) osd_ldiskfs(O) ldiskfs(O) lquota(O) lfsck(O) obdecho(O) mgc(O) lov(O) mdc(O) osc(O) lmv(O) fid(O) fld(O) ptlrpc_gss(O) ptlrpc(O) obdclass(O) ksocklnd(O) lnet(O) libcfs(O) dm_flakey rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver i2c_piix4 pcspkr squashfs ip_tables ata_generic serio_raw ata_piix libata dm_mirror dm_region_hash dm_log dm_mod
[  503.430905] CPU: 3 PID: 14758 Comm: mdt00_004 Kdump: loaded Tainted: G        W  O     --------- -  - 4.18.0-debug #8
[  503.432722] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  503.433736] Call Trace:
[  503.434079]  dump_stack+0x106/0x175
[  503.434558]  __schedule_bug.cold.48+0x90/0xc5
[  503.435288]  __schedule+0xa14/0xfc0
[  503.436010]  ? _raw_spin_lock_irqsave+0xd2/0x130
[  503.436853]  schedule+0x5d/0x100
[  503.437500]  schedule_timeout+0x2db/0x8f0
[  503.438216]  ? __next_timer_interrupt+0x130/0x130
[  503.439366]  ? trace_hardirqs_on+0x19/0x30
[  503.440565]  ? ptlrpc_set_wait+0x60b/0xab0 [ptlrpc]
[  503.441599]  ptlrpc_set_wait+0x678/0xab0 [ptlrpc]
[  503.442359]  ? try_to_wake_up+0x790/0x790
[  503.443201]  ldlm_run_ast_work+0x17a/0x4e0 [ptlrpc]
[  503.444196]  ldlm_glimpse_locks+0x46/0x130 [ptlrpc]
[  503.444999]  qmt_glimpse_lock.isra.15.constprop.17+0x2d6/0x830 [lquota]
[  503.446253]  qmt_glb_lock_notify+0x27d/0x480 [lquota]
[  503.447275]  qmt_set_with_lqe+0x4d3/0xb10 [lquota]
[  503.448283]  ? qmt_set_with_lqe+0xb10/0xb10 [lquota]
[  503.449006]  qmt_entry_iter_cb+0x4c/0xb0 [lquota]
[  503.449466]  cfs_hash_for_each_tight+0x15c/0x430 [libcfs]
[  503.450036]  cfs_hash_for_each_safe+0x17/0x20 [libcfs]
[  503.451217]  qmt_set_with_lqe+0x53d/0xb10 [lquota]
[  503.452016]  qmt_set.constprop.8+0x180/0x390 [lquota]
[  503.452623]  qmt_quotactl+0x35f/0x690 [lquota]
[  503.453576]  mdt_quotactl+0x366/0x9a0 [mdt]
[  503.454672]  tgt_handle_request0+0xdf/0x890 [ptlrpc]
[  503.455935]  tgt_request_handle+0x3c6/0x1ae0 [ptlrpc]
[  503.457314]  ptlrpc_server_handle_request+0x634/0x11c0 [ptlrpc]
[  503.458831]  ptlrpc_main+0xd7f/0x1470 [ptlrpc]
[  503.459930]  ? ptlrpc_register_service+0x14d0/0x14d0 [ptlrpc]
[  503.460906]  kthread+0x190/0x1c0
[  503.461495]  ? kthread_create_worker+0x90/0x90
[  503.462400]  ret_from_fork+0x24/0x50
[  503.463289] LNetError: 14758:0:(lib-move.c:764:lnet_ni_send()) ASSERTION( !((preempt_count() & ((((1UL << (4))-1) << ((0 + 8) + 8)) | (((1UL << (8))-1) << (0 + 8)) | (((1UL << (1))-1) << (((0 + 8) + 8) + 4))))) ) failed: 
[  503.467427] LNetError: 14758:0:(lib-move.c:764:lnet_ni_send()) LBUG
[  503.468616] Kernel panic - not syncing: LBUG in interrupt.

[  503.469746] CPU: 3 PID: 14758 Comm: mdt00_004 Kdump: loaded Tainted: G        W  O     --------- -  - 4.18.0-debug #8
[  503.471150] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  503.471972] Call Trace:
[  503.472235]  dump_stack+0x106/0x175
[  503.472697]  panic+0x147/0x3af
[  503.473160]  ? cfs_trace_unlock_tcd+0x5c/0xe0 [libcfs]
[  503.473934]  ? cfs_trace_unlock_tcd+0x5c/0xe0 [libcfs]
[  503.474694]  lbug_with_loc.cold.0+0x14/0x28 [libcfs]
[  503.475427]  lnet_ni_send+0xb9/0x110 [lnet]
[  503.476086]  lnet_send+0xb6/0x260 [lnet]
[  503.476708]  LNetPut+0x513/0xef0 [lnet]
[  503.477427]  ptl_send_buf+0x265/0x6a0 [ptlrpc]
[  503.478227]  ptlrpc_send_reply+0x3a7/0xb70 [ptlrpc]
[  503.479006]  target_send_reply_msg+0x192/0x350 [ptlrpc]
[  503.479822]  target_send_reply+0x492/0xa00 [ptlrpc]
[  503.480626]  tgt_handle_request0+0x164/0x890 [ptlrpc]
[  503.481506]  tgt_request_handle+0x3c6/0x1ae0 [ptlrpc]
[  503.482362]  ptlrpc_server_handle_request+0x634/0x11c0 [ptlrpc]
[  503.483303]  ptlrpc_main+0xd7f/0x1470 [ptlrpc]
[  503.483963]  ? ptlrpc_register_service+0x14d0/0x14d0 [ptlrpc]
[  503.484532]  kthread+0x190/0x1c0
[  503.484877]  ? kthread_create_worker+0x90/0x90
[  503.485352]  ret_from_fork+0x24/0x50
Comment by Gerrit Updater [ 19/Nov/19 ]

Sergey Cheremencev (c17829@cray.com) uploaded a new patch: https://review.whamcloud.com/36795
Subject: LU-12193 quota: use rw_sem to protect lqs_hash
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bc9c8b85229ab7aa21cad04f271903b180dcd2b8

Comment by James A Simmons [ 19/Nov/19 ]

Let's move the code to rhashtable instead. We get the benefit of the lightweight RCU locking, so it will scale far better than what the libcfs hash can do.

Comment by James A Simmons [ 19/Nov/19 ]

Another idea I had is to use xarray as a potential alternative to rhashtable. Which one to use depends on the data arrangement. Xarrays are optimized for densely packed data, and for quotas we key on UID, GID, and project ID, which tend to be sequential. The other benefit is that you can 'mark' the data: an entry in the xarray, for example 1000, could be labeled as any mix of UID, GID, or PROJID. This could reduce all the [LL_MAXQUOTAS] structures down to one. If the data is not densely packed, then rhashtable is the way to go.

Comment by Sergey Cheremencev [ 19/Nov/19 ]

Let's move the code to rhashtable instead. We get the benefit of the lightweight RCU locking, so it will scale far better than what the libcfs hash can do.

To fix the current problem I would prefer a small and clear fix that adds rw_sem locking to cfs_hash and only changes the flags used to create lqs_hash.
I am OK with the suggested solutions, but they look more like optimizations or improvements. I am afraid of fixing one problem and introducing several regressions.

Comment by Gerrit Updater [ 14/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36795/
Subject: LU-12193 quota: use rw_sem to protect lqs_hash
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f3cdf905c522837e2cdce779a03b4ecf16313c65

Comment by Peter Jones [ 14/Dec/19 ]

Landed for 2.14

Generated at Sat Feb 10 02:50:27 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.