[LU-13476] vvp_set_pagevec_dirty / vvp_page_completion_write lock ordering appears to trigger RCU stalls Created: 22/Apr/20 Updated: 22/Oct/20 Resolved: 04/Jul/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Shaun Tancheff | Assignee: | Shaun Tancheff |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
vvp_set_pagevec_dirty locks mapping->page_tree and iterates over pages. This can cause an extended lock hold or a deadlock with vvp_page_completion_write.

One core is spinning here:
#0 [ffffc9000b97fa78] _raw_spin_lock_irqsave at ffffffff815b65e9
/home/abuild/rpmbuild/BUILD/kernel-cray_ari_c-4.12.14/linux-4.12.14/linux-obj/../kernel/locking/spinlock.c: 160
#1 [ffffc9000b97fa98] lock_page_memcg at ffffffff811d7a89
/home/abuild/rpmbuild/BUILD/kernel-cray_ari_c-4.12.14/linux-4.12.14/linux-obj/../mm/memcontrol.c: 1695
#2 [ffffc9000b97fac0] test_clear_page_writeback at ffffffff8117c479
/home/abuild/rpmbuild/BUILD/kernel-cray_ari_c-4.12.14/linux-4.12.14/linux-obj/../mm/page-writeback.c: 2780
#3 [ffffc9000b97fb10] end_page_writeback at ffffffff8116a657
/home/abuild/rpmbuild/BUILD/kernel-cray_ari_c-4.12.14/linux-4.12.14/linux-obj/../mm/filemap.c: 1273
#4 [ffffc9000b97fb28] vvp_page_completion_write at ffffffffa0816341 [lustre]
/home/abuild/rpmbuild/BUILD/cray-lustre-2.12.0.5_cray_290_gdd6781b/lustre/llite/vvp_page.c: 316
#5 [ffffc9000b97fb58] cl_page_completion at ffffffffa0504663 [obdclass]
/home/abuild/rpmbuild/BUILD/cray-lustre-2.12.0.5_cray_290_gdd6781b/lustre/obdclass/cl_page.c: 931
Many cores are spinning here:
_raw_spin_lock_irqsave+0x39/0x50
vvp_set_pagevec_dirty+0x97/0x3a0 [lustre]
write_commit_callback+0x64/0x1a0 [lustre]
osc_queue_async_io+0x910/0x18e0 [osc]
? vvp_set_pagevec_dirty+0x3a0/0x3a0 [lustre]
? vvp_set_pagevec_dirty+0x3a0/0x3a0 [lustre]
osc_page_cache_add+0x5f/0x180 [osc]
osc_io_commit_async+0x2a0/0x500 [osc]
? vvp_set_pagevec_dirty+0x3a0/0x3a0 [lustre]
? vvp_set_pagevec_dirty+0x3a0/0x3a0 [lustre]
cl_io_commit_async+0xa9/0x150 [obdclass]
? vvp_set_pagevec_dirty+0x3a0/0x3a0 [lustre]
lov_io_commit_async+0x106/0x580 [lov]
? vvp_set_pagevec_dirty+0x3a0/0x3a0 [lustre]
? vvp_set_pagevec_dirty+0x3a0/0x3a0 [lustre]
cl_io_commit_async+0xa9/0x150 [obdclass]
vvp_io_write_commit+0x157/0x5e0 [lustre]
vvp_io_write_start+0x6ac/0x8b0 [lustre]
cl_io_start+0x6e/0x120 [obdclass]
cl_io_loop+0xca/0x1c0 [obdclass]
ll_file_io_generic+0x3c9/0xdd0 [lustre]
ll_file_write_iter+0x124/0x630 [lustre]

This would explain the RCU stall: the memcg and tree_lock/i_pages lock order is inverted between the two paths. |
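The inversion described above can be illustrated with a minimal user-space sketch. It is not the Lustre or kernel code: `memcg_lock` and `tree_lock` are hypothetical pthread mutexes standing in for the page memcg lock and the mapping tree_lock/i_pages lock, and `completion_path`, `dirty_path` and `run_ordered_paths` are names invented for this example. It assumes the completion path takes the memcg lock first and then the tree lock, and shows the dirtying path following the same order so the two cannot block each other.

```c
/* User-space illustration only; pthread mutexes stand in for kernel locks. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000

/* Hypothetical stand-ins for the memcg lock and tree_lock/i_pages. */
static pthread_mutex_t memcg_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;

/* Both counters are only modified while both locks are held. */
static long dirty_pages;
static long writeback_done;

/* Models the write-completion path: memcg lock first, then the tree lock. */
static void *completion_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&memcg_lock);
        pthread_mutex_lock(&tree_lock);
        writeback_done++;
        pthread_mutex_unlock(&tree_lock);
        pthread_mutex_unlock(&memcg_lock);
    }
    return NULL;
}

/*
 * Models the page-dirtying path using the SAME order as completion_path.
 * Taking tree_lock before memcg_lock here would recreate the inversion
 * and allow the two threads to wait on each other indefinitely.
 */
static void *dirty_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&memcg_lock);
        pthread_mutex_lock(&tree_lock);
        dirty_pages++;
        pthread_mutex_unlock(&tree_lock);
        pthread_mutex_unlock(&memcg_lock);
    }
    return NULL;
}

/* Runs both paths concurrently; returns the total update count, or -1. */
long run_ordered_paths(void)
{
    pthread_t completer, dirtier;

    dirty_pages = 0;
    writeback_done = 0;

    if (pthread_create(&completer, NULL, completion_path, NULL) != 0)
        return -1;
    if (pthread_create(&dirtier, NULL, dirty_path, NULL) != 0) {
        pthread_join(completer, NULL);
        return -1;
    }
    pthread_join(completer, NULL);
    pthread_join(dirtier, NULL);

    /* Sanity check: no update was lost under the shared lock order. */
    assert(dirty_pages == ITERATIONS);
    return dirty_pages + writeback_done;
}
```

Built with `-pthread`, calling `run_ordered_paths()` should always return, because both threads acquire the two locks in one agreed order. Whether the landed patch uses exactly this ordering is not stated in this ticket.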
| Comments |
| Comment by Gerrit Updater [ 22/Apr/20 ] |
|
Shaun Tancheff (shaun.tancheff@hpe.com) uploaded a new patch: https://review.whamcloud.com/38317 |
| Comment by Gerrit Updater [ 04/Jul/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38317/ |
| Comment by Peter Jones [ 04/Jul/20 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 22/Oct/20 ] |
|
Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40358 |