[LU-4977] Deadlock in balance_dirty_pages() Created: 29/Apr/14  Updated: 15/May/14  Resolved: 14/May/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Blocker
Reporter: Jinshan Xiong (Inactive) Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-4873 Lustre client hangs in vvp_page_make_... Resolved
Severity: 3
Rank (Obsolete): 13790

 Description   

I can occasionally see this issue on machines with limited memory. The deadlocked task has the following call stack:

dd            D 0000000000000000     0  2158      1 0x00000004
 ffff88010ecc10f8 0000000000000086 ffff8801ffffffff 0000000042a8c635
 ffff88010ecc1078 ffff88009ccb68a0 0000000000047e6a ffffffffaca103a3
 ffff8800d7bd5058 ffff88010ecc1fd8 000000000000fb88 ffff8800d7bd5058
Call Trace:
 [<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
 [<ffffffff81119e10>] ? sync_page+0x0/0x50
 [<ffffffff8150ed93>] io_schedule+0x73/0xc0
 [<ffffffff81119e4d>] sync_page+0x3d/0x50
 [<ffffffff8150f5fa>] __wait_on_bit_lock+0x5a/0xc0
 [<ffffffff81119de7>] __lock_page+0x67/0x70
 [<ffffffff81096de0>] ? wake_bit_function+0x0/0x50
 [<ffffffffa0f60101>] vvp_page_make_ready+0x271/0x280 [lustre]
 [<ffffffffa0542999>] cl_page_make_ready+0x89/0x370 [obdclass]
 [<ffffffffa03b45a1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0a323b7>] osc_extent_make_ready+0x3b7/0xe50 [osc]
 [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
 [<ffffffffa0a36af6>] osc_io_unplug0+0x1736/0x2130 [osc]
 [<ffffffff8103c7d8>] ? pvclock_clocksource_read+0x58/0xd0
 [<ffffffffa03b45a1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0a39681>] osc_io_unplug+0x11/0x20 [osc]
 [<ffffffffa0a3bc86>] osc_cache_writeback_range+0xdb6/0x1290 [osc]
 [<ffffffffa03b9d47>] ? cfs_hash_bd_lookup_intent+0x37/0x130 [libcfs]
 [<ffffffffa03b9d47>] ? cfs_hash_bd_lookup_intent+0x37/0x130 [libcfs]
 [<ffffffffa03b9362>] ? cfs_hash_bd_add_locked+0x62/0x90 [libcfs]
 [<ffffffffa054a45d>] ? cl_io_sub_init+0x5d/0xc0 [obdclass]
 [<ffffffffa0a29fd0>] osc_io_fsync_start+0x90/0x360 [osc]
 [<ffffffffa0547640>] ? cl_io_start+0x0/0x140 [obdclass]
 [<ffffffffa05476aa>] cl_io_start+0x6a/0x140 [obdclass]
 [<ffffffffa0a8f18e>] lov_io_call+0x8e/0x130 [lov]
 [<ffffffffa0a9324c>] lov_io_start+0x10c/0x180 [lov]
 [<ffffffffa05476aa>] cl_io_start+0x6a/0x140 [obdclass]
 [<ffffffffa054aea4>] cl_io_loop+0xb4/0x1b0 [obdclass]
 [<ffffffffa0f02acb>] cl_sync_file_range+0x31b/0x500 [lustre]
 [<ffffffffa0f2fe7c>] ll_writepages+0x9c/0x220 [lustre]
 [<ffffffff8112e1b1>] do_writepages+0x21/0x40
 [<ffffffff811aca9d>] writeback_single_inode+0xdd/0x290
 [<ffffffff811aceae>] writeback_sb_inodes+0xce/0x180
 [<ffffffff811ad00b>] writeback_inodes_wb+0xab/0x1b0
 [<ffffffff8112d60d>] balance_dirty_pages+0x23d/0x4d0
 [<ffffffffa0541768>] ? cl_page_invoid+0x68/0x160 [obdclass]
 [<ffffffff8112d904>] balance_dirty_pages_ratelimited_nr+0x64/0x70
 [<ffffffff8111a86a>] generic_file_buffered_write+0x1da/0x2e0
 [<ffffffff81075887>] ? current_fs_time+0x27/0x30
 [<ffffffff8111c210>] __generic_file_aio_write+0x260/0x490
 [<ffffffffa0a93d9c>] ? lov_lock_enqueue+0xbc/0x170 [lov]
 [<ffffffff8111c4c8>] generic_file_aio_write+0x88/0x100
 [<ffffffffa0f634a2>] vvp_io_write_start+0x102/0x3f0 [lustre]
 [<ffffffffa05476aa>] cl_io_start+0x6a/0x140 [obdclass]
 [<ffffffffa054aea4>] cl_io_loop+0xb4/0x1b0 [obdclass]
 [<ffffffffa0f00297>] ll_file_io_generic+0x407/0x8d0 [lustre]
 [<ffffffffa05406c9>] ? cl_env_get+0x29/0x350 [obdclass]
 [<ffffffffa0f00fa3>] ll_file_aio_write+0x133/0x2b0 [lustre]
 [<ffffffffa0f01279>] ll_file_write+0x159/0x290 [lustre]
 [<ffffffff81181398>] vfs_write+0xb8/0x1a0
 [<ffffffff81181c91>] sys_write+0x51/0x90
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

In balance_dirty_pages(), the task tries to write back some dirty pages right after write_end(). However, ll_write_end() can keep the page locked in order to add it to the commit queue, and the writeback path then blocks trying to lock that same page, which is the deadlock.
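Schematically, the self-deadlock in the trace above looks like this (a simplified sketch; the function names are taken from the call trace, the page-lock annotations are my reading of it):

    /* dd thread, a single task deadlocking against itself: */
    ll_write_end(vmpage)            /* returns with vmpage still locked,
                                     * queued for commit */
    balance_dirty_pages()           /* dirty limit hit; the same thread
                                     * kicks synchronous writeback */
      ll_writepages()
        osc_cache_writeback_range()
          osc_extent_make_ready()
            vvp_page_make_ready()
              lock_page(vmpage)     /* blocks forever: this thread
                                     * already holds the page lock */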

We can fix the problem by releasing the page in ll_write_end() if the page is already dirty.
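A minimal sketch of that idea (this is not the actual patch at http://review.whamcloud.com/10149; the signature follows the 2.6-era ->write_end() convention and the commit-queue step is a placeholder):

    #include <linux/fs.h>
    #include <linux/mm.h>
    #include <linux/pagemap.h>

    static int ll_write_end(struct file *file, struct address_space *mapping,
                            loff_t pos, unsigned len, unsigned copied,
                            struct page *vmpage, void *fsdata)
    {
            /* ... add vmpage to the commit queue (placeholder) ... */

            if (PageDirty(vmpage)) {
                    /* The page is already dirty, so writeback may pick it
                     * up at any moment; drop the lock and the reference
                     * here so balance_dirty_pages() -> vvp_page_make_ready()
                     * can lock it without deadlocking against us. */
                    unlock_page(vmpage);
                    page_cache_release(vmpage);
            }
            return copied;
    }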

Patch is coming.



 Comments   
Comment by Jinshan Xiong (Inactive) [ 29/Apr/14 ]

patch is located at http://review.whamcloud.com/10149

Comment by Peter Jones [ 14/May/14 ]

Landed for 2.6

Comment by Patrick Farrell (Inactive) [ 14/May/14 ]

The patch for this issue removes the change made to fix LU-4873.

Can you explain why this change means it's safe to do:
vmpage = grab_cache_page_nowait(mapping, index);

for partial page writes?
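(For context, my reading of the 2.6-era API, not code from the patch: grab_cache_page_nowait() finds or creates the page and trylocks it, returning NULL instead of sleeping, so a caller doing a partial page write has to handle both failure and a non-uptodate page:

    struct page *vmpage;

    vmpage = grab_cache_page_nowait(mapping, index); /* locked page or NULL */
    if (vmpage == NULL)
            return -EAGAIN;         /* caller must retry or fall back */
    if (len < PAGE_CACHE_SIZE && !PageUptodate(vmpage)) {
            /* a partial write into a non-uptodate page must read the
             * page first, or stale data outside [pos, pos+len) leaks */
    }

hence the question about its safety here.)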

Comment by Patrick Farrell (Inactive) [ 15/May/14 ]

With the LU-4977 patch applied, we see many tasks failing to exit, stuck in various parts of the memory management code. While we no longer see the LU-4873 symptoms, many tasks are stuck in other locations:

Here's a sampling... We saw a small number of this stack trace:
2014-05-14T23:40:01.626699-05:00 c3-0c0s5n1 <node_health:5.1> APID:13971031 (Application_Exited_Check) STACK: call_rwsem_down_write_failed+0x13/0x20; ll_setattr_raw+0x19a/0x1030 [lustre]; ll_setattr+0x57/0xf0 [lustre]; notify_change+0x120/0x310; file_remove_suid+0x81/0xb0; __generic_file_aio_write+0x209/0x450; generic_file_aio_write+0x59/0xc0; vvp_io_write_start+0xe0/0x3c0 [lustre]; cl_io_start+0x72/0x140 [obdclass]; cl_io_loop+0xac/0x1a0 [obdclass]; ll_file_io_generic+0x452/0x700 [lustre]; ll_file_aio_write+0x23e/0x290 [lustre]; ll_file_write+0x1f2/0x280 [lustre]; vfs_write+0xcb/0x180; sys_write+0x55/0x90; system_call_fastpath+0x16/0x1b; 0x2aaaad4f6730; 0xffffffffffffffff;

A similar number with this stack trace:
2014-05-14T23:24:19.890744-05:00 c6-0c2s2n0 <node_health:5.1> APID:13969099 (Application_Exited_Check) STACK: call_rwsem_down_write_failed+0x13/0x20; do_coredump+0x14e/0xbe0; get_signal_to_deliver+0x23b/0x480; do_notify_resume+0xe0/0x7f0; retint_signal+0x46/0x83; 0x2000feb6; 0xffffffffffffffff;

A lot with this stack trace:
2014-05-14T23:24:20.280631-05:00 c6-0c1s4n0 <node_health:5.1> APID:13969099 (Application_Exited_Check) STACK: call_rwsem_down_write_failed+0x13/0x20; kgni_mm_fini+0x82/0x3e0 [kgni_gem]; kgni_nic_destroy+0x37/0x70 [kgni_gem]; kgni_close+0x164/0x230 [kgni_gem]; fput+0xda/0x200; filp_close+0x63/0x90; put_files_struct+0x84/0xe0; exit_files+0x53/0x70; do_exit+0x1ec/0x980; do_group_exit+0x48/0xc0; get_signal_to_deliver+0x243/0x480; do_notify_resume+0xe0/0x7f0; int_signal+0x12/0x17; 0x2001c111; 0xffffffffffffffff;

And the largest number were stuck here:
2014-05-14T22:44:36.488369-05:00 c4-0c2s4n0 <node_health:5.1> APID:13969066 (Application_Exited_Check) STACK: lu_cache_shrink+0x56/0x2a0 [obdclass]; shrink_slab+0xae/0x2c0; do_try_to_free_pages+0x593/0x6e0; try_to_free_pages+0x116/0x3c0; __alloc_pages_nodemask+0x58c/0x950; alloc_pages_current+0xbe/0x130; alloc_buddy_huge_page+0xf4/0x130; hugetlb_acct_memory+0xf0/0x330; hugetlb_reserve_pages+0x10c/0x1e0; hugetlbfs_file_mmap+0xec/0x150; mmap_region+0x4d2/0x6b0; do_mmap_pgoff+0x367/0x390; sys_mmap_pgoff+0x1fe/0x220; sys_mmap+0x29/0x30; system_call_fastpath+0x16/0x1b; 0x201b5599; 0xffffffffffffffff;

Comment by Jinshan Xiong (Inactive) [ 15/May/14 ]

Hi Patrick,

I don't think the problem you've seen is related to the patch here. Is this the first time you've seen this issue? Please file a separate ticket.

Jinshan

Comment by Patrick Farrell (Inactive) [ 15/May/14 ]

Ah, sorry, the setattr_raw issue is a known one. For some reason that fix wasn't in this tree.

The others are, as far as I can tell, new instances. I'm going to test again with the latest master...
The processes stuck in lu_cache_shrink eventually completed, but a significant amount of time was spent there first.
