Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.6.0
- None
- 3
- 13790
Description
I occasionally see this deadlock on machines with little memory. The hung process has the following call stack:
dd D 0000000000000000 0 2158 1 0x00000004
 ffff88010ecc10f8 0000000000000086 ffff8801ffffffff 0000000042a8c635
 ffff88010ecc1078 ffff88009ccb68a0 0000000000047e6a ffffffffaca103a3
 ffff8800d7bd5058 ffff88010ecc1fd8 000000000000fb88 ffff8800d7bd5058
Call Trace:
 [<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
 [<ffffffff81119e10>] ? sync_page+0x0/0x50
 [<ffffffff8150ed93>] io_schedule+0x73/0xc0
 [<ffffffff81119e4d>] sync_page+0x3d/0x50
 [<ffffffff8150f5fa>] __wait_on_bit_lock+0x5a/0xc0
 [<ffffffff81119de7>] __lock_page+0x67/0x70
 [<ffffffff81096de0>] ? wake_bit_function+0x0/0x50
 [<ffffffffa0f60101>] vvp_page_make_ready+0x271/0x280 [lustre]
 [<ffffffffa0542999>] cl_page_make_ready+0x89/0x370 [obdclass]
 [<ffffffffa03b45a1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0a323b7>] osc_extent_make_ready+0x3b7/0xe50 [osc]
 [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
 [<ffffffffa0a36af6>] osc_io_unplug0+0x1736/0x2130 [osc]
 [<ffffffff8103c7d8>] ? pvclock_clocksource_read+0x58/0xd0
 [<ffffffffa03b45a1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0a39681>] osc_io_unplug+0x11/0x20 [osc]
 [<ffffffffa0a3bc86>] osc_cache_writeback_range+0xdb6/0x1290 [osc]
 [<ffffffffa03b9d47>] ? cfs_hash_bd_lookup_intent+0x37/0x130 [libcfs]
 [<ffffffffa03b9d47>] ? cfs_hash_bd_lookup_intent+0x37/0x130 [libcfs]
 [<ffffffffa03b9362>] ? cfs_hash_bd_add_locked+0x62/0x90 [libcfs]
 [<ffffffffa054a45d>] ? cl_io_sub_init+0x5d/0xc0 [obdclass]
 [<ffffffffa0a29fd0>] osc_io_fsync_start+0x90/0x360 [osc]
 [<ffffffffa0547640>] ? cl_io_start+0x0/0x140 [obdclass]
 [<ffffffffa05476aa>] cl_io_start+0x6a/0x140 [obdclass]
 [<ffffffffa0a8f18e>] lov_io_call+0x8e/0x130 [lov]
 [<ffffffffa0a9324c>] lov_io_start+0x10c/0x180 [lov]
 [<ffffffffa05476aa>] cl_io_start+0x6a/0x140 [obdclass]
 [<ffffffffa054aea4>] cl_io_loop+0xb4/0x1b0 [obdclass]
 [<ffffffffa0f02acb>] cl_sync_file_range+0x31b/0x500 [lustre]
 [<ffffffffa0f2fe7c>] ll_writepages+0x9c/0x220 [lustre]
 [<ffffffff8112e1b1>] do_writepages+0x21/0x40
 [<ffffffff811aca9d>] writeback_single_inode+0xdd/0x290
 [<ffffffff811aceae>] writeback_sb_inodes+0xce/0x180
 [<ffffffff811ad00b>] writeback_inodes_wb+0xab/0x1b0
 [<ffffffff8112d60d>] balance_dirty_pages+0x23d/0x4d0
 [<ffffffffa0541768>] ? cl_page_invoid+0x68/0x160 [obdclass]
 [<ffffffff8112d904>] balance_dirty_pages_ratelimited_nr+0x64/0x70
 [<ffffffff8111a86a>] generic_file_buffered_write+0x1da/0x2e0
 [<ffffffff81075887>] ? current_fs_time+0x27/0x30
 [<ffffffff8111c210>] __generic_file_aio_write+0x260/0x490
 [<ffffffffa0a93d9c>] ? lov_lock_enqueue+0xbc/0x170 [lov]
 [<ffffffff8111c4c8>] generic_file_aio_write+0x88/0x100
 [<ffffffffa0f634a2>] vvp_io_write_start+0x102/0x3f0 [lustre]
 [<ffffffffa05476aa>] cl_io_start+0x6a/0x140 [obdclass]
 [<ffffffffa054aea4>] cl_io_loop+0xb4/0x1b0 [obdclass]
 [<ffffffffa0f00297>] ll_file_io_generic+0x407/0x8d0 [lustre]
 [<ffffffffa05406c9>] ? cl_env_get+0x29/0x350 [obdclass]
 [<ffffffffa0f00fa3>] ll_file_aio_write+0x133/0x2b0 [lustre]
 [<ffffffffa0f01279>] ll_file_write+0x159/0x290 [lustre]
 [<ffffffff81181398>] vfs_write+0xb8/0x1a0
 [<ffffffff81181c91>] sys_write+0x51/0x90
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
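After write_end() returns, balance_dirty_pages() tries to write back some dirty pages. However, ll_write_end() can keep holding the page so that it can add it to the commit queue. Writeback in the same thread then blocks in __lock_page() (reached via vvp_page_make_ready()), as the trace above shows.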
We can fix the problem by releasing the page in ll_write_end() if the page is already dirty.
Patch is coming.
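To make the failure mode concrete, here is a minimal user-space sketch in C. It is a toy model, not Lustre or kernel code: `struct toy_page`, the `toy_*` functions, and the `release_if_dirty` flag are all hypothetical names invented for illustration, and the actual patch may look quite different. The model only captures the idea described above: if write_end keeps an already-dirty page held, writeback running later in the same thread cannot lock that page.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page cache page; NOT the kernel's struct page. */
struct toy_page {
    bool locked;  /* models the page lock */
    bool dirty;   /* models the dirty flag */
    bool queued;  /* page was added to a (hypothetical) commit queue */
};

/* Non-blocking lock attempt; returns true if the lock was acquired. */
static bool toy_trylock_page(struct toy_page *pg)
{
    if (pg->locked)
        return false;
    pg->locked = true;
    return true;
}

static void toy_unlock_page(struct toy_page *pg)
{
    assert(pg->locked);
    pg->locked = false;
}

/*
 * Models write_end(), entered with the page locked.
 * release_if_dirty == false: keep the page held and queue it (the
 *                            behaviour the report describes as buggy).
 * release_if_dirty == true:  release the page if it is already dirty
 *                            (the fix proposed in the report).
 */
static void toy_write_end(struct toy_page *pg, bool release_if_dirty)
{
    assert(pg->locked);
    if (release_if_dirty && pg->dirty) {
        toy_unlock_page(pg);
        return;
    }
    pg->dirty = true;
    pg->queued = true;  /* page stays locked while it sits in the queue */
}

/*
 * Models writeback triggered from balance_dirty_pages() in the same
 * thread. Returns true if it would block on the page lock; the real
 * __lock_page() would sleep here, and with the lock holder being the
 * sleeping thread itself, it would never wake up.
 */
static bool toy_writeback_would_block(struct toy_page *pg)
{
    if (!pg->dirty)
        return false;
    if (!toy_trylock_page(pg))
        return true;
    pg->dirty = false;  /* pretend the page was written out */
    toy_unlock_page(pg);
    return false;
}

/* One write to an already-dirty page, followed by writeback. */
bool toy_write_then_writeback(bool release_if_dirty)
{
    struct toy_page pg = { .locked = true, .dirty = true, .queued = false };

    toy_write_end(&pg, release_if_dirty);
    return toy_writeback_would_block(&pg);
}
```

Calling `toy_write_then_writeback(false)` reports that writeback would block, which corresponds to the hang in the trace; `toy_write_then_writeback(true)` does not block because the page was released before writeback ran.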
Issue Links
- is related to LU-4873 Lustre client hangs in vvp_page_make_ready (Resolved)