[LU-4873] Lustre client hangs in vvp_page_make_ready Created: 08/Apr/14 Updated: 29/Apr/14 Resolved: 17/Apr/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Shuichi Ihara (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 13466 |
| Description |
|
Application hangs in vvp_page_make_ready and dumps call traces.

Apr 8 01:27:04 r21 kernel: Call Trace:
Apr 8 01:27:04 r21 kernel: [<ffffffff81055f96>] ? enqueue_task+0x66/0x80
Apr 8 01:27:04 r21 kernel: [<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
Apr 8 01:27:04 r21 kernel: [<ffffffff81119e10>] ? sync_page+0x0/0x50
Apr 8 01:27:04 r21 kernel: [<ffffffff8150e953>] io_schedule+0x73/0xc0
Apr 8 01:27:04 r21 kernel: [<ffffffff81119e4d>] sync_page+0x3d/0x50
Apr 8 01:27:04 r21 kernel: [<ffffffff8150f1ba>] __wait_on_bit_lock+0x5a/0xc0
Apr 8 01:27:04 r21 kernel: [<ffffffff81119de7>] __lock_page+0x67/0x70
Apr 8 01:27:04 r21 kernel: [<ffffffff81096de0>] ? wake_bit_function+0x0/0x50
Apr 8 01:27:04 r21 kernel: [<ffffffffa1055cc1>] vvp_page_make_ready+0x271/0x280 [lustre]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0e89149>] cl_page_make_ready+0x89/0x370 [obdclass]
Apr 8 01:27:04 r21 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
Apr 8 01:27:04 r21 kernel: [<ffffffffa0ca069e>] osc_extent_make_ready+0x34e/0xc00 [osc]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0c9c113>] ? on_list+0x43/0x50 [osc]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0c9c1cd>] ? __osc_list_maint+0xad/0x150 [osc]
Apr 8 01:27:04 r21 kernel: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
Apr 8 01:27:04 r21 kernel: [<ffffffffa0ca4826>] osc_io_unplug0+0x15e6/0x1f00 [osc]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0c9e1ad>] ? osc_extent_tree_dump0+0x28d/0xb20 [osc]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0ca6ec1>] osc_io_unplug+0x11/0x20 [osc]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0ca709d>] osc_queue_sync_pages+0x1cd/0x350 [osc]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0c985d7>] osc_io_submit+0x1c7/0x4b0 [osc]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0e92b3e>] cl_io_submit_rw+0x6e/0x160 [obdclass]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0d2c360>] lov_io_submit+0x2d0/0x4b0 [lov]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0d0f874>] ? lov_merge_lvb_kms+0x124/0x530 [lov]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0e92b3e>] cl_io_submit_rw+0x6e/0x160 [obdclass]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0e95ae7>] cl_io_submit_sync+0x87/0x1b0 [obdclass]
Apr 8 01:27:04 r21 kernel: [<ffffffffa1026adc>] ll_page_sync_io+0x5c/0x110 [lustre]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0d2009e>] ? lov_attr_get+0x1e/0x60 [lov]
Apr 8 01:27:04 r21 kernel: [<ffffffffa1045561>] ll_write_begin+0x5c1/0x760 [lustre]
Apr 8 01:27:04 r21 kernel: [<ffffffff8111a7b3>] generic_file_buffered_write+0x123/0x2e0
Apr 8 01:27:04 r21 kernel: [<ffffffff81075887>] ? current_fs_time+0x27/0x30
Apr 8 01:27:04 r21 kernel: [<ffffffff8111c210>] __generic_file_aio_write+0x260/0x490
Apr 8 01:27:04 r21 kernel: [<ffffffff8111c4c8>] generic_file_aio_write+0x88/0x100
Apr 8 01:27:04 r21 kernel: [<ffffffffa105907b>] vvp_io_write_start+0xdb/0x3d0 [lustre]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0e92dca>] cl_io_start+0x6a/0x140 [obdclass]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0e96f54>] cl_io_loop+0xb4/0x1b0 [obdclass]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0ff9f36>] ll_file_io_generic+0x2b6/0x710 [lustre]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0e86d09>] ? cl_env_get+0x29/0x350 [obdclass]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0ffa4d2>] ll_file_aio_write+0x142/0x2c0 [lustre]
Apr 8 01:27:04 r21 kernel: [<ffffffffa0ffad6c>] ll_file_write+0x16c/0x2a0 [lustre]
Apr 8 01:27:04 r21 kernel: [<ffffffff81181398>] vfs_write+0xb8/0x1a0
Apr 8 01:27:04 r21 kernel: [<ffffffff81181c91>] sys_write+0x51/0x90
Apr 8 01:27:04 r21 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr 8 01:29:04 r21 kernel: INFO: task pre:46376 blocked for more than 120 seconds.
Apr 8 01:29:04 r21 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 8 01:29:04 r21 kernel: pre D 000000000000000c 0 46376 46345 0x00000004
Apr 8 01:29:04 r21 kernel: ffff8807f19f7388 0000000000000086 ffff8807f19f72f8 ffff880b97c9b500
Apr 8 01:29:04 r21 kernel: ffff88085c456700 0000000000000003 ffff8807f19f7328 ffffffff81055f96
Apr 8 01:29:04 r21 kernel: ffff880560820638 ffff8807f19f7fd8 000000000000fb88 ffff880560820638
Apr 8 01:29:04 r21 kernel: Call Trace: |
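
Reading the trace bottom-up: the thread enters ll_write_begin(), ll_page_sync_io() drives osc_io_unplug(), osc_extent_make_ready() calls vvp_page_make_ready(), and the thread finally sleeps in __lock_page(). A thread in the write path is waiting indefinitely on a page lock somewhere in its own I/O pipeline. As a rough userspace analogy only (illustrative names, not Lustre code), one plausible shape of such a hang is a call chain that tries to relock a non-recursive lock it already holds:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t page_lock;

/* stand-in for vvp_page_make_ready(): writeback must lock the page
 * before it can be written out */
static void make_ready(void)
{
        pthread_mutex_lock(&page_lock);   /* hangs forever: the owner is us */
        pthread_mutex_unlock(&page_lock);
}

int main(void)
{
        pthread_mutexattr_t attr;

        /* PTHREAD_MUTEX_NORMAL deadlocks on relock by its owner,
         * much like a kernel page lock */
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);
        pthread_mutex_init(&page_lock, &attr);

        /* stand-in for ll_write_begin(): lock the page, then drive
         * writeback that needs the same page */
        pthread_mutex_lock(&page_lock);
        puts("page locked; flushing the dirty extent...");
        make_ready();                     /* never returns */

        pthread_mutex_unlock(&page_lock);
        return 0;
}

Built with gcc -pthread, this prints one line and then hangs, much like the blocked task reported above.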
| Comments |
| Comment by Peter Jones [ 09/Apr/14 ] |
|
Niu, could you please comment on this issue? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 09/Apr/14 ] |
/* To avoid deadlock, try to lock page first. */
vmpage = grab_cache_page_nowait(mapping, index);
if (unlikely(vmpage == NULL || PageDirty(vmpage) || PageWriteback(vmpage))) {
        struct ccc_io *cio = ccc_env_io(env);
        struct cl_page_list *plist = &cio->u.write.cui_queue;

        /* if the page is already in dirty cache, we have to commit
         * the pages right now; otherwise, it may cause deadlock
         * because it holds page lock of a dirty page and request for
         * more grants. It's okay for the dirty page to be the first
         * one in commit page list, though. */
        if (vmpage != NULL && plist->pl_nr > 0) {
                unlock_page(vmpage);
                page_cache_release(vmpage);
                vmpage = NULL;
        }

        /* commit pages and then wait for page lock */
        result = vvp_io_write_commit(env, io);
        if (result < 0)
                GOTO(out, result);

Looking at ll_write_begin(): if the page is in writeback and cui_queue->pl_nr == 0, could this call vvp_io_write_commit() with the page still locked? Xiong, is that possible? Please take a look, thanks. |
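
The comment inside that excerpt ("it holds page lock of a dirty page and request for more grants") describes a lock-order inversion: the writer holds a dirty page's lock while waiting for grant, and the path that frees grant must write dirty pages out, which needs that same page lock. A minimal pthread sketch of that inversion (illustrative names only; an analogy, not Lustre code):

#include <pthread.h>
#include <unistd.h>

/* illustrative names: the "page lock" of a dirty page and the shared
 * "grant" budget a writer must reserve before caching more dirty data */
static pthread_mutex_t dirty_page_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t grant_budget    = PTHREAD_MUTEX_INITIALIZER;

/* writer: holds a dirty page's lock, then waits for grant (A then B) */
static void *writer(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&dirty_page_lock);
        sleep(1);                          /* make the interleaving deterministic */
        pthread_mutex_lock(&grant_budget); /* blocks: flusher holds it */
        pthread_mutex_unlock(&grant_budget);
        pthread_mutex_unlock(&dirty_page_lock);
        return NULL;
}

/* flusher: frees grant only after writing the dirty page out, which
 * requires that page's lock (B then A) -- the inverted order */
static void *flusher(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&grant_budget);
        sleep(1);
        pthread_mutex_lock(&dirty_page_lock); /* blocks: writer holds it */
        pthread_mutex_unlock(&dirty_page_lock);
        pthread_mutex_unlock(&grant_budget);
        return NULL;
}

int main(void)
{
        pthread_t w, f;

        pthread_create(&w, NULL, writer, NULL);
        pthread_create(&f, NULL, flusher, NULL);
        pthread_join(w, NULL);   /* never completes: ABBA deadlock */
        pthread_join(f, NULL);
        return 0;
}

Committing the queued pages before blocking, as the excerpt does, breaks exactly this cycle by finishing the dirty-page work before waiting.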
| Comment by Jinshan Xiong (Inactive) [ 10/Apr/14 ] |
|
If plist->pl_nr is zero, there are no pages pending, so vvp_io_write_commit() is essentially a no-op. |
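
In other words, with an empty commit queue the commit path returns before it could block, so calling it with the page locked is harmless in that case. A compilable sketch of that early-exit shape (the struct and function here are illustrative stand-ins, not the actual Lustre definitions):

#include <stddef.h>

/* illustrative stand-in for Lustre's cl_page_list */
struct page_list_sketch {
        int pl_nr;                      /* number of queued pages */
};

/* sketch: with pl_nr == 0 this returns before doing any I/O, so it
 * cannot sleep while the caller still holds a page lock */
static int write_commit_sketch(struct page_list_sketch *plist)
{
        if (plist == NULL || plist->pl_nr == 0)
                return 0;               /* nothing queued: a no-op */
        /* ... otherwise submit the queued pages and wait for them ... */
        return 0;
}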
| Comment by Jinshan Xiong (Inactive) [ 10/Apr/14 ] |
|
The root cause of this issue is a deadlock. I will create a patch soon. |
| Comment by Jinshan Xiong (Inactive) [ 10/Apr/14 ] |
|
Can you please try this patch? http://review.whamcloud.com/9928 |
| Comment by Shuichi Ihara (Inactive) [ 10/Apr/14 ] |
|
I just tested the patch and have not reproduced the issue so far. Originally this problem did not reproduce 100% of the time, so I will keep testing, but it seems the patch solves the problem. |
| Comment by Shuichi Ihara (Inactive) [ 11/Apr/14 ] |
|
Tested more; nothing has reproduced the issue so far. The patch solved the problem. Thanks! |
| Comment by Jinshan Xiong (Inactive) [ 17/Apr/14 ] |
|
Patch landed. |