[LU-9515] sanity-benchmark test_iozone: test failed to respond and timed out Created: 16/May/17 Updated: 20/May/19 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0, Lustre 2.10.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Casper | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Environment: | trevis-35, full, ZFS |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
https://testing.hpdd.intel.com/test_sessions/df55763f-2960-40d5-b78d-bd088d00e6e3

Several processes hung on the client side. Here are two of the longer traces, from the client dmesg:

[27722.373633] jbd2/vda1-8 D ffffffff8168a1e0 0 272 2 0x00000000
[27722.375266] ffff880036113ac0 0000000000000046 ffff8800360e3ec0 ffff880036113fd8
[27722.376932] ffff880036113fd8 ffff880036113fd8 ffff8800360e3ec0 ffff88007fd16c40
[27722.378628] 0000000000000000 7fffffffffffffff ffff88007ff5b9d0 ffffffff8168a1e0
[27722.380339] Call Trace:
[27722.381566] [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50
[27722.383076] [<ffffffff8168c169>] schedule+0x29/0x70
[27722.384533] [<ffffffff81689bc9>] schedule_timeout+0x239/0x2c0
[27722.386051] [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
[27722.387580] [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50
[27722.389031] [<ffffffff8168b70e>] io_schedule_timeout+0xae/0x130
[27722.390525] [<ffffffff8168b7a8>] io_schedule+0x18/0x20
[27722.391966] [<ffffffff8168a1f1>] bit_wait_io+0x11/0x50
[27722.393394] [<ffffffff81689d15>] __wait_on_bit+0x65/0x90
[27722.394833] [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50
[27722.396245] [<ffffffff81689dc1>] out_of_line_wait_on_bit+0x81/0xb0
[27722.397743] [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40
[27722.399215] [<ffffffff8123338a>] __wait_on_buffer+0x2a/0x30
[27722.400687] [<ffffffffa01a6742>] jbd2_journal_commit_transaction+0x1752/0x19a0 [jbd2]
[27722.402312] [<ffffffff81029569>] ? __switch_to+0xd9/0x4c0
[27722.403795] [<ffffffffa01aae99>] kjournald2+0xc9/0x260 [jbd2]
[27722.405260] [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
[27722.406757] [<ffffffffa01aadd0>] ? commit_timeout+0x10/0x10 [jbd2]
[27722.408218] [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[27722.409639] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[27722.411170] [<ffffffff816970d8>] ret_from_fork+0x58/0x90
[27722.412601] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140

and

[27723.879985] in:imjournal D ffffffff8168a1e0 0 878 1 0x00000080
[27723.881530] ffff88007b9bf9b0 0000000000000082 ffff88007b9caf10 ffff88007b9bffd8
[27723.883152] ffff88007b9bffd8 ffff88007b9bffd8 ffff88007b9caf10 ffff88007fc16c40
[27723.884762] 0000000000000000 7fffffffffffffff ffff88007ff607e8 ffffffff8168a1e0
[27723.886380] Call Trace:
[27723.887549] [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50
[27723.888943] [<ffffffff8168c169>] schedule+0x29/0x70
[27723.890306] [<ffffffff81689bc9>] schedule_timeout+0x239/0x2c0
[27723.891727] [<ffffffff810b1ae5>] ? wake_up_bit+0x25/0x30
[27723.893135] [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
[27723.894586] [<ffffffff810eb08c>] ? ktime_get_ts64+0x4c/0xf0
[27723.896001] [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50
[27723.897347] [<ffffffff8168b70e>] io_schedule_timeout+0xae/0x130
[27723.898774] [<ffffffff8168b7a8>] io_schedule+0x18/0x20
[27723.900127] [<ffffffff8168a1f1>] bit_wait_io+0x11/0x50
[27723.901463] [<ffffffff81689d15>] __wait_on_bit+0x65/0x90
[27723.902809] [<ffffffff81180191>] wait_on_page_bit+0x81/0xa0
[27723.904175] [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40
[27723.905558] [<ffffffff8119126b>] truncate_inode_pages_range+0x3bb/0x740
[27723.907030] [<ffffffffa01fab8c>] ? __ext4_journal_stop+0x3c/0xb0 [ext4]
[27723.908476] [<ffffffff812630ea>] ? __dquot_initialize+0x3a/0x1c0
[27723.909876] [<ffffffff8119166e>] truncate_inode_pages_final+0x5e/0x90
[27723.911317] [<ffffffffa01cb46c>] ext4_evict_inode+0x10c/0x4d0 [ext4]
[27723.912746] [<ffffffff8121a767>] evict+0xa7/0x170
[27723.914021] [<ffffffff8121b005>] iput+0xf5/0x180
[27723.915286] [<ffffffff81215ae8>] dentry_kill+0x168/0x1b0
[27723.916611] [<ffffffff81215b8c>] dput+0x5c/0xd0
[27723.917853] [<ffffffff8120ff82>] SYSC_renameat2+0x4e2/0x570
[27723.919175] [<ffffffff811b6cf7>] ? do_munmap+0x2c7/0x420
[27723.920469] [<ffffffff81210dce>] SyS_renameat2+0xe/0x10
[27723.921753] [<ffffffff81210e0e>] SyS_rename+0x1e/0x20
[27723.923014] [<ffffffff81697189>] system_call_fastpath+0x16/0x1b |
| Comments |
| Comment by Sarah Liu [ 19/May/17 ] |
|
I think this is a dup of

[27821.541469] ll_ost_io00_051 D 0000000000000000 0 4155 2 0x00000080
[27821.543196] ffff8800056338b0 0000000000000046 ffff880005628000 ffff880005633fd8
[27821.545042] ffff880005633fd8 ffff880005633fd8 ffff880005628000 ffff88003fa43ac0
[27821.546908] ffff88003fa43a10 ffff88003fa43ac8 ffff88003fa43a38 0000000000000000
[27821.548720] Call Trace:
[27821.550057] [<ffffffff8168c3c9>] schedule+0x29/0x70
[27821.551641] [<ffffffffa06ee67d>] cv_wait_common+0x10d/0x130 [spl]
[27821.553295] [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
[27821.554898] [<ffffffffa06ee6b5>] __cv_wait+0x15/0x20 [spl]
[27821.556527] [<ffffffffa07fd04f>] txg_wait_synced+0xdf/0x120 [zfs]
[27821.558233] [<ffffffffa0fb0558>] osd_trans_stop+0x468/0x590 [osd_zfs]
[27821.559887] [<ffffffffa10b1a0f>] ofd_trans_stop+0x1f/0x60 [ofd]
[27821.561510] [<ffffffffa10b7be4>] ofd_commitrw_write+0x7e4/0x1c80 [ofd]
[27821.563249] [<ffffffffa10bbc3f>] ofd_commitrw+0x4af/0xa80 [ofd]
[27821.564985] [<ffffffffa0b3f589>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[27821.566741] [<ffffffffa0e1d57f>] obd_commitrw+0x2ed/0x330 [ptlrpc]
[27821.568418] [<ffffffffa0df1290>] tgt_brw_write+0x1010/0x1700 [ptlrpc]
[27821.570050] [<ffffffff8132384b>] ? string.isra.7+0x3b/0xf0
[27821.571590] [<ffffffffa0d438a0>] ? target_send_reply_msg+0x170/0x170 [ptlrpc]
[27821.573353] [<ffffffffa0ded615>] tgt_request_handle+0x915/0x1320 [ptlrpc]
[27821.575082] [<ffffffffa0d95eab>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[27821.576847] [<ffffffffa0d93a58>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[27821.578513] [<ffffffff810c54d2>] ? default_wake_function+0x12/0x20
[27821.580135] [<ffffffff810ba628>] ? __wake_up_common+0x58/0x90
[27821.581723] [<ffffffffa0d99ed0>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
[27821.583375] [<ffffffffa0d99430>] ? ptlrpc_register_service+0xe60/0xe60 [ptlrpc]
[27821.585117] [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[27821.586606] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[27821.588255] [<ffffffff81697318>] ret_from_fork+0x58/0x90
[27821.589775] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[27821.591455] ll_ost_io00_052 S ffff8800592dd800 0 4156 2 0x00000080
[27821.593137] ffff88000544bd38 0000000000000046 ffff880005628fb0 ffff88000544bfd8
[27821.594891] ffff88000544bfd8 ffff88000544bfd8 ffff880005628fb0 0000000000000000
[27821.596674] ffff880005628fb0 ffff8800667b2400 ffff8800592dd868 ffff8800592dd800
[27821.598448] Call Trace:
[27821.599700] [<ffffffff8168c3c9>] schedule+0x29/0x70
[27821.601297] [<ffffffffa0d93ce5>] ptlrpc_wait_event+0x325/0x340 [ptlrpc]
[27821.602935] [<ffffffff810c54c0>] ? wake_up_state+0x20/0x20
[27821.604552] [<ffffffffa0d99c3b>] ptlrpc_main+0x80b/0x1de0 [ptlrpc]
[27821.606193] [<ffffffff81029569>] ? __switch_to+0xd9/0x4c0
[27821.607804] [<ffffffffa0d99430>] ? ptlrpc_register_service+0xe60/0xe60 [ptlrpc]
[27821.609516] [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[27821.611097] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[27821.612804] [<ffffffff81697318>] ret_from_fork+0x58/0x90
[27821.614308] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140 |
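When deciding whether this is a dup, it can help to reduce each blocked task to its name, PID, state and reliable frames, so the client traces in the description can be compared with the OSS threads in the comment. The following is a minimal sketch of a hypothetical Python triage helper. `parse_hung_tasks` and `summarize` are illustrative names and are not part of the Lustre test framework or any existing tool. It assumes the RHEL 7 style dump format seen in this ticket, which has three parts:

- a task header `name state saved_pc free pid ppid flags`
- hex stack words
- `[<addr>] func+off/len [module]` frames, where a leading `?` marks an address the unwinder found on the stack but could not confirm as a return address

Records are split on the `[seconds.micros]` timestamps rather than on newlines, so the helper also works when a trace has been flattened onto a single line.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical triage helper; the formats below are assumptions based on the
# RHEL 7 era hung-task dumps in this ticket, not a documented interface.

# Every kernel log record starts with a "[seconds.micros]" timestamp, so
# splitting on it also handles traces flattened onto a single line.
RECORD_SPLIT = re.compile(r"(?=\[\s*\d+\.\d+\])")

# Task header: comm, state letter, saved PC, free stack, pid, ppid, flags.
TASK_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<name>\S+)\s+(?P<state>[A-Za-z])\s+[0-9a-f]{16}"
    r"\s+\d+\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s+0x[0-9a-f]+$"
)

# Stack frame: "[<address>] [? ]function+offset/length [module]".
FRAME_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)


@dataclass
class Task:
    name: str
    state: str
    pid: int
    frames: List[str] = field(default_factory=list)  # innermost frame first


def parse_hung_tasks(text: str) -> List[Task]:
    """Group reliable stack frames under the task header that precedes them."""
    tasks: List[Task] = []
    for record in RECORD_SPLIT.split(text):
        record = record.strip()
        header = TASK_RE.match(record)
        if header:
            tasks.append(Task(header["name"], header["state"], int(header["pid"])))
            continue
        frame = FRAME_RE.match(record)
        # Skip "?" frames: addresses the unwinder could not confirm.
        if frame and tasks and not frame["unreliable"]:
            func = frame["func"]
            if frame["module"]:
                func += f" [{frame['module']}]"
            tasks[-1].frames.append(func)
    return tasks


def summarize(tasks: List[Task], state: str = "D", depth: int = 6) -> List[str]:
    """One line per task in `state`, showing its innermost `depth` frames."""
    return [
        f"{t.name} (pid {t.pid}): " + " <- ".join(t.frames[:depth])
        for t in tasks
        if t.state == state
    ]
```

Passing a whole dmesg capture through `summarize(parse_hung_tasks(text))` yields one line per uninterruptible (D state) task. Raising `depth` shows more of each call chain. Dropping `?` frames trades completeness for signal, because those entries are often stale addresses left on the stack.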