[LU-9515] sanity-benchmark test_iozone: test failed to respond and timed out Created: 16/May/17  Updated: 20/May/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0, Lustre 2.10.1
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: James Casper
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None
Environment:

trevis-35, full, ZFS
EL7, master branch, v2.9.57, b3575


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

https://testing.hpdd.intel.com/test_sessions/df55763f-2960-40d5-b78d-bd088d00e6e3

Several hung processes on the client side. Here are two of the longer traces:

From client dmesg:

[27722.373633] jbd2/vda1-8     D ffffffff8168a1e0     0   272      2 0x00000000
[27722.375266]  ffff880036113ac0 0000000000000046 ffff8800360e3ec0 ffff880036113fd8
[27722.376932]  ffff880036113fd8 ffff880036113fd8 ffff8800360e3ec0 ffff88007fd16c40
[27722.378628]  0000000000000000 7fffffffffffffff ffff88007ff5b9d0 ffffffff8168a1e0
[27722.380339] Call Trace:
[27722.381566]  [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50
[27722.383076]  [<ffffffff8168c169>] schedule+0x29/0x70
[27722.384533]  [<ffffffff81689bc9>] schedule_timeout+0x239/0x2c0
[27722.386051]  [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
[27722.387580]  [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50
[27722.389031]  [<ffffffff8168b70e>] io_schedule_timeout+0xae/0x130
[27722.390525]  [<ffffffff8168b7a8>] io_schedule+0x18/0x20
[27722.391966]  [<ffffffff8168a1f1>] bit_wait_io+0x11/0x50
[27722.393394]  [<ffffffff81689d15>] __wait_on_bit+0x65/0x90
[27722.394833]  [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50
[27722.396245]  [<ffffffff81689dc1>] out_of_line_wait_on_bit+0x81/0xb0
[27722.397743]  [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40
[27722.399215]  [<ffffffff8123338a>] __wait_on_buffer+0x2a/0x30
[27722.400687]  [<ffffffffa01a6742>] jbd2_journal_commit_transaction+0x1752/0x19a0 [jbd2]
[27722.402312]  [<ffffffff81029569>] ? __switch_to+0xd9/0x4c0
[27722.403795]  [<ffffffffa01aae99>] kjournald2+0xc9/0x260 [jbd2]
[27722.405260]  [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
[27722.406757]  [<ffffffffa01aadd0>] ? commit_timeout+0x10/0x10 [jbd2]
[27722.408218]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[27722.409639]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[27722.411170]  [<ffffffff816970d8>] ret_from_fork+0x58/0x90
[27722.412601]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140

and

[27723.879985] in:imjournal    D ffffffff8168a1e0     0   878      1 0x00000080
[27723.881530]  ffff88007b9bf9b0 0000000000000082 ffff88007b9caf10 ffff88007b9bffd8
[27723.883152]  ffff88007b9bffd8 ffff88007b9bffd8 ffff88007b9caf10 ffff88007fc16c40
[27723.884762]  0000000000000000 7fffffffffffffff ffff88007ff607e8 ffffffff8168a1e0
[27723.886380] Call Trace:
[27723.887549]  [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50
[27723.888943]  [<ffffffff8168c169>] schedule+0x29/0x70
[27723.890306]  [<ffffffff81689bc9>] schedule_timeout+0x239/0x2c0
[27723.891727]  [<ffffffff810b1ae5>] ? wake_up_bit+0x25/0x30
[27723.893135]  [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
[27723.894586]  [<ffffffff810eb08c>] ? ktime_get_ts64+0x4c/0xf0
[27723.896001]  [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50
[27723.897347]  [<ffffffff8168b70e>] io_schedule_timeout+0xae/0x130
[27723.898774]  [<ffffffff8168b7a8>] io_schedule+0x18/0x20
[27723.900127]  [<ffffffff8168a1f1>] bit_wait_io+0x11/0x50
[27723.901463]  [<ffffffff81689d15>] __wait_on_bit+0x65/0x90
[27723.902809]  [<ffffffff81180191>] wait_on_page_bit+0x81/0xa0
[27723.904175]  [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40
[27723.905558]  [<ffffffff8119126b>] truncate_inode_pages_range+0x3bb/0x740
[27723.907030]  [<ffffffffa01fab8c>] ? __ext4_journal_stop+0x3c/0xb0 [ext4]
[27723.908476]  [<ffffffff812630ea>] ? __dquot_initialize+0x3a/0x1c0
[27723.909876]  [<ffffffff8119166e>] truncate_inode_pages_final+0x5e/0x90
[27723.911317]  [<ffffffffa01cb46c>] ext4_evict_inode+0x10c/0x4d0 [ext4]
[27723.912746]  [<ffffffff8121a767>] evict+0xa7/0x170
[27723.914021]  [<ffffffff8121b005>] iput+0xf5/0x180
[27723.915286]  [<ffffffff81215ae8>] dentry_kill+0x168/0x1b0
[27723.916611]  [<ffffffff81215b8c>] dput+0x5c/0xd0
[27723.917853]  [<ffffffff8120ff82>] SYSC_renameat2+0x4e2/0x570
[27723.919175]  [<ffffffff811b6cf7>] ? do_munmap+0x2c7/0x420
[27723.920469]  [<ffffffff81210dce>] SyS_renameat2+0xe/0x10
[27723.921753]  [<ffffffff81210e0e>] SyS_rename+0x1e/0x20
[27723.923014]  [<ffffffff81697189>] system_call_fastpath+0x16/0x1b


 Comments   
Comment by Sarah Liu [ 19/May/17 ]

I think this is a duplicate of LU-9247.
OSS dmesg:

[27821.541469] ll_ost_io00_051 D 0000000000000000     0  4155      2 0x00000080
[27821.543196]  ffff8800056338b0 0000000000000046 ffff880005628000 ffff880005633fd8
[27821.545042]  ffff880005633fd8 ffff880005633fd8 ffff880005628000 ffff88003fa43ac0
[27821.546908]  ffff88003fa43a10 ffff88003fa43ac8 ffff88003fa43a38 0000000000000000
[27821.548720] Call Trace:
[27821.550057]  [<ffffffff8168c3c9>] schedule+0x29/0x70
[27821.551641]  [<ffffffffa06ee67d>] cv_wait_common+0x10d/0x130 [spl]
[27821.553295]  [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
[27821.554898]  [<ffffffffa06ee6b5>] __cv_wait+0x15/0x20 [spl]
[27821.556527]  [<ffffffffa07fd04f>] txg_wait_synced+0xdf/0x120 [zfs]
[27821.558233]  [<ffffffffa0fb0558>] osd_trans_stop+0x468/0x590 [osd_zfs]
[27821.559887]  [<ffffffffa10b1a0f>] ofd_trans_stop+0x1f/0x60 [ofd]
[27821.561510]  [<ffffffffa10b7be4>] ofd_commitrw_write+0x7e4/0x1c80 [ofd]
[27821.563249]  [<ffffffffa10bbc3f>] ofd_commitrw+0x4af/0xa80 [ofd]
[27821.564985]  [<ffffffffa0b3f589>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[27821.566741]  [<ffffffffa0e1d57f>] obd_commitrw+0x2ed/0x330 [ptlrpc]
[27821.568418]  [<ffffffffa0df1290>] tgt_brw_write+0x1010/0x1700 [ptlrpc]
[27821.570050]  [<ffffffff8132384b>] ? string.isra.7+0x3b/0xf0
[27821.571590]  [<ffffffffa0d438a0>] ? target_send_reply_msg+0x170/0x170 [ptlrpc]
[27821.573353]  [<ffffffffa0ded615>] tgt_request_handle+0x915/0x1320 [ptlrpc]
[27821.575082]  [<ffffffffa0d95eab>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[27821.576847]  [<ffffffffa0d93a58>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[27821.578513]  [<ffffffff810c54d2>] ? default_wake_function+0x12/0x20
[27821.580135]  [<ffffffff810ba628>] ? __wake_up_common+0x58/0x90
[27821.581723]  [<ffffffffa0d99ed0>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
[27821.583375]  [<ffffffffa0d99430>] ? ptlrpc_register_service+0xe60/0xe60 [ptlrpc]
[27821.585117]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[27821.586606]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[27821.588255]  [<ffffffff81697318>] ret_from_fork+0x58/0x90
[27821.589775]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[27821.591455] ll_ost_io00_052 S ffff8800592dd800     0  4156      2 0x00000080
[27821.593137]  ffff88000544bd38 0000000000000046 ffff880005628fb0 ffff88000544bfd8
[27821.594891]  ffff88000544bfd8 ffff88000544bfd8 ffff880005628fb0 0000000000000000
[27821.596674]  ffff880005628fb0 ffff8800667b2400 ffff8800592dd868 ffff8800592dd800
[27821.598448] Call Trace:
[27821.599700]  [<ffffffff8168c3c9>] schedule+0x29/0x70
[27821.601297]  [<ffffffffa0d93ce5>] ptlrpc_wait_event+0x325/0x340 [ptlrpc]
[27821.602935]  [<ffffffff810c54c0>] ? wake_up_state+0x20/0x20
[27821.604552]  [<ffffffffa0d99c3b>] ptlrpc_main+0x80b/0x1de0 [ptlrpc]
[27821.606193]  [<ffffffff81029569>] ? __switch_to+0xd9/0x4c0
[27821.607804]  [<ffffffffa0d99430>] ? ptlrpc_register_service+0xe60/0xe60 [ptlrpc]
[27821.609516]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[27821.611097]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[27821.612804]  [<ffffffff81697318>] ret_from_fork+0x58/0x90
[27821.614308]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
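
The traces above mix blocked (state D) and idle (state S) threads, and the frames marked "?" are unreliable stack entries. A minimal sketch for pulling only the D-state tasks and their reliable frames out of a dmesg dump follows. It assumes the task header and frame layout seen in this ticket (EL7 kernel); the names blocked_tasks, TASK_RE, FRAME_RE and SAMPLE are hypothetical and not part of any Lustre or kernel tooling.

```python
import re

# Task header line, e.g.
# "[27821.541469] ll_ost_io00_051 D 0000000000000000     0  4155      2 0x00000080"
# The leading "[" is optional because copied logs sometimes lose it.
TASK_RE = re.compile(
    r"^\[?\s*(?P<ts>\d+\.\d+)\]\s+(?P<comm>\S.*?)\s+(?P<state>[A-Z])\s+"
    r"[0-9a-f]{16}\s+\d+\s+(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+$"
)

# Stack frame, e.g. " [<ffffffffa07fd04f>] txg_wait_synced+0xdf/0x120 [zfs]"
# A "? " before the symbol marks a frame the unwinder could not confirm.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\? )?(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?"
)


def blocked_tasks(dmesg_text):
    """Return one dict per state-D task, with its reliable (symbol, module) frames."""
    tasks = []
    current = None
    for raw in dmesg_text.splitlines():
        line = raw.strip()
        header = TASK_RE.match(line)
        if header:
            # A new task starts; only keep it if it is uninterruptible (D).
            current = None
            if header.group("state") == "D":
                current = {
                    "comm": header.group("comm"),
                    "pid": int(header.group("pid")),
                    "frames": [],
                }
                tasks.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and not frame.group("unreliable"):
            current["frames"].append((frame.group("sym"), frame.group("mod")))
    return tasks


# Short excerpt of the OSS dmesg from this ticket, used only as demo input.
SAMPLE = """\
[27821.541469] ll_ost_io00_051 D 0000000000000000     0  4155      2 0x00000080
[27821.548720] Call Trace:
[27821.550057]  [<ffffffff8168c3c9>] schedule+0x29/0x70
[27821.551641]  [<ffffffffa06ee67d>] cv_wait_common+0x10d/0x130 [spl]
[27821.553295]  [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
[27821.556527]  [<ffffffffa07fd04f>] txg_wait_synced+0xdf/0x120 [zfs]
[27821.591455] ll_ost_io00_052 S ffff8800592dd800     0  4156      2 0x00000080
[27821.598448] Call Trace:
[27821.599700]  [<ffffffff8168c3c9>] schedule+0x29/0x70
"""

if __name__ == "__main__":
    for task in blocked_tasks(SAMPLE):
        print(task["comm"], task["pid"])
        for sym, mod in task["frames"]:
            print("   ", sym, f"[{mod}]" if mod else "")
```

Run against the full OSS dmesg, this kind of filter would list ll_ost_io00_051 with txg_wait_synced in its stack and skip ll_ost_io00_052, which is only idle in ptlrpc_wait_event.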