[LU-6797] parallel-scale-nfsv4 test_compilebench: ll_ost_io process in D state Created: 02/Jul/15  Updated: 01/Dec/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0, Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

server: lustre-master build # 3072 EL7
client: SLES11 SP3


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/70d965d4-15a7-11e5-8d31-5254006e85c2.

The sub-test test_compilebench failed with the following error:

test failed to respond and timed out

OST dmesg:

[21748.053645] ll_ost_io00_010 D ffff88007fc13680     0  9907      2 0x00000080
[21748.053645]  ffff88006d86b758 0000000000000046 ffff88006d86bfd8 0000000000013680
[21748.053645]  ffff88006d86bfd8 0000000000013680 ffff8800626fb8e0 ffff88007fc13f48
[21748.053645]  ffff88007ff616f0 0000000000000002 ffffffffa013a720 ffff88006d86b7d0
[21748.053645] Call Trace:
[21748.053645]  [<ffffffffa013a720>] ? start_this_handle+0x5d0/0x5d0 [jbd2]
[21748.053645]  [<ffffffff8160a70d>] io_schedule+0x9d/0x140
[21748.053645]  [<ffffffffa013a72e>] sleep_on_shadow_bh+0xe/0x20 [jbd2]
[21748.053645]  [<ffffffff816084d0>] __wait_on_bit+0x60/0x90
[21748.053645]  [<ffffffffa10dc76d>] ? ldiskfs_ext_find_extent+0x23d/0x2e0 [ldiskfs]
[21748.053645]  [<ffffffffa013a720>] ? start_this_handle+0x5d0/0x5d0 [jbd2]
[21748.053645]  [<ffffffff81608587>] out_of_line_wait_on_bit+0x87/0xb0
[21748.053645]  [<ffffffff81098390>] ? autoremove_wake_function+0x40/0x40
[21748.053645]  [<ffffffffa013bc1d>] do_get_write_access+0x2ed/0x500 [jbd2]
[21748.053645]  [<ffffffff811fa56d>] ? __getblk+0x2d/0x2e0
[21748.053645]  [<ffffffffa013be57>] jbd2_journal_get_write_access+0x27/0x40 [jbd2]
[21748.053645]  [<ffffffffa109226b>] __ldiskfs_journal_get_write_access+0x3b/0x80 [ldiskfs]
[21748.053645]  [<ffffffffa10a5990>] ldiskfs_reserve_inode_write+0x70/0xa0 [ldiskfs]
[21748.099070]  [<ffffffffa10a8f90>] ? ldiskfs_dirty_inode+0x40/0x60 [ldiskfs]
[21748.099070]  [<ffffffffa10a5a13>] ldiskfs_mark_inode_dirty+0x53/0x210 [ldiskfs]
[21748.099070]  [<ffffffffa10a8f90>] ldiskfs_dirty_inode+0x40/0x60 [ldiskfs]
[21748.099070]  [<ffffffffa115590f>] osd_attr_set+0x1df/0x630 [osd_ldiskfs]
[21748.099070]  [<ffffffffa0f56d22>] ofd_commitrw_write+0x11a2/0x1b00 [ofd]
[21748.099070]  [<ffffffffa0f5a301>] ofd_commitrw+0x511/0xa30 [ofd]
[21748.099070]  [<ffffffffa09e5c55>] obd_commitrw+0x2ef/0x332 [ptlrpc]
[21748.099070]  [<ffffffffa09c8cd3>] tgt_brw_write+0xea3/0x1650 [ptlrpc]
[21748.099070]  [<ffffffff812dfbab>] ? string.isra.6+0x3b/0xf0
[21748.099070]  [<ffffffffa0925b60>] ? target_send_reply_msg+0x130/0x130 [ptlrpc]
[21748.099070]  [<ffffffffa09c487b>] tgt_request_handle+0x7eb/0x1070 [ptlrpc]
[21748.099070]  [<ffffffffa097472b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[21748.099070]  [<ffffffffa09722a8>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[21748.099070]  [<ffffffffa0978918>] ptlrpc_main+0xaf8/0x1ea0 [ptlrpc]
[21748.099070]  [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
[21748.099070]  [<ffffffffa0977e20>] ? ptlrpc_register_service+0xf00/0xf00 [ptlrpc]
[21748.099070]  [<ffffffff8109739f>] kthread+0xcf/0xe0
[21748.099070]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
[21748.099070]  [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
[21748.099070]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
[21748.099070] ll_ost_io00_011 W ffff88007fd13680     0  9908      2 0x00000080
[21748.099070]  ffff88006d877d38 0000000000000046 ffff88006d877fd8 0000000000013680
[21748.099070]  ffff88006d877fd8 0000000000013680 ffff8800626fc440 0000000000000000
[21748.099070]  ffff8800626fc440 ffff88000fbc4c00 ffff8800730c3c68 ffff8800730c3c00
[21748.099070] Call Trace:
[21748.099070]  [<ffffffff8160a409>] schedule+0x29/0x70
[21748.099070]  [<ffffffffa0972535>] ptlrpc_wait_event+0x325/0x340 [ptlrpc]
[21748.099070]  [<ffffffff810a9650>] ? wake_up_state+0x20/0x20
[21748.099070]  [<ffffffffa0978637>] ptlrpc_main+0x817/0x1ea0 [ptlrpc]
[21748.099070]  [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
[21748.099070]  [<ffffffffa0977e20>] ? ptlrpc_register_service+0xf00/0xf00 [ptlrpc]
[21748.099070]  [<ffffffff8109739f>] kthread+0xcf/0xe0
[21748.099070]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
[21748.099070]  [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
[21748.099070]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
[21748.099070] ll_ost_io00_012 S ffff88007fc13680     0  9909      2 0x00000080
[21748.099070]  ffff880068f4fd38 0000000000000046 ffff880068f4ffd8 0000000000013680
[21748.099070]  ffff880068f4ffd8 0000000000013680 ffff880077ea2220 0000000000000000
[21748.099070]  ffff880077ea2220 ffff880072a19c00 ffff8800730c3c68 ffff8800730c3c00
[21748.099070] Call Trace:
[21748.099070]  [<ffffffff8160a409>] schedule+0x29/0x70
[21748.099070]  [<ffffffffa0972535>] ptlrpc_wait_event+0x325/0x340 [ptlrpc]
[21748.099070]  [<ffffffff810a9650>] ? wake_up_state+0x20/0x20
[21748.099070]  [<ffffffffa0978637>] ptlrpc_main+0x817/0x1ea0 [ptlrpc]
[21748.099070]  [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
[21748.099070]  [<ffffffffa0977e20>] ? ptlrpc_register_service+0xf00/0xf00 [ptlrpc]
[21748.099070]  [<ffffffff8109739f>] kthread+0xcf/0xe0
[21748.099070]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
[21748.099070]  [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
[21748.099070]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
[21748.099070] ll_ost_io00_013 D ffff88007fc13680     0  9910      2 0x00000080
[21748.099070]  ffff88006d9cf958 0000000000000046 ffff88006d9cffd8 0000000000013680
[21748.099070]  ffff88006d9cffd8 0000000000013680 ffff880077ea38e0 ffff88000b324000
[21748.099070]  ffff88000b3249c0 ffff880006260000 0000000000000000 ffff88007bf0ec90
[21748.099070] Call Trace:
[21748.099070]  [<ffffffff8160a409>] schedule+0x29/0x70
[21748.099070]  [<ffffffffa1156905>] osd_trans_stop+0x1b5/0x550 [osd_ldiskfs]
[21748.099070]  [<ffffffff81098350>] ? wake_up_bit+0x30/0x30
[21748.099070]  [<ffffffffa0f5043f>] ofd_trans_stop+0x1f/0x60 [ofd]
[21748.099070]  [<ffffffffa0f55f68>] ofd_commitrw_write+0x3e8/0x1b00 [ofd]
[21748.099070]  [<ffffffffa0f5a301>] ofd_commitrw+0x511/0xa30 [ofd]
[21748.099070]  [<ffffffffa09e5c55>] obd_commitrw+0x2ef/0x332 [ptlrpc]
[21748.099070]  [<ffffffffa09c8cd3>] tgt_brw_write+0xea3/0x1650 [ptlrpc]
[21748.099070]  [<ffffffff812dfbab>] ? string.isra.6+0x3b/0xf0
[21748.099070]  [<ffffffffa0925b60>] ? target_send_reply_msg+0x130/0x130 [ptlrpc]
[21748.099070]  [<ffffffffa09c487b>] tgt_request_handle+0x7eb/0x1070 [ptlrpc]
[21748.099070]  [<ffffffffa097472b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[21748.099070]  [<ffffffffa09722a8>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[21748.099070]  [<ffffffffa0978918>] ptlrpc_main+0xaf8/0x1ea0 [ptlrpc]
[21748.099070]  [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
[21748.099070]  [<ffffffffa0977e20>] ? ptlrpc_register_service+0xf00/0xf00 [ptlrpc]
[21748.099070]  [<ffffffff8109739f>] kthread+0xcf/0xe0
[21748.099070]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
[21748.099070]  [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
[21748.099070]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
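
When triaging a dump like the one above, it can help to pull out only the threads in D state together with their reliable stack frames. The following is a minimal sketch, not part of the original report. It assumes the task dump layout seen in this dmesg output (comm, state, address, free stack, pid, parent pid, flags), and the names `parse_tasks` and `blocked_tasks` are hypothetical helpers introduced here for illustration. The sample text is a short excerpt of the trace above.

```python
import re

# Task header, e.g.:
# [21748.053645] ll_ost_io00_010 D ffff88007fc13680     0  9907      2 0x00000080
# (layout assumed from the dump in this ticket)
TASK_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<comm>\S+)\s+(?P<state>[A-Z])\s+[0-9a-f]{16}"
    r"\s+\d+\s+(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+\s*$"
)

# Stack frame, e.g.:
#  [<ffffffffa013a72e>] sleep_on_shadow_bh+0xe/0x20 [jbd2]
# A leading "?" marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_tasks(dmesg_text):
    """Split a task dump into dicts with comm, state, pid and reliable frames."""
    tasks = []
    current = None
    for line in dmesg_text.splitlines():
        header = TASK_RE.match(line.strip())
        if header:
            current = {
                "comm": header.group("comm"),
                "state": header.group("state"),
                "pid": int(header.group("pid")),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Skip "?" frames and any frame seen before the first task header.
        if frame and current is not None and not frame.group("unreliable"):
            current["frames"].append((frame.group("func"), frame.group("module")))
    return tasks


def blocked_tasks(tasks):
    """Return only tasks in uninterruptible sleep (state D)."""
    return [t for t in tasks if t["state"] == "D"]


SAMPLE = """\
[21748.053645] ll_ost_io00_010 D ffff88007fc13680     0  9907      2 0x00000080
[21748.053645] Call Trace:
[21748.053645]  [<ffffffffa013a720>] ? start_this_handle+0x5d0/0x5d0 [jbd2]
[21748.053645]  [<ffffffff8160a70d>] io_schedule+0x9d/0x140
[21748.053645]  [<ffffffffa013a72e>] sleep_on_shadow_bh+0xe/0x20 [jbd2]
[21748.099070] ll_ost_io00_012 S ffff88007fc13680     0  9909      2 0x00000080
[21748.099070] Call Trace:
[21748.099070]  [<ffffffff8160a409>] schedule+0x29/0x70
[21748.099070]  [<ffffffffa0972535>] ptlrpc_wait_event+0x325/0x340 [ptlrpc]
"""

if __name__ == "__main__":
    for task in blocked_tasks(parse_tasks(SAMPLE)):
        print(task["comm"], task["pid"])
        for func, module in task["frames"]:
            print("   ", func, f"[{module}]" if module else "")
```

Run against the full dump, this would list ll_ost_io00_010 (waiting in jbd2 `do_get_write_access`) and ll_ost_io00_013 (waiting in `osd_trans_stop`) as the two blocked I/O threads.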


 Comments   
Comment by Sarah Liu [ 23/Sep/15 ]

Another instance was seen in replay-ost-single:
https://testing.hpdd.intel.com/test_sets/91ea9c36-6140-11e5-96b6-5254006e85c2

client and server: lustre-master build # 3192 ZFS DKMS

Generated at Sat Feb 10 02:03:19 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.