
[LU-5360] Test timeout sanity-hsm test_13 Created: 16/Jul/14  Updated: 14/Nov/14  Resolved: 14/Nov/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.7.0

Type: Technical task Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-1669 lli->lli_write_mutex (single shared file performance) Resolved
Rank (Obsolete): 14945

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/9244dd62-0913-11e4-939d-5254006e85c2.

The sub-test test_13 failed with the following error:

test failed to respond and timed out

Info required for matching: sanity-hsm 13

Client console log:

08:02:37:Lustre: 19701:0:(client.c:1926:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
08:02:37:INFO: task lhsmtool_posix:8947 blocked for more than 120 seconds.
08:02:37:      Not tainted 2.6.32-431.20.3.el6.x86_64 #1
08:02:37:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
08:02:37:lhsmtool_posi D 0000000000000001     0  8947      1 0x00000080
08:02:39: ffff88005babfb98 0000000000000086 0000000000000000 ffffffff8100988e
08:02:40: ffff88007b48c0b8 0000000000000000 0000000001abfb58 ffff8800023143c0
08:02:40: ffff88007bb02638 ffff88005babffd8 000000000000fbc8 ffff88007bb02638
08:02:40:Call Trace:
08:02:41: [<ffffffff8100988e>] ? __switch_to+0x26e/0x320
08:02:41: [<ffffffff815296d5>] schedule_timeout+0x215/0x2e0
08:02:42: [<ffffffff81529353>] wait_for_common+0x123/0x180
08:02:42: [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
08:02:42: [<ffffffffa0adaf20>] ? osc_async_upcall+0x0/0x20 [osc]
08:02:43: [<ffffffff8152946d>] wait_for_completion+0x1d/0x20
08:02:43: [<ffffffffa0adaefc>] osc_io_fsync_end+0x8c/0xb0 [osc]
08:02:44: [<ffffffffa054f1f0>] cl_io_end+0x60/0x150 [obdclass]
08:02:45: [<ffffffffa09126c1>] lov_io_end_wrapper+0xf1/0x100 [lov]
08:02:45: [<ffffffffa09157d4>] lov_io_fsync_end+0x84/0x1e0 [lov]
08:02:45: [<ffffffffa054f1f0>] cl_io_end+0x60/0x150 [obdclass]
08:02:46: [<ffffffffa0553f72>] cl_io_loop+0xc2/0x1b0 [obdclass]
08:02:47: [<ffffffffa099f89b>] cl_sync_file_range+0x31b/0x500 [lustre]
08:02:47: [<ffffffffa09a0a2f>] ll_fsync+0x64f/0x7f0 [lustre]
08:02:48: [<ffffffff811bb2b1>] vfs_fsync_range+0xa1/0x100
08:02:48: [<ffffffff811bb37d>] vfs_fsync+0x1d/0x20
08:02:49: [<ffffffff811bb3be>] do_fsync+0x3e/0x60
08:02:50: [<ffffffff811bb410>] sys_fsync+0x10/0x20
08:02:50: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
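
The trace shows lhsmtool_posix stuck in uninterruptible sleep (state D): its fsync() descends through ll_fsync() and cl_sync_file_range() into osc_io_fsync_end(), which blocks in wait_for_completion() until osc_async_upcall() signals that the sync has been acknowledged. The ptlrpc_expire_one_request messages earlier in the log suggest that acknowledgement never arrived, so the wait never returns and the hung-task watchdog reports the task after 120 seconds. The user-space C sketch below is not Lustre code; it only illustrates that wait-on-completion pattern under those assumptions, with hypothetical fake_* functions standing in for the kernel-side osc_io_fsync_end()/osc_async_upcall() pair and a timeout added so the demo terminates (the real wait_for_completion() has no timeout).

/*
 * Minimal, self-contained sketch of the pattern above (compile with
 * "cc -pthread"): a sync path blocks on a completion that an async
 * callback is expected to signal.  All fake_* names are hypothetical
 * stand-ins, not the real Lustre functions.
 */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* User-space analogue of a kernel 'struct completion'. */
struct completion {
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        int             done;
};

static void init_completion(struct completion *c)
{
        pthread_mutex_init(&c->lock, NULL);
        pthread_cond_init(&c->cond, NULL);
        c->done = 0;
}

/* Analogue of complete(): run when the reply arrives. */
static void fake_osc_async_upcall(struct completion *c)
{
        pthread_mutex_lock(&c->lock);
        c->done = 1;
        pthread_cond_signal(&c->cond);
        pthread_mutex_unlock(&c->lock);
}

/*
 * Analogue of the wait in osc_io_fsync_end(): block until the
 * completion is signalled, but give up after 'secs' so this demo
 * terminates.  The real wait_for_completion() has no timeout, which
 * is why the hung-task watchdog fires instead.
 */
static int fake_osc_io_fsync_end(struct completion *c, int secs)
{
        struct timespec deadline;
        int rc = 0;

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += secs;

        pthread_mutex_lock(&c->lock);
        while (!c->done && rc == 0)
                rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
        pthread_mutex_unlock(&c->lock);
        return rc;
}

int main(void)
{
        struct completion sync_done;

        init_completion(&sync_done);

        /* Hang scenario: no reply, so the upcall never fires and the
         * waiter just sits in the wait (bounded to 2s here). */
        if (fake_osc_io_fsync_end(&sync_done, 2) == ETIMEDOUT)
                printf("fsync path blocked: completion never signalled\n");

        /* Normal scenario: the upcall signals the completion and the
         * waiter returns at once. */
        fake_osc_async_upcall(&sync_done);
        if (fake_osc_io_fsync_end(&sync_done, 2) == 0)
                printf("fsync path resumed after completion was signalled\n");

        return 0;
}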


 Comments   
Comment by Andreas Dilger [ 16/Jul/14 ]

This test failure has occurred only once, on an unlanded version of the http://review.whamcloud.com/6672 patch.

Comment by Andreas Dilger [ 14/Nov/14 ]

This was fixed as part of landing the http://review.whamcloud.com/6672 patch.
