[LU-4457] Interop 2.5.0<->2.6 failure on test suite sanity test_118d: Multiop failed to block on fsync
Created: 08/Jan/14
Updated: 21/Jan/22
Resolved: 21/Jan/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Environment:

server: lustre-master build # 1823 RHEL6 ldiskfs
client: 2.5.0


Issue Links:
Duplicate
is duplicated by LU-4480 cfs_fail_timeout id 214 sleeping for... Closed
Severity: 3
Rank (Obsolete): 12220

Description

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/ff6f9f86-77ef-11e3-a6a3-52540035b04c.

The sub-test test_118d failed with the following error:

Multiop failed to block on fsync, pid=26893

The test log shows:

== sanity test 118d: Fsync validation inject a delay of the bulk ============ 15:07:14 (1389049634)
7+0 records in
7+0 records out
458752 bytes (459 kB) copied, 0.0035145 s, 131 MB/s
CMD: client-21-ib lctl set_param fail_loc=0x214
fail_loc=0x214
 sanity test_118d: @@@@@@ FAIL: Multiop failed to block on fsync, pid=26893 


Comments
Comment by Sarah Liu [ 08/Jan/14 ]

OST console

15:07:43:Lustre: DEBUG MARKER: == sanity test 118d: Fsync validation inject a delay of the bulk ============ 15:07:14 (1389049634)
15:07:43:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x214
15:07:44:LustreError: 8057:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 214 sleeping for -1000ms
15:07:44:schedule_timeout: wrong timeout value fffffffffffffc18
15:07:44:Pid: 8057, comm: ll_ost_io00_039 Not tainted 2.6.32-358.23.2.el6_lustre.x86_64 #1
15:07:45:Call Trace:
15:07:45: [<ffffffff8150f529>] ? schedule_timeout+0x2c9/0x2e0
15:07:45: [<ffffffffa04b7921>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
15:07:45: [<ffffffffa04b2d0f>] ? __cfs_fail_timeout_set+0xcf/0x150 [libcfs]
15:07:46: [<ffffffffa0887ec9>] ? cfs_fail_timeout_set.clone.2+0x29/0x30 [ptlrpc]
15:07:46: [<ffffffffa088b94b>] ? tgt_brw_write+0x34b/0x1550 [ptlrpc]
15:07:47: [<ffffffffa04b7921>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
15:07:47: [<ffffffffa088dfea>] ? tgt_handle_request0+0x2ea/0x1490 [ptlrpc]
15:07:47: [<ffffffffa04b7921>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
15:07:47: [<ffffffffa088f5ca>] ? tgt_request_handle+0x43a/0x980 [ptlrpc]
15:07:50: [<ffffffffa0842725>] ? ptlrpc_main+0xd25/0x1970 [ptlrpc]
15:07:50: [<ffffffffa0841a00>] ? ptlrpc_main+0x0/0x1970 [ptlrpc]
15:07:50: [<ffffffff81096a36>] ? kthread+0x96/0xa0
15:07:51: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
15:07:52: [<ffffffff810969a0>] ? kthread+0x0/0xa0
15:07:52: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
15:07:52:LustreError: 8057:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 214 awake
15:07:52:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_118d: @@@@@@ FAIL: Multiop failed to block on fsync, pid=26893 
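
The two error lines in the OST console log above look consistent with each other: the "sleeping for -1000ms" message and the "wrong timeout value fffffffffffffc18" message appear to describe the same negative timeout. A minimal sketch of the arithmetic follows, assuming the kernel prints the timeout as an unsigned 64-bit hex value and that one jiffy equals 1 ms on this build (HZ=1000 is an assumption, not something the log states). The helper name `to_signed64` is hypothetical and used only for illustration.

```python
def to_signed64(value: int) -> int:
    """Interpret an unsigned 64-bit value as a two's-complement signed integer."""
    value &= (1 << 64) - 1              # keep only the low 64 bits
    if value & (1 << 63):               # sign bit set: the value is negative
        return value - (1 << 64)
    return value


# Value copied from "schedule_timeout: wrong timeout value fffffffffffffc18".
raw_timeout = 0xFFFFFFFFFFFFFC18

signed_timeout = to_signed64(raw_timeout)
print(signed_timeout)  # -1000

# Assumption: HZ=1000, so one jiffy is 1 ms and -1000 jiffies matches the
# "-1000ms" reported by __cfs_fail_timeout_set() in the same log.
ASSUMED_MS_PER_JIFFY = 1
print(signed_timeout * ASSUMED_MS_PER_JIFFY, "ms")
```

If that reading is right, the server was asked to sleep for a negative interval, so the delay that test_118d relies on to keep multiop blocked in fsync may never have taken effect. This only decodes the logged values; it does not identify where the negative value originates.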
Comment by Sarah Liu [ 19/Nov/14 ]

Still hitting this error in interop testing between a 2.5.3 client and a 2.7 server, so reopening it:

https://testing.hpdd.intel.com/test_sets/f2a815bc-6ba5-11e4-88ff-5254006e85c2

Comment by Sarah Liu [ 13/Apr/15 ]

Hit this in interop testing between 2.5.3 client and 2.8 server:
https://testing.hpdd.intel.com/test_sets/b7a8f54a-dfb7-11e4-b5b0-5254006e85c2

Comment by Ashish Purkar (Inactive) [ 20/Jul/16 ]

Has this issue been seen in recent interop testing?
