[LU-1753] Test failure on test suite sanity, subtest test_118i Created: 15/Aug/12  Updated: 07/Jan/16  Resolved: 07/Jan/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Hongchao Zhang
Resolution: Won't Fix Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4075

 Description   

This issue was created by maloo for bobijam <bobijam@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/d1a72d3c-d84d-11e1-b8ea-52540035b04c.

The sub-test test_118i failed with the following error:

got error, but should be not, rc=5

Info required for matching: sanity 118i



 Comments   
Comment by Zhenyu Xu [ 15/Aug/12 ]

another hit at https://maloo.whamcloud.com/test_sets/63bf779a-e719-11e1-ac43-52540035b04c

Comment by Andreas Dilger [ 20/Aug/12 ]

18:15:57:Lustre: DEBUG MARKER: == sanity test 118i: Fix error before timeout in recoverable error ============ 18:15:55 (1345338955)
18:16:08:LustreError: 11-0: an error occurred while communicating with 10.10.4.183@tcp. The ost_write operation failed with -5
18:16:08:LustreError: Skipped 4 previous similar messages
18:16:08:LustreError: 2970:0:(osc_request.c:1926:brw_interpret()) lustre-OST0000-osc-ffff880077362400: too many resent retries for object: 941:0, rc = -5.
18:16:11:Lustre: DEBUG MARKER: sanity test_118i: @@@@@@ FAIL: got error, but should be not, rc=5
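The test name ("Fix error before timeout in recoverable error") and the "too many resent retries ... rc = -5" message suggest the race behind this failure. The test injects a recoverable write error, and the client resends the write. If the fault is not cleared before the resend budget runs out, the write fails with -EIO (rc=5). The toy model below is a hypothetical sketch of that race, not Lustre's client code. The function name, the resend limit and the resend interval are all assumptions chosen for illustration.

```python
import errno


def write_with_resends(fault_cleared_at, max_resends, resend_interval):
    """Hypothetical toy model (not Lustre code) of a client resending a
    bulk write that fails with -EIO while a fault is injected.

    fault_cleared_at: seconds after the first attempt when the fault is cleared
    max_resends:      assumed resend limit before the client gives up
    resend_interval:  assumed delay in seconds between attempts
    Returns 0 on success, or -errno.EIO once all resends are exhausted.
    """
    t = 0
    for _ in range(max_resends + 1):
        if t >= fault_cleared_at:
            return 0              # fault already cleared: this attempt succeeds
        t += resend_interval      # attempt failed; wait, then resend
    return -errno.EIO             # analogous to "too many resent retries", rc = -5
```

With an assumed budget of 2 resends 5 s apart, clearing the fault at 5 s lets the second attempt succeed. Clearing it at 16 s exhausts the budget and yields -EIO, the same errno the test reports.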

Comment by Jian Yu [ 26/Aug/12 ]

Another instance:
https://maloo.whamcloud.com/test_sets/42417a68-ef8f-11e1-bdf7-52540035b04c

Comment by nasf (Inactive) [ 29/Aug/12 ]

Another failure instance:
https://maloo.whamcloud.com/test_sets/284e4e2a-f1d7-11e1-87d6-52540035b04c

Comment by Andreas Dilger [ 30/Aug/12 ]

This is failing fairly regularly, bumping priority.

Comment by nasf (Inactive) [ 01/Sep/12 ]

Another failure:

https://maloo.whamcloud.com/test_sets/c093ca76-f49c-11e1-b3b2-52540035b04c

Comment by Li Wei (Inactive) [ 16/Sep/12 ]

https://maloo.whamcloud.com/test_sets/97853650-fe7a-11e1-b4cd-52540035b04c

Comment by Jian Yu [ 18/Sep/12 ]

Another instance:
https://maloo.whamcloud.com/test_sets/aebdde24-01f6-11e2-bc4e-52540035b04c

Comment by Jian Yu [ 19/Sep/12 ]

Another instance:
https://maloo.whamcloud.com/test_sets/7219f60c-020e-11e2-ab94-52540035b04c

Comment by Peter Jones [ 19/Sep/12 ]

Hongchao

Could you please look into this one?

Thanks

Peter

Comment by Ian Colle (Inactive) [ 19/Sep/12 ]

https://maloo.whamcloud.com/test_sets/807ed662-0273-11e2-ab94-52540035b04c

Comment by Hongchao Zhang [ 20/Sep/12 ]

This issue is caused by a failure to reset "fail_loc".

For example:
https://maloo.whamcloud.com/test_sets/807ed662-0273-11e2-ab94-52540035b04c

In the debug log:

test started at 1348039162
fail_loc set to OBD_FAIL_OST_BRW_WRITE_BULK at 1348039163
first ost_write failed for this fail_loc at 1348039168
fail_loc reset to 0 at 1348039173
last ost_write failed for this fail_loc at 1348039178
test failed at 1348039179

But in the console log, fail_loc is reset at 1348039179:

00:19:28:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x20e
00:19:29:Lustre: *** cfs_fail_loc=20e, val=0***
00:19:40:Lustre: DEBUG MARKER: lctl set_param fail_loc=0
00:19:40:Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_118i: @@@@@@ FAIL: got error, but should be not, rc=5

Is it related to the test framework?
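The debug-log timeline in this comment reduces to simple arithmetic on the epoch timestamps. The sketch below transcribes those values and computes the gaps. It shows that injected write failures continued for 5 s after the reset recorded in the debug log. It also shows that the console-log reset time coincides with the test failure. The dictionary keys and helper names are hypothetical, chosen for illustration only.

```python
# Epoch timestamps (seconds) transcribed from the debug-log timeline above
# (test set 807ed662-0273-11e2-ab94-52540035b04c). Key names are hypothetical.
EVENTS = {
    "test_start": 1348039162,
    "fail_loc_set": 1348039163,          # OBD_FAIL_OST_BRW_WRITE_BULK (0x20e)
    "first_write_fail": 1348039168,
    "fail_loc_reset_debug": 1348039173,  # reset as recorded in the debug log
    "last_write_fail": 1348039178,
    "test_fail": 1348039179,
}
# Reset time as seen in the console log, per the same comment.
FAIL_LOC_RESET_CONSOLE = 1348039179


def offsets(events, origin_key="fail_loc_set"):
    """Return each event's offset in seconds from the fail_loc being set."""
    origin = events[origin_key]
    return {name: ts - origin for name, ts in events.items()}


def failures_after_reset(events, reset_ts):
    """Seconds by which the last injected write failure trails a reset time.

    A positive value means writes kept failing after the fault should
    already have been cleared."""
    return events["last_write_fail"] - reset_ts


if __name__ == "__main__":
    for name, off in sorted(offsets(EVENTS).items(), key=lambda kv: kv[1]):
        print(f"{name:>22}: {off:+d}s")
    print("last failure after debug-log reset:",
          failures_after_reset(EVENTS, EVENTS["fail_loc_reset_debug"]), "s")
    print("console-log reset minus test failure:",
          FAIL_LOC_RESET_CONSOLE - EVENTS["test_fail"], "s")
```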

Comment by Chris Gearing (Inactive) [ 20/Sep/12 ]

When you say test-framework, do you mean the test infrastructure that is Autotest/Maloo/Toro, or the test-framework that forms part of Lustre itself?

If the latter, which I suspect is what you mean, then it should be fixed as part of Lustre.

Comment by Hongchao Zhang [ 21/Sep/12 ]

The patch is tracked at http://review.whamcloud.com/#change,4071

Comment by Peter Jones [ 22/Sep/12 ]

Dropping priority, as this is known to be a test issue only. We'll land the test correction to master, and also to b2_3 if we have another RC.

Comment by Li Wei (Inactive) [ 24/Sep/12 ]

https://maloo.whamcloud.com/test_sets/c03bd93c-042e-11e2-aec7-52540035b04c

Comment by John Fuchs-Chesney (Inactive) [ 07/Jan/16 ]

Test issue only.
~ jfc.

Generated at Sat Feb 10 01:19:23 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.