[LU-764] test marked as fail but all subtest pass Created: 23/Sep/11  Updated: 21/Mar/13  Resolved: 21/Mar/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Minh Diep Assignee: Minh Diep
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Lustre Clients:
Tag: v2_1_0_0_RC2
Distro/Arch: Sles11/x86_64 (kernel version: 2.6.32.43-0.4-default)
Build: https://build.whamcloud.com/job/lustre-master/arch=x86_64,build_type=client,distro=sles11,ib_stack=inkernel/
Network: TCP
ENABLE_QUOTA=yes

Lustre Servers:
Tag: v2_1_0_0_RC2
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.6.1.el6)
Build: https://build.whamcloud.com/job/lustre-master/283/arch=x86_64,build_type=cserver,distro=el6,ib_stack=inkernel/
Network: TCP


Severity: 3
Rank (Obsolete): 2191

 Description   

https://maloo.whamcloud.com/test_sets/657179ce-e58c-11e0-9909-52540025f9af

All subtests are PASS but the test is marked as FAIL



 Comments   
Comment by Chris Gearing (Inactive) [ 14/Oct/11 ]

We traced this down to the test framework in Lustre; it writes incorrect info to results.yml.

Comment by Minh Diep [ 20/Oct/11 ]

I have checked recent reports for this test and haven't seen a failure.

Comment by Minh Diep [ 20/Oct/11 ]

Actually, the ost_pools test has 30 to 31 subtests, but this report shows only 22 subtests and the duration is over 3600s. I believe the test was timed out by autotest but was not marked as TIMEOUT.

Chris,

I think you have worked on or increased the timeout value, so I think this bug can be closed. What do you think?
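
The reasoning in this comment (fewer subtests reported than the suite normally has, plus a duration past the limit, without a TIMEOUT status) can be sketched as a simple check. This is only an illustration: the function name, its parameters, and the 3600s default limit are assumptions made for the example, not part of autotest or Maloo.

```python
def looks_like_unmarked_timeout(reported_subtests, expected_subtests,
                                duration_s, status, timeout_s=3600):
    """Heuristic: the suite probably timed out but was not labeled TIMEOUT.

    All names here are hypothetical; timeout_s=3600 is an assumed limit
    based on the "over 3600s" observation in the ticket.
    """
    if status == "TIMEOUT":
        # Already labeled correctly, nothing to flag.
        return False
    ran_too_few = reported_subtests < expected_subtests
    ran_too_long = duration_s >= timeout_s
    return ran_too_few and ran_too_long


# Values from the comment: 22 subtests reported, 30 expected, marked FAIL.
# 3601 is a stand-in for "over 3600s"; the exact duration is not in the ticket.
suspect = looks_like_unmarked_timeout(22, 30, 3601, "FAIL")
```

A report flagged this way would still need its logs checked by hand; the check only narrows down where to look.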

Comment by Minh Diep [ 16/Nov/11 ]

Haven't seen this in a while; reopen if reproducible.

Comment by Keith Mannthey (Inactive) [ 11/Dec/12 ]

I am seeing this with a b1_8 patch.

https://maloo.whamcloud.com/test_sessions/67735464-3c51-11e2-82b5-52540035b04c

recovery-small shows all green but is reported as failed. It looks like there was an initial failure that somehow cascaded to tests passing but then being marked failed.

Comment by Minh Diep [ 12/Dec/12 ]

The report shows that all subtests passed, which is correct. However, at the end of the test there is a cleanup step, and it failed. I think this results in the entire test suite being marked failed. It's debatable whether this is an error/bug, because a failure obviously occurred, just not in a subtest. I think this is good/correct behavior.
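
A minimal sketch of the behavior described here, assuming the suite status is derived from both the subtest results and a separate cleanup step. The `SuiteResult` type and `suite_status` function are hypothetical names for illustration and are not taken from the Lustre test framework or Maloo.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SuiteResult:
    """Hypothetical record of one test suite run."""
    name: str
    subtest_statuses: List[str] = field(default_factory=list)
    cleanup_ok: bool = True  # cleanup runs outside of any subtest


def suite_status(result: SuiteResult) -> str:
    """Return FAIL if any subtest failed or the final cleanup failed."""
    if any(status != "PASS" for status in result.subtest_statuses):
        return "FAIL"
    if not result.cleanup_ok:
        # Every subtest passed, but the suite is still marked failed.
        return "FAIL"
    return "PASS"


# All subtests pass, cleanup fails: the suite is reported as FAIL.
run = SuiteResult("recovery-small", ["PASS", "PASS", "PASS"], cleanup_ok=False)
status = suite_status(run)
```

Under this model, a report with all-green subtests and a FAIL suite status is consistent as long as the cleanup result is recorded somewhere outside the subtest list.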

Comment by Keith Mannthey (Inactive) [ 13/Dec/12 ]

I agree: if there is an error in any way, it should be marked as failed. We don't want to ignore errors.

Should the subtest have a way to mark "cleanup failed"?

Comment by Minh Diep [ 13/Dec/12 ]

Unfortunately, given the way each subtest is run, with the cleanup outside of the subtest, I don't think we can mark it failed at the subtest level.

Comment by Keith Mannthey (Inactive) [ 11/Mar/13 ]

I have seen a bit of this over the last week or so on master. This is a good example.

https://maloo.whamcloud.com/test_sessions/bf361f32-8919-11e2-b643-52540035b04c

There was one real error at the very end of this test. Other than that, all subtests "passed" even though 3 whole sections are marked FAILED. What logs do you look at to see the cleanup issues previously mentioned? How can we tell if it is a problem with the patch or some autotest anomaly?

Comment by Keith Mannthey (Inactive) [ 11/Mar/13 ]

Starting to see this a bit on current master.

Comment by Minh Diep [ 21/Mar/13 ]

I looked at the log above. sanity actually ran much longer, but Maloo only shows a few hundred seconds:
== sanity test complete, duration 2584 sec == 14:03:39 (1362866619)

I don't think this is the same problem. Please file a TT ticket.
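
To compare the duration recorded in the suite log against what Maloo displays, the completion line quoted above can be parsed. This is a rough sketch: the regular expression is inferred from that single log line, and the function name and returned keys are assumptions for illustration.

```python
import re

# Pattern inferred from one sample line in this ticket; other suites or
# framework versions may format the completion line differently.
COMPLETE_RE = re.compile(
    r"^== (?P<suite>\S+) test complete, "
    r"duration (?P<duration>\d+) sec == "
    r"(?P<clock>\d{2}:\d{2}:\d{2}) \((?P<epoch>\d+)\)"
)


def parse_completion_line(line):
    """Return suite name, duration and end time, or None if no match."""
    match = COMPLETE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "suite": match["suite"],
        "duration_s": int(match["duration"]),
        "end_epoch": int(match["epoch"]),
    }


line = "== sanity test complete, duration 2584 sec == 14:03:39 (1362866619)"
info = parse_completion_line(line)
```

If the parsed `duration_s` is far larger than the duration shown in the report, the mismatch points at the reporting side rather than at the test itself.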

Generated at Sat Feb 10 01:10:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.