[LU-7284] test results marked failed although all sub-tests pass Created: 03/Sep/15  Updated: 14/Dec/21  Resolved: 14/Dec/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Frank Heckes (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: triage
Environment:

shadow


Story Points: 0.5
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The overall status of the test script was marked failed although all sub-tests passed:
https://testing.hpdd.intel.com/test_sets/bfdb5482-508d-11e5-95a9-5254006e85c2



 Comments   
Comment by Lee Ochoa [ 03/Sep/15 ]

Frank, I looked at the results.yml file and found...

        name: ost-pools
        description: auster ost-pools
        submission: Tue Sep  1 06:19:40 UTC 2015
        report_version: 2
        SubTests:
        -
            name: test_1a
            status: PASS
            duration: 22
            return_code: 0
            error:
.
.
.
        -
            name: test_26
            status: PASS
            duration: 54
            return_code: 0
            error:
        duration: 2173
        status: FAIL

So despite all the sub-tests passing, the test set was marked failed in the results file; I'm not sure why. I've updated the ticket's component to autotest, though I'm not sure it's AT's issue either. I just know the problem is further upstream than Maloo.
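
For reference, a minimal bash sketch of a checker for this symptom; it assumes the results.yml layout shown above (suite-level keys indented 8 spaces, sub-test keys 12) and is not part of any existing tool:

    #!/bin/bash
    # Hypothetical checker: flag a test set whose suite-level status is
    # FAIL even though no sub-test reported FAIL or TIMEOUT.
    results=${1:-results.yml}

    # The last suite-level "status:" line (8-space indent) is the overall status.
    suite_status=$(awk '/^        status:/ {s=$2} END {print s}' "$results")
    # Count failing sub-test "status:" lines (12-space indent).
    failed=$(grep -c '^            status: *\(FAIL\|TIMEOUT\)' "$results")

    if [ "$suite_status" = FAIL ] && [ "$failed" -eq 0 ]; then
        echo "$results: suite marked FAIL although all sub-tests passed"
    fi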

Comment by Jodi Levi (Inactive) [ 04/Sep/15 ]

We will pull this ticket into the next sprint to complete the analysis, determine how often this is happening, and prioritize the effort to fix it.

Comment by Charlie Olmstead [ 22/Sep/15 ]

The Lustre test framework is responsible for the results.yml file. This might be a correct failure, as there was a failure while cleaning up after the ost-pools test:

06:55:47:Starting test test_complete in suite ost-pools timeout at 1441094147
06:55:47:== ost-pools test complete, duration 2160 sec == 06:55:40 (1441090540)
06:55:47:CMD: shadow-35vm3 lctl pool_list lustre
06:55:47:Pools from lustre:
06:55:47:rm: cannot remove `/mnt/lustre/d405.sanity-hsm/striped_dir': Directory not empty
06:55:47:rm: cannot remove `/mnt/lustre/d500.sanity-hsm': Directory not empty
06:55:47: ost-pools : @@@@@@ FAIL: remove sub-test dirs failed
06:55:47: Trace dump:
06:55:47: = /usr/lib64/lustre/tests/test-framework.sh:4748:error_noexit()
06:55:47: = /usr/lib64/lustre/tests/test-framework.sh:4779:error()
06:55:47: = /usr/lib64/lustre/tests/test-framework.sh:4293:check_and_cleanup_lustre()
06:55:47: = /usr/lib64/lustre/tests/ost-pools.sh:1574:main()
06:55:47:Dumping lctl log to /logdir/test_logs/2015-09-01/lustre-ppc-el6_6-x86_64-vs-lustre-ppc-el6_6-x86_64_ppc64-review-dne-part-2-1_2_1_191_-70079184457860-005147/ost-pools..*.1441090543.log
06:55:47:CMD: shadow-31.shadow.whamcloud.com,shadow-35vm2,shadow-35vm3,shadow-35vm4,shadow-35vm5,shadow-35vm6,shadow-35vm7 /usr/sbin/lctl dk > /logdir/test_logs/2015-09-01/lustre-ppc-el6_6-x86_64-vs-lustre-ppc-el6_6-x86_64_ppc64-review-dne-part-2-1_2_1_191_-70079184457860-005147/ost-pools..debug_log.\$(hostname -s).1441090543.log;
06:55:47: dmesg > /logdir/test_logs/2015-09-01/lustre-ppc-el6_6-x86_64-vs-lustre-ppc-el6_6-x86_64_ppc64-review-dne-part-2-1_2_1_191_-70079184457860-005147/ost-pools..dmesg.\$(hostname -s).1441090543.log
06:55:59:ost-pools returned 0
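
To make the mechanism concrete, here is a minimal, self-contained bash sketch of the assumed control flow (the function bodies are stand-ins, not the real test-framework.sh code): every sub-test passes, but the suite-level cleanup fails and flips the overall status:

    #!/bin/bash
    # Stand-ins: every sub-test passes; the cleanup fails, like the
    # "rm: cannot remove ... Directory not empty" errors in the log above.
    run_one() { return 0; }
    check_and_cleanup() { rmdir /tmp/does-not-exist.$$ 2>/dev/null; }

    suite_rc=0
    for t in test_1a test_26; do
        run_one "$t" && echo "$t: PASS" || { echo "$t: FAIL"; suite_rc=1; }
    done
    # The cleanup runs outside any named sub-test, so its failure never
    # shows up in the SubTests list but still sets the suite-level status.
    check_and_cleanup || suite_rc=1
    [ "$suite_rc" -eq 0 ] && echo "suite status: PASS" || echo "suite status: FAIL"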

Comment by Frank Heckes (Inactive) [ 25/Sep/15 ]

Okay. I'll check the script and the error, and may convert the ticket into an LU or LDEV ticket.

Comment by Nathaniel Clark [ 09/Oct/15 ]

I've just run into this here:
https://testing.hpdd.intel.com/test_sets/0273cf04-6d4e-11e5-bf10-5254006e85c2

Comment by Frank Heckes (Inactive) [ 12/Oct/15 ]

For the ppc test and the example Nathaniel presented, auster returns 0 for all sub-tests, but clean-up fails at the end of the test suite.
Although all the information is there, it's cumbersome to dig into the suite log/stdout files for these events.
It's not clear whether the situation has to (or should) be handled in autotest. I think it's a problem in the Lustre test suite, which should add a 'test' (name == 'cleanup') that could pass, fail or timeout. I'll discuss the problem with Charlie.

Comment by Frank Heckes (Inactive) [ 12/Oct/15 ]

The problem is that 'lnet-selftest' in the case of https://testing.hpdd.intel.com/test_sets/0273cf04-6d4e-11e5-bf10-5254006e85c2 and
'ost-pools' for session https://testing.hpdd.intel.com/test_sets/bfdb5482-508d-11e5-95a9-5254006e85c2 contain clean-up code that will
cause auster to fail (i.e. exit with status != 0) if an operation fails. These clean-up commands should be bundled into a test ('cleanup' would
be a good name) that could FAIL|TIMEOUT|PASS|SKIP. This would make the failure immediately transparent to any user. Otherwise
the clean-up acts like a 'hidden' test for the suite.
We think this is a design flaw of the Lustre test suites, especially those mentioned explicitly in this ticket, and it should be solved within the Lustre
(internal) test framework.
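
A hedged sketch of what that could look like in a suite script; run_test, error and check_and_cleanup_lustre are the existing test-framework.sh helpers seen in the trace above, but wiring them up this way is an assumption about the fix, not an actual patch:

    # Register the suite-level cleanup as an explicit sub-test so it can
    # PASS/FAIL/TIMEOUT/SKIP in results.yml instead of acting as a
    # 'hidden' test (assumed wiring, for illustration only).
    test_cleanup() {
        check_and_cleanup_lustre || error "remove sub-test dirs failed"
    }
    run_test cleanup "run suite cleanup as an explicit sub-test"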

Comment by Emoly Liu [ 08/Feb/18 ]

+1 on master https://testing.hpdd.intel.com/test_sets/e897ff26-0c90-11e8-a6ad-52540065bddc
