[LU-13773] sanity-dom test failure not triggering test suite marked as FAIL Created: 09/Jul/20  Updated: 19/Apr/22

Status: In Progress
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.14.0, Lustre 2.12.5
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: tests

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In LU-13759, we see that a sanityn test run as part of sanity-dom can fail without the failure being propagated to a level where Maloo recognizes it as a failure. There are also several problems with the sanity-dom logs: logs for the failed tests are not collected, and client logs for successful tests are not collected. Note that this seems to be an issue with the test framework and not with Maloo.

For the failure at https://testing.whamcloud.com/test_sets/08c6fa9d-e2a3-457b-a1ed-b4318dbf166a, we can see that the sanity-dom/sanityn test 20 failure is recognized, but this does not lead to the sanity-dom test suite being marked as failed. In the results file at https://testing.whamcloud.com/test_sessions/b31ca578-dd4f-4725-9a0c-5e19f3031c69/show_results, we see that the failure is registered, but the overall status reported at the end is still PASS:

    -   name: test_sanityn
    -   name: test_1
        status: PASS
        duration: 4
        return_code: 0
        error: 
…
    -   name: test_19
        status: SKIP
        duration: 2
        return_code: 0
        error: not cache-capable obdfilter
    -   name: test_20
        status: FAIL
        duration: 5
        return_code: 1
        error: 1 page left in cache after lock cancel
    -   name: test_23
        status: PASS
        duration: 65
        return_code: 0
        error: 
…
    -   name: test_51d
        status: PASS
        duration: 435
        return_code: 0
        error: 
    duration: 1129
    status: PASS
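
The behavior one would expect from the excerpt above can be sketched as follows. This is only an illustration in Python, not the Lustre test framework's code: the function name suite_status and the list-of-dicts layout are assumptions made for the example, and the elided entries are left out.

```python
# Hypothetical helper (not part of test-framework.sh or Maloo): derive a
# suite-level status from the statuses of its sub-tests.
def suite_status(subtests):
    """Return "FAIL" if any sub-test failed, otherwise "PASS".

    A SKIP with return_code 0 is not treated as a failure.
    """
    for test in subtests:
        if test["status"] == "FAIL" or test["return_code"] != 0:
            return "FAIL"
    return "PASS"


# Sub-test results copied from the excerpt above (elided entries omitted).
sanityn_results = [
    {"name": "test_1", "status": "PASS", "return_code": 0},
    {"name": "test_19", "status": "SKIP", "return_code": 0},
    {"name": "test_20", "status": "FAIL", "return_code": 1},
    {"name": "test_23", "status": "PASS", "return_code": 0},
    {"name": "test_51d", "status": "PASS", "return_code": 0},
]

print(suite_status(sanityn_results))  # prints FAIL
```

Under this rule the test_20 failure alone would be enough to mark the enclosing suite as FAIL, which is the outcome missing from the results file.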

We had a related issue where a failing sanity-dom/sanityn test would cause the whole test suite to fail, but Maloo incorrectly reported that the last test in sanityn had failed; see LU-10589 and one example failure at https://testing.whamcloud.com/test_sets/870c7a78-467f-11e9-9646-52540065bddc. In this case it looks like all logs are not collected.

Patrick Farrell produced a patch, at https://review.whamcloud.com/#/c/34186/ , that fixes the problem of a failing sub-test in a sub-suite not causing the suite to be marked as failed. I think the issue of logs not being collected/displayed still exists with this proposed solution.



 Comments   
Comment by Gerrit Updater [ 16/Jul/20 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39409
Subject: LU-13773 tests: testing failure of subscripts
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7be9f991dc4221dab5df0d2a01a1a863816b9825

Comment by James Nunez (Inactive) [ 24/Jul/20 ]

The patch https://review.whamcloud.com/39409 should fix the problem where sanity and/or sanityn will fail, but sanity-dom does not recognize that the sub-test suite test failed and, thus, Maloo does not recognize that a sub-test suite test failed.

There is one other issue that needs to be corrected: test logs for the sub-test suites' tests are not correctly associated with the test results. A solution that Maloo can work with out of the box is to prefix the log file names with <suite name>.<sub-testsuite_name>_test<test_number>. ... . and make the same change to the results.yml file.
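
As a rough illustration of the naming scheme proposed in this comment, the sketch below builds the prefix by reading the template literally. The function name log_prefix is hypothetical, the part of the file name after the prefix is elided in the comment and is not reconstructed here, and the exact separators used by the real framework may differ.

```python
# Hypothetical helper: build the log-name prefix from the template
# <suite name>.<sub-testsuite_name>_test<test_number> as written in the comment.
def log_prefix(suite, sub_suite, test_number):
    """Return the prefix that ties a sub-suite test log to its parent suite."""
    return f"{suite}.{sub_suite}_test{test_number}"


# Example values taken from this ticket: sanity-dom running sanityn test 20.
print(log_prefix("sanity-dom", "sanityn", "20"))  # prints sanity-dom.sanityn_test20
```

Using the same string for both the log file names and the entries in results.yml is what would let the two be matched up.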

Comment by Gerrit Updater [ 30/Jul/20 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39552
Subject: LU-13773 tests: use TESTLOG_PREFIX in run_one_logged
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 051ace294af11fb1da6241f04471deb092f90cd8

Comment by Gerrit Updater [ 06/Aug/20 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39591
Subject: LU-13773 tests: modify test and log names
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5366881770fde7fd7cc244b0eff258f316388797

Comment by Gerrit Updater [ 13/Aug/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39552/
Subject: LU-13773 tests: use TESTLOG_PREFIX in run_one_logged
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cf6ac632a18c3acb3534cd07b717673997a58515

Comment by Gerrit Updater [ 01/Sep/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39409/
Subject: LU-13773 tests: subscript failure propagation
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e2cb43c409b98b97fdb84a678a93651d2176b242

Generated at Sat Feb 10 03:04:05 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.