[LU-11520] separate tests and bugfixes Created: 15/Oct/18  Updated: 25/Apr/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Vitaly Fertman Assignee: Elena Gryaznova
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11642 Data lost after migrate striped dir Resolved
Rank (Obsolete): 9223372036854775807

 Description   

I have already made this proposal several times: landing bugfixes as a series of two patches (a test, and a fix on top of the test) is a good idea because:

  • it explicitly shows that the test actually reproduces the problem, rather than relying on trust in the developer;
  • you can revert only the fix at some later point and confirm that the test is still able to reproduce the problem.

However, there were several objections:

  • it will not work with git bisect;
  • Maloo does not (and is not supposed to) set +1 if a test failure occurs.

The proposal which seems to resolve these objections is to add a tag/label "mustfail" to the commit message of the test patch; Maloo would then treat this particular test failure as a success (and a successful test run as a failure) and set +1 (-1) accordingly. The fix sitting on top does not carry this tag/label, so at that point the test must pass.

The proposal was discussed with Andreas at the last LAD and we agreed that it would work, thus filing this ticket for it.
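
To make the proposed workflow concrete, a hypothetical two-patch series could look like the following; the "mustfail" tag spelling and the commit message layout here are illustrative assumptions, not an agreed format:

  patch 1/2 - the test only, commit message:

      LU-XXXXX tests: add a reproducer for the bug
      ...
      Mustfail: yes        <- Maloo treats a failure of the new test as +1

  patch 2/2 - the fix, on top of the test patch, commit message:

      LU-XXXXX <module>: fix the bug reproduced by the new test
      ...
      (no "mustfail" tag here, so the new test must now pass to get +1)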


 Comments   
Comment by Andreas Dilger [ 23/Oct/18 ]

I think this is fairly reasonable, and would work for a majority of cases, except where the test causes a node to crash or otherwise hang and not complete the test.

Could you submit a patch pair as an example to show how this will work? I don't think there should be anything needed except e.g. Test-Parameters: envdefinitions=CONF_SANITY_MUSTFAIL=32c or similar, and machinery in test-framework to make this happen. It would need to be specific to each test script, to avoid the eventual future case where there are two different tests with the same number that have the mustfail functionality.

Comment by Alexander Lezhoev [ 24/Oct/18 ]

Probably, something like envdefinitions=MUSTFAIL=conf-sanity/32c would be more handy. This construction can be unwrapped into an internal variable like MUSTFAIL_conf_sanity_32c to be checked before the test run.
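
A minimal shell sketch of that unwrapping, assuming MUSTFAIL arrives as a space-separated list of <script>/<testnum> entries; the function name is hypothetical and this is not existing test-framework.sh code:

    # e.g. Test-Parameters: envdefinitions=MUSTFAIL=conf-sanity/32c
    unwrap_mustfail() {
        local entry name
        for entry in $MUSTFAIL; do
            # conf-sanity/32c -> MUSTFAIL_conf_sanity_32c=true
            name=$(echo "$entry" | tr '/-' '__')
            eval "MUSTFAIL_${name}=true"
        done
    }

    # checked before the test run, e.g.:
    # [ "$MUSTFAIL_conf_sanity_32c" = "true" ] && echo "test_32c is expected to fail"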

Comment by Vitaly Fertman [ 06/Nov/18 ]

I think this is fairly reasonable, and would work for a majority of cases, except where the test causes a node to crash or otherwise hang and not complete the test.

Why? While stopping the system and gathering the test results, the test system is supposed to check whether the test is marked as failing and report it as a success.

Comment by Andreas Dilger [ 06/Nov/18 ]

except where the test causes a node to crash or otherwise hang and not complete the test.

The problem is that if the client or server node crashes then the rest of that test script will be aborted. Also, the pass/fail decision is made at the test script level; there is no mechanism (AFAIK) for the external autotest environment to understand whether this specific test failure should actually be considered a pass.

In the meantime, I think we can make progress with this and handle the majority of patches which do not induce a crash on a client or server node, and further improvements can be implemented at a later time to handle the whole-node failure case.

Comment by Vitaly Fertman [ 06/Nov/18 ]

The full solution seems to be to introduce this logic at the external (autotest) level to cover those cases as well, at which point the t-f solution would become useless, or at least redundant.

Could we estimate the amount of work needed to introduce it at the autotest environment level now? It is probably worth following that route immediately.

Comment by Peter Jones [ 07/Nov/18 ]

..and by "immediately" you mean "at the start of the 2.13 release cycle", right?

Comment by Vitaly Fertman [ 07/Nov/18 ]

I meant just the technical approach, not planning, so no milestones.

Comment by Peter Jones [ 07/Nov/18 ]

Heh, ok - thanks for confirming

Comment by Andreas Dilger [ 13/Dec/18 ]

As discussed on this week's LWG call, it makes sense to first implement the simple change to parse the MUSTFAIL parameter in build_test_filter() (either as a simple list of test numbers, similar to ONLY, EXCEPT, and ALWAYS_EXCEPT, or including the test script name as suggested above, like conf-sanity/32a), setting MUSTFAIL_<test_number> for that script, and then to check the result in run_one_logged() or maybe pass() so that a failure of this subtest is considered a PASS.
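
A minimal sketch of both pieces, assuming MUSTFAIL is a space-separated list of test numbers for the current script; the helper names set_mustfail_flags and mustfail_verdict are hypothetical, and this is not the actual build_test_filter()/run_one_logged()/pass() code:

    # addition alongside the ONLY/EXCEPT/ALWAYS_EXCEPT handling in build_test_filter()
    set_mustfail_flags() {
        local t
        for t in $MUSTFAIL; do
            eval MUSTFAIL_${t}=true    # e.g. MUSTFAIL="32c" -> MUSTFAIL_32c=true
        done
    }

    # called when a subtest finishes, to invert the verdict for MUSTFAIL tests
    mustfail_verdict() {
        local testnum=$1               # e.g. 32c
        local rc=$2                    # exit status of test_$testnum
        local flag
        eval flag=\$MUSTFAIL_${testnum}

        if [ "$flag" != "true" ]; then
            return $rc                 # normal pass/fail behaviour
        fi
        if [ $rc -ne 0 ]; then
            echo "test_$testnum failed as expected (MUSTFAIL): PASS"
            return 0
        fi
        echo "test_$testnum passed but was expected to fail: FAIL"
        return 1
    }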

Because we run tests in multiple different configurations (e.g. ldiskfs vs. ZFS, DNE vs. none, Ubuntu vs. RHEL) and the test failures may not be applicable to all configurations, I don't think we can require that the test fails with MUSTFAIL set in order for it to be considered a pass. I guess one option would be to add a similar SHOULDFAIL parameter that allows a conditional failure, while MUSTFAIL should fail in all test configurations?

Implementing this as part of the autotest framework will take more time, and since many failures are not crash/timeout failures we can still get benefits from having this support in test-framework.sh.

Comment by Elena Gryaznova [ 28/Jan/19 ]

MUSTFAIL can be implemented via a parameter in build_test_filter(), yes (this also needs a parameter like EXPECTED_FAILURE_IS, containing for example the particular string expected in stdout or in the node logs). But this implementation does not cover cases like crashes or timeouts, because t-f does not handle such cases at all; they can only be tracked on the automation side (not in t-f).
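
A rough sketch of the EXPECTED_FAILURE_IS idea, assuming the subtest output is captured to a log file; the function name is hypothetical and this is not existing t-f code:

    # succeed only when the expected failure signature appears in the subtest log
    check_expected_failure() {
        local testlog=$1               # captured stdout/stderr of the subtest

        [ -n "$EXPECTED_FAILURE_IS" ] || return 1
        grep -q -- "$EXPECTED_FAILURE_IS" "$testlog"
    }

As noted above, a check like this only runs when t-f survives to evaluate the subtest; crashes and timeouts abort the script before it is reached.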

So I do not think it is reasonable to split such a feature into two parts: a t-f implementation and an automation implementation.
