Details
- Improvement
- Resolution: Unresolved
- Minor
Description
Currently the Lustre test-framework runs the test scripts based on the version installed on the client1 node. When testing version interop, there are a number of subtests that need to be excluded because they depend on functionality that is not provided by the server, or are unit tests for bugs that have not yet been fixed on the older server version.
To exclude a subtest, each affected subtest currently needs an explicit check of the Lustre version running on the MDS or OSS, normally a "skip" check at the start of the subtest. This works when the client runs the newer version of the test script and needs to skip the subtest for an older MDS or OSS version. It does not work when the client runs an older/tagged version of the test scripts that depends on functionality removed or changed on a newer server, or when the subtest was written without the "skip" check, since the old test script cannot be changed retroactively.
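For reference, the per-subtest check described above can be sketched as follows. This is a minimal illustration meant to run outside the test framework: `version_code` and `skip` are simplified stand-ins for the framework helpers of the same name, and `test_example`, the `MDS1_VERSION` value, and the version numbers are made up for the example.

```shell
#!/bin/bash
# Simplified stand-in for the test-framework helper: pack up to four
# dotted version fields into one integer (assumes each field < 256).
version_code() {
	local a b c d
	IFS=. read -r a b c d <<< "$1"
	echo $(( (${a:-0} << 24) | (${b:-0} << 16) | (${c:-0} << 8) | ${d:-0} ))
}

# Stand-in for skip(): only reports the reason; the caller returns early.
skip() {
	echo "SKIP: $*"
}

# Illustrative value; in a real run this reflects the running MDS.
MDS1_VERSION=$(version_code 2.12.9)

# Hypothetical subtest that depends on a server-side feature.
test_example() {
	(( MDS1_VERSION >= $(version_code 2.15.55) )) || {
		skip "need MDS >= 2.15.55"
		return 0
	}
	echo "RUN: test_example"
}

test_example
```

The check lives inside the subtest itself, which is why it cannot be added to a test script that has already been tagged and released.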
It would be convenient to have a mechanism where the client loads a list of test exclusions from the MDS and OSS at the start of each test script (e.g. in init_test_env()) before the list of tests is generated. That would allow a central location to exclude subtests that should not be run in interop mode, and allow "retroactively" excluding subtests that older clients should not be running.
Of course, this feature cannot be used by already-released client versions (e.g. 2.12.9), but it would at least be possible to include the functionality into current branches (e.g. b2_15, b2_16 (soon), master), so that it is possible to use this functionality for interop testing in future releases.
In terms of the files, a reasonable option would be to have a single file for each test script, like:
    lustre/tests/except/sanity.ex
    lustre/tests/except/sanityn.ex
    lustre/tests/except/sanity-quota.ex
    :
so that each script would look for "$LUSTRE/tests/except/$TESTSUITE.ex" and process that at startup. It would probably be easiest if this was in a format that could be passed directly to always_except, but also be usable for other test runners in the future.
Allowing a comparator like "<" or ">=" to be specified along with the client version is relatively easy to add at this stage, and it should be done now in case of future need (though I can't currently think of a reason to skip a test on a newer client that couldn't have been added into the test itself at that point). This suggests a file format like:
lustre/tests/except/sanity-sec.ex

    #facet op need_version ticket subtests
    client >= 2.15.55.156 LU-15675 27a
    client >= 2.15.55.156 LU-17340 31

lustre/tests/except/sanity.ex

    #facet op need_version ticket subtests
    client >= 2.15.55.114 LU-13706 119d

and this would be processed in init_test_env() something like the following:
    local nodes=$(facets_nodes mds1 ost1)
    local exceptions="$LUSTRE/tests/except/$TESTSUITE.*ex"
    local facet op need_version ticket subtests

    while read facet op need_version ticket subtests; do
        # ignore comment lines such as the "#facet op ..." header
        [[ "$facet" =~ "#" ]] && continue
        # e.g. facet "client" is checked against $CLIENT_VERSION
        local ver=${facet^^}_VERSION

        (( ${!ver} $op $(version_code $need_version) )) || {
            echo "need $facet $op $need_version to run '$subtests' due to $ticket"
            always_except $ticket $subtests
        }
    done < <(do_nodes $nodes "cat $exceptions 2> /dev/null | sort -u")

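The loop above can be tried without a cluster using the sketch below. Everything here is an assumption made for illustration: `version_code` and `always_except` are simplified stand-ins for the framework helpers, `fetch_exceptions` is a hypothetical replacement for the `do_nodes ... cat` call that returns canned lines, `load_exceptions` is a hypothetical wrapper name, and the client version is invented.

```shell
#!/bin/bash
# Simplified stand-in: pack dotted version fields into one integer
# (assumes each field < 256).
version_code() {
	local a b c d
	IFS=. read -r a b c d <<< "$1"
	echo $(( (${a:-0} << 24) | (${b:-0} << 16) | (${c:-0} << 8) | ${d:-0} ))
}

# Stand-in for always_except(): record "ticket:subtest" pairs instead of
# updating the real exclusion list.
EXCLUDED=()
always_except() {
	local ticket=$1 subtest
	shift
	for subtest in "$@"; do
		EXCLUDED+=("$ticket:$subtest")
	done
}

# Hypothetical replacement for fetching the files from the servers.
fetch_exceptions() {
	cat << 'EOF'
#facet op need_version ticket subtests
client >= 2.15.55.156 LU-15675 27a
client >= 2.15.55.114 LU-13706 119d
EOF
}

load_exceptions() {
	local facet op need_version ticket subtests ver

	while read -r facet op need_version ticket subtests; do
		[[ "$facet" == \#* ]] && continue	# skip comment lines
		ver=${facet^^}_VERSION			# client -> CLIENT_VERSION
		# exclude the subtests when the requirement is NOT met
		(( ${!ver} $op $(version_code $need_version) )) ||
			always_except $ticket $subtests
	done < <(fetch_exceptions)
}

# Invented client version that satisfies only the second requirement.
CLIENT_VERSION=$(version_code 2.15.55.120)
load_exceptions
printf '%s\n' "${EXCLUDED[@]}"
```

With these values, only subtest 27a ends up excluded, because 2.15.55.120 satisfies ">= 2.15.55.114" but not ">= 2.15.55.156". The sketch reads from process substitution rather than a pipe so that the loop runs in the current shell and its `always_except` calls persist.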
Should there be a more complex mechanism that also lets the file skip subtests for newer/older MDS/OSS versions, alongside the skip directives the client already applies locally? For that to work, the MDS/OSS would presumably have to report that it must be running a version newer/older than what is specified in the file. That is (likely) a paradox: a server that is not running that version in the first place cannot possibly describe that the test should be skipped. That case needs to be left to the client's own skip directives.
Issue Links
- is related to
  - LU-18311 interop: sanity test_312: FAIL: blksz error, actual 4096, expected: 2 * 1 * 4096 (Open)
  - LU-17331 conf-sanity test_30b: FAIL: check lustre-OST0000.failover.node failed! (Resolved)
- is related to
  - LU-18355 rolling-downgrade-client2 sanity test_56ei: Did not find any entries with expected projid 1234 (Open)
  - LU-13081 Interop master <-> 2.12: sanity test_151 and test_156 fail (Resolved)
  - LU-15695 dt_ladvise(): ASSERTION( dt->do_body_ops->dbo_ladvise ) (Resolved)
  - LU-17265 sanity test_39r: atime on client 1699192823 != ost 0x65479ff6 (Resolved)
  - LU-18341 interop: sanity-flr test_36: FAIL: write finished before layout version is transmitted (Resolved)