[LU-2524] Tests regressions: tests interrelation introduced. Created: 26/Dec/12  Updated: 09/Oct/21  Resolved: 14/Aug/16

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.6.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Alexander Lezhoev Assignee: James Nunez (Inactive)
Resolution: Won't Fix Votes: 2
Labels: patch, tests
Environment:

1node virtual clustrer, Xperior framework.


Issue Links:
Duplicate
is duplicated by LU-2671 test_mkdir sometimes creates dirs nam... Resolved
is duplicated by LU-2992 conf-sanity 58: `/mnt/mds/OBJECTS': N... Resolved
is duplicated by LU-3687 several sanity tests are failed if ru... Resolved
Related
is related to LU-3514 sanity tests 27c, 77c, 77e, 77h are s... Closed
is related to LU-5047 sanity: test_72a removes all files fr... Resolved
is related to LU-4769 Test failure conf-sanity test_38: umo... Resolved
is related to LU-4536 sanity test_65ic Resolved
is related to LU-4564 Failure on test suite replay-single t... Resolved
is related to LU-4343 sanity-hsm test_228 failure: FAIL: ta... Resolved
Severity: 3
Rank (Obsolete): 5942

 Description   

Following tests depend on results of previously executed tests and cannot be executed separately. That's a regression from 2.2.

sanity.2a
sanity.3b
sanity.4b
sanity.5
sanity.6g
sanity.76
sanity.7a
sanity.7b
sanity.8
sanity.9
sanity.10
sanity.11
sanity.12
sanity.13
sanity.15
sanity.27i
sanity.51a
sanity.56a
sanity.101b

sanityn.1b
sanityn.1c

Typical log:
chmod: cannot access `/mnt/lustre/d0.sanity/d5/d2': No such file or directory
Can't lstat /mnt/lustre/d0.sanity/d5/d2: No such file or directory



 Comments   
Comment by Alexander Lezhoev [ 27/Dec/12 ]

sanity test_2a:
test_mkdir $DIR/$tdir failed if $DIR wasn't created before. test_mkdir -p must be used.

sanity test_3b:
if [ ! -d $DIR/$tdir ]; then
mkdir $DIR/$tdir
fi
failed if $DIR wasn't created before. mkdir -p must be used.

test_4b() {
if [ ! -d $DIR/$tdir ]; then
test_mkdir $DIR/$tdir
fi
failed if $DIR wasn't created before. mkdir -p must be used.

test_5() {
test_mkdir $DIR/$tdir
failed if $DIR wasn't created before. mkdir -p must be used.

...

test_56a() { # was test_56
rm -rf $DIR/$tdir
$SETSTRIPE -d $DIR
test_mkdir $DIR/$tdir
failed if $DIR wasn't created before. mkdir -p must be used. This test depends on test_52b

Comment by Andreas Dilger [ 28/Dec/12 ]

As you are able to test this most easily, please submit a patch that resolves these issues for your test framework.

Comment by Kyr Shatskyy (Inactive) [ 08/Jan/13 ]

Hi Andreas,
Could you please help to understand why such complicated logic required for so simple test scenario?

commit 13b269ab77daca78fdfc374986d4cc34c7e66309
Author: wangdi <di.wang@whamcloud.com>
Date: Wed Nov 14 12:02:43 2012 -0800

LU-1187 tests: Add test_mkdir in sanity for DNE.

1. Replace mkdir with test_mkdir, which will create local or
remote directory according to its testnum, i.e. choose the
mdt index by testnum % mdt_count, then create directory
by lfs mkdir.

Comment by Kyr Shatskyy (Inactive) [ 08/Jan/13 ]

The sanityn failings are related to Xyratex MRP-335, that we haven't yet landed to wc main stream.
The test sanity.76, sanity.101b look not related to the issue.

Note regarding all the rest failing tests. The problem is not that $DIR is not created, the actual problem is that
$tdir variable in fact isn't directory name, but piece of path, so because following instruction is used everywhere:

test_mkdir $DIR/$tdir

and even if the $DIR exists the $tdir can't be created, see test-framework.sh line:

export tdir=d0.${TESTSUITE}/d${base}

I think it make a confusion to assign path to "constant" variable with name dir.
In many places in tests scripts it is logically to assume that tdir is a dir name that is why it is easy to make a mistake. In general, when you type a command "test_mkdir $DIR/$tdir" you expect there will be only one operation.

Comment by Andreas Dilger [ 08/Jan/13 ]

One option is to have test_mkdir default to "mkdir -p" internally, which will ensure that the subdirectories are always created, but I don't know if this will have any side-effect for other tests. I expect it will not.

Another simple option is to default DIR=$MOUNT/d0.${TESTSUITE} then tdir=d.${base} and tfile=f.${testnum}. This would move all of the test files into a subdirectory, but it might also have some side-effects for tests that expect to be run in the root directory. The benefit would be that $tdir and $tfile are both "one level" names and "mkdir -p" is not needed in all of the callers, only when $DIR is first created.

Finally, just using test_mkdir -p $DIR/$tdir directly would have minimal complexity, but does not improve the situation for the future.

Comment by Kyr Shatskyy (Inactive) [ 08/Jan/13 ]

Making test_mkdir default to recursive creation of path may disallow for some specific scenario, that may be used further or used somewhere already, for example, we may want to check mkdir -p fails correctly while trying to create directory by not existing path.

Regarding second option, I wonder why the tdir must not be in the root directory? It can be for example:

tdir=d.${TESTSUITE}.${base} 

I still have a question why the test_mkdir function setting stripes in such a pseudo-random way:

test_nume % mdt_count

I'm not sure that this code because it make test scenario opaque. If we want to test striping then we need special purpose tests.

Comment by Andreas Dilger [ 08/Jan/13 ]

I think the original reason for using tdir=d0.${TESTSUITE} is that it puts the tests into a separate subdirectory for each test script (e.g. so that some existing "d0" with 100k files in it did not cause another test to fail during rm -rf $DIR/[Rdfs][0-9] at the start of sanity.sh and sanityn.sh). However, looking at this again, since this is deleting all d[0-9] directories at the start, it would also delete d0.replay-single and such, so I suspect the extra level of indirection is not helpful.

Looking though the git/cvs annotate commit history for test-framework.sh is always useful to find more information about why code is written in a particular way. This returns commit 9975e4daa, b=13979 and before that 067e8dc6, b=13798 has the original change (https://bugzilla.lustre.org/show_bug.cgi?id=13798#c21). It seems that the original goal was to add the test name into the directory name to make it unique between test scripts, and the implementation to put it in a subdirectory was not the important part.

I would like to comment also that tdir/tfile stuff seems needs some modification (not sure that it makes sense to discuss this here, please let me know if I have to file a new bug):

a) Currently $tdir $tfile stuff does not provide a possibility to work each test
on each own directory because sanity test_1 has the same $tdir as replay-single
test_1 and sanityN test_1, etc. If this $tdir stuff was added to t-f with the
goal to provide the unique dir for each test – then run_one should be changed to
export unique dir name contained $TESTSUITE at the name.

Please submit a patch that changes to tdir=d0.${TESTSUITE}.${base} and see whether it can pass testing or not. It could then also remove all of the "-p" arguments from tests that no longer need that, which will probably speed up the tests a small amount as well.

As for test_mkdir(), it was recently added for testing multiple MDTs in a variety of ways more easily for DNE. This is only pseudo-random, so it is definitely repeatable for the same test configuration (i.e. with the same mdt_count), and I don't think it will hurt anything.

Comment by Di Wang [ 08/Jan/13 ]

test_mkdir does not assume there will be single or multiple dirs needs to be created, if there are more than one dirs needs to be created, -p is definitely needed here, i.e. test_mkdir -p needs to be used. It seems some of the test_mkdir(in sanity.sh) misses -p indeed.

"I'm not sure that this code because it make test scenario opaque. If we want to test striping then we need special purpose tests."

If you want to test special striping, you probably need add special test for it, i.e. using lfs mkdir -i directly, instead of using test_mkdir.

Comment by Kyr Shatskyy (Inactive) [ 14/Jan/13 ]

Posted patch for review:
http://review.whamcloud.com/5022

Comment by Kyr Shatskyy (Inactive) [ 23/Jan/13 ]

Andreas,
after this fix we have following run_one function:

run_one() {
    local testnum=$1
    local message=$2
    tfile=f0.${TESTSUITE}.${testnum}
    export tdir=d0.${TESTSUITE}.${base}
    export TESTNAME=test_$testnum
    local SAVE_UMASK=`umask`
    umask 0022 

    banner "test $testnum: $message"
    test_${testnum} || error "test_$testnum failed with $?"
    cd $SAVE_PWD
    reset_fail_loc
    check_grant ${testnum} || error "check_grant $testnum failed with $?"
    check_catastrophe || error "LBUG/LASSERT detected"
    if [ "$PARALLEL" != "yes" ]; then 
        ps auxww | grep -v grep | grep -q multiop &&
            error "multiop still running"
    fi   
    unset TESTNAME
    unset tdir
    umask $SAVE_UMASK
    return 0
}

What is interesting for me is:

  1. Why tfile is not exported as tdir and TESTNAME are.
  2. Why tfile and tdir are lowercase if they are kind of global variables.
  3. The tdir has following value:
        export tdir=d0.${TESTSUITE}.${base}
    

    Where $base is numerical id "stripped of" the letters. In this case, for example, two tests 31a and 31b with the same base 31 will work within the same working directory and will mess their working stuff.

  4. Both tfile and tdir have "magic" prefixes: f0 and d0 correspondingly. Why not just assign values to them as follows:
    tfile=f${testnum}.${TESTSUITE}
    tdir=d${testnum}.${TESTSUITE}
    

    In this case the code

    rm -rf $DIR/[Rdfs][0-9]*

    should be still working and each test case will have its unique testing directory. Notice, this will cause many failures probably, because we didn't refactor all tests that are related between each other.

Comment by Andreas Dilger [ 24/Jan/13 ]

Kyr,

  • I'm not sure why $tfile is not exported. Perhaps because it was only thought to be relevant to the test being run under run_one()? In any case, I'm also not fond of the inconsistency here.
  • there is not much consistency for upper and lower case for environment variables in the tests. I'd prefer to just keep them lower case, since it would just case a LOT of churn in the tests for very little value if they were changed to uppercase.
  • I think the $tdir being the same subdir was done on purpose so that different subtests would operate on the same files, and only the first one would be set up independently. I think we have moved away from this to some extent, but some tests definitely depend on this behaviour. Other tests (e.g. "lfs find") use it to avoid having to recreate the test files for each subtest. In any case, it cannot be changed until after all such tests are fixed.
  • I'm happy if you want to remove the magic prefix and instead go to only having the $TESTNAME suffix. I don't think this will cause many failures, definitely fewer than changing $tdir to be unique per subtest.

Since each of these changes is potentially large, please submit them as separate patches as needed. I think the #3 change could be made first as part of your current patch, then the "-p" removal, then at the end the $tdir separation, and ideally a separate patch fixing any remaining tests that hard-code the dir/file names instead of using $tdir and $tfile.

Comment by James Nunez (Inactive) [ 31/Jul/13 ]

Similar effort to combine tests that depend on each other: LU-3514

Comment by James Nunez (Inactive) [ 01/Aug/13 ]

Although many of the tests and test numbers have changed, there are still interdependent many tests in out test scripts. In sanity, the following tests depend on results of previously executed tests and will fail when executed separately: 6g, 7a, 7b, 8, 9, 10, 11, 12, 13, 15, 16, 24a, 24b, 24c, 24d, 56a. In many tests, the results of test_mkdir are not checked and some of these tests would fail on the mkdir instead of on a later call if a check on mkdir was in place. For example sanity tests 14, 50, 72b.

I’d like to move this patch/work ahead by submitting the following series of patches that, I think, accomplish what is done in the current patch and in the comments posted here.

Patch 1 Change $tdir to a single directory with the proposed change to the name format d0.${TESTSUITE}/d${base} to d${testnum}.${TESTSUITE}. This makes the “-p” option for (test_)mkdir unnecessary in most cases. Thus, we can remove the “-p” option where possible and add the “-p” option where it is more efficient as in tests 24r, s, t, 31i and 31j. Also, check return codes of test_mkdir and have an useful error message for all calls to error().

Patch 2: Where possible, change test file names and directories to $tfile and $tdir, respectively. Modify tfile with the proposed change to name format ; f.${TESTSUITE}.${testnum} to f${testnum}.${TESTSUITE} . Also, export the $tfile variable.

Patch 3: Combine tests that are directly dependent on each other. For example sanity tests 26d and 26e, 25a and 25b, 34a and 34c, 34a and 34c, 34a and 34c, sanityn 1b and 1c.

I am open to splitting or combining the above proposed patches or modifications to the work done in them if someone feels strongly about it.

Comment by Andreas Dilger [ 01/Aug/13 ]

James, I'm all in favour of this. It needs someone to move the existing patch forward.

Comment by James Nunez (Inactive) [ 07/Aug/13 ]

Patch #1 posted for review http://review.whamcloud.com/#/c/5022

Comment by James Nunez (Inactive) [ 31/Oct/13 ]

Since the patch above was large and touched so many test scripts, I've broken it up in the hopes that autotest will pass and the patch can be landed. There will be a small series of patches with each one modifying a small set of test scripts instead of trying to modify all the test scripts at once. The first patch makes the proposed modifications to $tdir and $tfile and makes the proposed modifications to conf-sanity; remove the "-p" on mkdir when possible, add error messages to calls to error and changes some calls to return with calls to error. Once this patch lands, all the other test scripts can be modified with these changes (improvements?):

http://review.whamcloud.com/#/c/8123/

Comment by James Nunez (Inactive) [ 13/Jan/14 ]

The modification to $tdir and $tfile in patch http://review.whamcloud.com/#/c/8123/ have landed to master.

I'm leaving this ticket open since there is more work to be done that is described here.

Comment by Oleg Drokin [ 25/Jan/14 ]

The patch 8123 introduced a bunch of regressions for tests that depend on another test (the so called subseris - with letters.)

The poster example is sanity test_51b and test_51ba
In the past tdir would be

export tdir=d0.${TESTSUITE}/d${base}

with the new patch it is

export tdir=d${testnum}.${TESTSUITE}

The test_51b crates a huge bunch of dirs and then test_51ba used to delete all of them, but not anymore as they live in separate dirs now.
There are more tests like this.
The end result is (small test) MDS fills up real fast early on and then tests start to fail for no good reason.

This needs to be addressed ASAP!

Comment by Andreas Dilger [ 25/Jan/14 ]

It might be enough to change tdir to use "d$base" instead of "d$testnum". That would get it back to the original config where the subtests shared the same dirs. I don't know if that is good or bad, probably a but if both.

Comment by James Nunez (Inactive) [ 27/Jan/14 ]

Since sanity and other tests have several subtests that are dependent on each other, these subtests cannot be run separately. If they cannot be run alone, then it makes sense to combine the subtests. One of the past versions of this patch combined dependent subtests. So, instead of changing the tdir variable, shouldn't we either combine dependent tests or make them not depend on each other?

EDIT: If combining tests is not desirable, then any dependent subtests need to check that the previous test was run.

Comment by Oleg Drokin [ 27/Jan/14 ]

I guess at least in the case of test_51b and test_51ba it was a pretty bad idea to defer a huge cleanup of current test stuff on to the next test.
As such it certainly makes sense to recombine tests into bigger more self-contained ones.
We probably should not do it to all of them, some discretion needs to be excercised to see how separate do the tests actually look like.

Comment by Andreas Dilger [ 27/Jan/14 ]

I pushed http://review.whamcloud.com/9021 to fix the sanity.sh test_51b/51ba problem.

Comment by Roman (Inactive) [ 28/Jan/14 ]

What is reason to keep dependencies in tests while the jira tickets is about avoiding dependencies? I think good solution should be merging deeply depended tests to one.

Comment by James Nunez (Inactive) [ 05/Feb/14 ]

I've refreshed the patch at http://review.whamcloud.com/#/c/5022.

This patch cleans up code and checks/corrects for sub test dependencies for:
conf-sanity.sh
dne-sanity.sh
functions.sh
sanity-quota.sh
sanity-scrub.sh

I'll do the same code clean up for other tests in the next patch.

Comment by Nathaniel Clark [ 14/Mar/14 ]

enable conf-sanity/58 on ZFS
http://review.whamcloud.com/9675

Comment by James Nunez (Inactive) [ 30/May/14 ]

I've broken up previous patches to modify just one test suite per patch. The patch cleaning up conf-sanity is at http://review.whamcloud.com/10530

More patches will follow.

Comment by Gerrit Updater [ 21/Nov/14 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/10530/
Subject: LU-2524 test: Code clean up for conf-sanity
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 648c73b50abdeeb8c6d646a7b743d748c84451d6

Comment by Gerrit Updater [ 22/Dec/14 ]

James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/13170
Subject: LU-2524 test: Test suite clean up for replay-single
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f78ba34a6334cb5997cbdaf9b005882507aab4ea

Comment by Gerrit Updater [ 27/Apr/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13170/
Subject: LU-2524 test: Test suite clean up for replay-single
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0444a40d9838b868092c78d3bdd4c7c3a00199e6

Comment by Gerrit Updater [ 05/May/15 ]

James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/14680
Subject: LU-2524 test: Clean up sanity-quota
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: dfc1b0f4ceebd580b74c1701725d778185461f4c

Comment by Gerrit Updater [ 05/May/15 ]

James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/14685
Subject: LU-2524 test: Clean up replay-ost-single test suite
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ead20ce593e2b6b1e85a0898ed05013430189f41

Comment by Gerrit Updater [ 20/Aug/15 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/14685/
Subject: LU-2524 test: Clean up replay-ost-single test suite
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1d40519afde98abf22bf0e4998effc656f1cb1f0

Comment by Gerrit Updater [ 10/Dec/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14680/
Subject: LU-2524 test: Clean up sanity-quota
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cc623e9ff1a0b6e3ffcdfac086a7630f93e70b93

Comment by James A Simmons [ 14/Aug/16 ]

Old blocker for unsupported version

Generated at Sat Feb 10 01:25:56 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.