[LU-1265] should not remove all files when cleanup lustre test Created: 28/Mar/12  Updated: 27/Feb/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Minh Diep
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 10905

 Description   

We should not remove all files under the mount point. This prevents multiple tests from being run at the same time.

check_and_cleanup_lustre() {
    if [ "$LFSCK_ALWAYS" = "yes" -a "$TESTSUITE" != "lfsck" ]; then
        get_svr_devs
        generate_db
        run_lfsck
    fi

    if is_mounted $MOUNT; then
        [ -n "$DIR" ] && rm -rf $DIR/[Rdfs][0-9]* ||
            error "remove sub-test dirs failed"
        [ "$ENABLE_QUOTA" ] && restore_quota_type || true
    fi

    if [ "$I_UMOUNTED2" = "yes" ]; then
        restore_mount $MOUNT2 || error "restore $MOUNT2 failed"
    fi

    if [ "$I_MOUNTED2" = "yes" ]; then
        cleanup_mount $MOUNT2
    fi

    if [ "$I_MOUNTED" = "yes" ]; then
        cleanupall -f || error "cleanup failed"
        unset I_MOUNTED
    fi
}


 Comments   
Comment by Andreas Dilger [ 27/Feb/20 ]

Minh, I just ran across this ticket by accident. It isn't clear what your objection is here. The code now looks like:

check_and_cleanup_lustre() {
        if is_mounted $MOUNT; then
                if $DO_CLEANUP; then
                        [ -n "$DIR" ] && rm -rf $DIR/[Rdfs][0-9]* ||
                                error "remove sub-test dirs failed"
                else
                        echo "skip cleanup"
                fi

so there is a "$DO_CLEANUP" check, but more importantly (even in the older code) it is only deleting files under "$DIR", which is the per-test-script directory for all the files. While there are some tests that modify "$MOUNT" directly, most tests only operate inside "$DIR". That wouldn't help for running subtests in parallel, but it would (in theory) allow multiple different test scripts to be run in parallel on a single filesystem (e.g. sanity.sh and sanityn.sh), though there are many other obstacles to that in practice, so I don't think that is a problem we can realistically solve.
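To make the scope of that rm concrete, here is a minimal sketch that applies the same glob in a scratch directory. It does not touch a real Lustre mount: the temporary directory stands in for "$MOUNT", and the directory name d0.sanity, the file names, and the function demo_cleanup_scope are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch only: a scratch directory stands in for $MOUNT, and
# "d0.sanity" is a hypothetical per-script $DIR.
demo_cleanup_scope() {
    mount=$(mktemp -d) || return 1
    dir="$mount/d0.sanity"

    # Entries inside $DIR: some match [Rdfs][0-9]*, some do not.
    mkdir -p "$dir/d1" "$dir/keep"
    touch "$dir/f2" "$dir/R3" "$dir/s4" "$dir/notes"

    # A sibling directory under the mount that belongs to another script.
    mkdir -p "$mount/d9.other"

    # Same pattern as the cleanup code, with the variable quoted.
    rm -rf "$dir"/[Rdfs][0-9]*

    # List what survived, relative to the scratch mount.
    (cd "$mount" && find . -mindepth 1 | LC_ALL=C sort)

    rm -rf "$mount"
}

demo_cleanup_scope
```

In this sketch only d1, f2, R3 and s4 inside the per-script directory are removed. The entries keep and notes survive because they do not match the pattern, and d9.other survives because it sits outside "$DIR".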

For running subtests in parallel, which I think is a potentially realistic goal for sanity.sh and maybe sanityn.sh, the correct behaviour is for the subtest to check "[ $PARALLEL == yes ] && skip 'skip parallel run'" so that their changes don't mess up other running tests. I'm not at all against that, if it would speed up testing (e.g. avoid a lot of waiting).
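As a rough illustration of that guard, the sketch below uses a stand-in skip() and a hypothetical subtest named test_global_state; neither is the real test-framework code. The stand-in only prints a message, so the subtest returns explicitly after calling it, and the comparison is written with a quoted variable and "=" so that it also works when PARALLEL is unset.

```shell
#!/bin/sh
# Stand-in for the framework's skip(); assumed here to only report
# the reason, so the caller has to return on its own.
skip() {
    echo "SKIP: $*"
}

# Hypothetical subtest that would modify global state.
test_global_state() {
    if [ "$PARALLEL" = "yes" ]; then
        skip "skip parallel run"
        return 0
    fi
    # The real work (e.g. changing the default layout) would go here.
    echo "changing global state"
}
```

Calling test_global_state with PARALLEL=yes prints the skip message and leaves global state alone; with any other value, or with PARALLEL unset, the body runs.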

I can definitely imagine that there are many tests that do not check "$PARALLEL" properly before changing global state (stopping servers, changing the default filesystem layout, checking whether a specific amount of space is consumed, etc.), but since we are not running tests in parallel this is not noticed/doesn't matter.
