[LU-14849] sanity test 30d sporadically fails (Lustre 2.14) Created: 13/Jul/21  Updated: 14/Jul/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Xiaolin Zang Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: tests, zfs
Environment:

2 MDS, 2 OSS, and 2 client nodes, with four 1T disks per OSS. The test apparatus (test-framework.sh) configures each disk as a ZFS pool containing an OST dataset, created with mkfs.lustre and --device-size=400000 (set in local.sh).
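For context, the per-disk setup is roughly equivalent to the following sketch. The pool/dataset names, fsname, index, and MGS NID are illustrative placeholders, not the exact commands test-framework.sh runs:

```shell
# Hypothetical sketch of one OST disk's configuration (names are assumptions):
# --device-size is in KB, so 400000 gives a ~400M OST dataset.
mkfs.lustre --ost --backfstype=zfs --fsname=lustre \
        --index=0 --mgsnode=mds1@tcp \
        --device-size=400000 lustre-ost0/ost0 /dev/sdb
mount -t lustre lustre-ost0/ost0 /mnt/lustre-ost0
```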


Epic/Theme: test
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity test 30d (see code below) is a new test in Lustre 2.14.  In a loop of 10 iterations it runs dd to write a 128M file, clears the MDT LDLM lock LRU while dd is running, then deletes the file.  It fails occasionally with "no space left on device".  We suspect the failure is related to a large file (128M) being repeatedly created and deleted in a small dataset (400M), causing a space-recycling problem on the dataset.  For now we plan to increase the size of the OST dataset in each pool to 1G to mitigate the problem.
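If the space-recycling hypothesis is right, deleted-but-not-yet-reclaimed blocks should be visible while the test loops. One way to watch for that is the pool's "freeing" property, which reports space still queued for asynchronous free (the pool and dataset names below are illustrative, not taken from the test config):

```shell
# Watch pending async frees and remaining space while sanity test 30d runs.
# "lustre-ost0" / "lustre-ost0/ost0" are placeholder names for this setup.
watch -n 1 'zpool get -H freeing lustre-ost0; zfs get -H available lustre-ost0/ost0'
```

A consistently nonzero "freeing" value just before an ENOSPC failure would support the recycling theory.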

We have the following questions:

1. How should the OST dataset be sized for Lustre tests?  We could give it a much larger size, but are concerned that would cause problems for other Lustre tests.  For example, some tests are expected to fail with a "no space left" error; with a very large dataset it may take a long time to reach that failure state.

2. Although the dd output file repeatedly hits a relatively small dataset, each individual file is still well within the space limit, with a wide margin.  Is the out-of-space error then reasonable behavior?
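A back-of-the-envelope calculation suggests how that margin can disappear. The numbers below are assumptions for illustration: ZFS reserves roughly 1/32 of a pool as slop space, and since deleted blocks are reclaimed asynchronously, several deleted 128M copies could still be charged against the 400M dataset at once:

```shell
# All figures are illustrative assumptions, not measurements from this setup.
dataset_kb=400000                            # --device-size from local.sh
avail_kb=$((dataset_kb - dataset_kb / 32))   # minus ~1/32 ZFS slop reservation
file_kb=$((128 * 1024))                      # one 128M dd output file
pending=3                                    # hypothetical deleted copies not yet freed
used_kb=$((pending * file_kb))
echo "available=${avail_kb}K transient-use=${used_kb}K"
if [ "$used_kb" -gt "$avail_kb" ]; then
        echo "transient ENOSPC is plausible"
fi
```

Under these assumed numbers, three unreclaimed copies (393216K) already exceed the usable space (387500K), so a transient ENOSPC would not require the file itself to be near the limit.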

test_30d() {
        # copy the dd binary onto the Lustre mount so it executes from Lustre
        cp $(which dd) $DIR || error "failed to copy dd to $DIR/dd"
        for i in {1..10}; do
                # write a 128M file in the background with the copied dd
                $DIR/dd bs=1M count=128 if=/dev/zero of=$DIR/$tfile &
                local PID=$!
                sleep 1
                # drop cached MDT locks while dd is still running
                $LCTL set_param ldlm.namespaces.*MDT*.lru_size=clear
                wait $PID || error "executing dd from Lustre failed"
                rm -f $DIR/$tfile
        done
        rm -f $DIR/dd
}
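The planned mitigation could be applied through the test framework's OST size knob in local.sh, which feeds --device-size. This is a sketch assuming the standard OSTSIZE variable is what controls the dataset size in this setup:

```shell
# local.sh fragment: grow each OST dataset from ~400M to 1G.
# OSTSIZE is in KB and is passed to mkfs.lustre as --device-size.
OSTSIZE=1000000
```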


Generated at Sat Feb 10 03:13:19 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.