[LU-7658] sanity-krb5 test_90 fails with 'dbench exit with error' Created: 13/Jan/16  Updated: 13/Oct/21  Resolved: 13/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Eagle cluster with Lustre 2.7.64


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When running the sanity-krb5 test suite on a Lustre system with the MGS and MDS on separate nodes, test 90 fails with

'dbench exit with error'

We set SLOW=yes for this set of tests. For sanity-krb5 test 90, that means we call 'lfs flushctx' on the mount point 6 times as often as when we are not running with the SLOW flag (60 iterations instead of 10). From test 90:

        if [ "$SLOW" = "no" ]; then
                total=10
        else
                total=60
        fi

        restore_to_default_flavor
        set_rule $FSNAME any any krb5p
        wait_flavor all2all krb5p

        start_dbench

        for ((n=0;n<$total;n++)); do
                sleep 2
                check_dbench
                echo "flush ctx ($n/$total) ..."
                $LFS flushctx $MOUNT || error "can't flush context on $MOUNT"
        done
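
For a rough sense of what SLOW=yes changes, the loop's minimum wall time follows from the iteration count and the 2-second sleep. The following is a minimal sketch only: flush_loop_min_seconds is a hypothetical helper, not part of sanity-krb5, and it ignores the time spent in check_dbench and 'lfs flushctx' themselves.

```shell
#!/bin/bash
# Hypothetical helper, not part of sanity-krb5: lower bound on the time
# test 90 spends in its flush loop, counting only the "sleep 2" calls.
flush_loop_min_seconds() {
    local slow=$1
    local total
    if [ "$slow" = "no" ]; then
        total=10
    else
        total=60
    fi
    echo $((total * 2))
}

flush_loop_min_seconds no    # prints 20
flush_loop_min_seconds yes   # prints 120
```

Under this estimate, dbench has to stay alive for at least about 120 seconds with SLOW=yes, compared with about 20 seconds otherwise, for every check_dbench call in the loop to pass.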

From the test logs, dbench had already finished before the loop completed:

...
flush ctx (13/60) ...
flush ctx (14/60) ...
flush ctx (15/60) ...
flush ctx (16/60) ...
flush ctx (17/60) ...
flush ctx (18/60) ...
dbench 19216 already finished
 sanity-krb5 test_90: @@@@@@ FAIL: dbench  exit with error
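
The 'already finished' message suggests that the check found the dbench process gone partway through the loop. The test's own check_dbench helper is not quoted in this ticket, so the following is only a sketch of that kind of liveness check, using 'kill -0' on a saved PID. The name check_bg_alive and the 'sleep 1' stand-in workload are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not the sanity-krb5 check_dbench): succeed if the
# process with the given PID still exists, otherwise report and fail.
check_bg_alive() {
    local pid=$1
    if kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    echo "process $pid already finished"
    return 1
}

# Stand-in workload that exits on its own, as dbench did in the failing run.
sleep 1 &
bg_pid=$!
check_bg_alive "$bg_pid" && echo "still running"
wait "$bg_pid"
check_bg_alive "$bg_pid" || echo "gone before the loop ended"
```

A check like this cannot tell a background job that completed normally from one that crashed, so a workload that simply runs out of work before the last iteration would be reported the same way as a real failure.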

This test fails infrequently.

Logs are at:
https://testing.hpdd.intel.com/test_sets/a246e266-b987-11e5-a748-5254006e85c2

