[LU-9557] sanity-quota test 0 fails with 'SLOW IO for quota_usr (user): XX KB/sec' Created: 24/May/17 Updated: 26/May/17 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
sanity-quota test_0 fails with 'SLOW IO for quota_usr (user): 426 KB/sec' The last thing seen in the test_log from test 0 is as expected running as uid/gid/euid/egid 60000/60000/60000/60000, groups: [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d0.sanity-quota/f0.sanity-quota-0] [count=10] [conv=fsync] 10+0 records in 10+0 records out 10485760 bytes (10 MB) copied, 0.427085 s, 24.6 MB/s sanity-quota test_0: @@@@@@ FAIL: SLOW IO for quota_usr (user): 426 KB/sec The subroutine that is failing is test_quota_performance() in sanity-quota.sh: 423 test_quota_performance() {
424 local TESTFILE="$DIR/$tdir/$tfile-0"
425 local size=$1 # in MB
426 local stime=$(date +%s)
427 $RUNAS $DD of=$TESTFILE count=$size conv=fsync ||
428 quota_error u $TSTUSR "write failure"
429 local etime=$(date +%s)
430 delta=$((etime - stime))
431 if [ $delta -gt 0 ]; then
432 rate=$((size * 1024 / delta))
433 if [ $(facet_fstype $SINGLEMDS) = "zfs" ]; then
434 # LU-2872 - see LU-2887 for fix
435 [ $rate -gt 64 ] ||
436 error "SLOW IO for $TSTUSR (user): $rate KB/sec"
437 else
438 [ $rate -gt 1024 ] ||
439 error "SLOW IO for $TSTUSR (user): $rate KB/sec"
440 fi
441 fi
442 rm -f $TESTFILE
443 }
This test started failing with this error message starting May 12, 2017 and is only failing, so far, in review-dne-part-2. The only patch that modified sanity-quota and landed close to the time this test started to fail was reviewed at https://review.whamcloud.com/26411 and has change-ID I4f22c1e9. There is nothing interesting in the console logs on any node. Here are all the failures so far: |
| Comments |
| Comment by Andreas Dilger [ 26/May/17 ] |
|
It's strange that this test is complaining about running too slow, when "dd" is reporting a much higher speed: 10485760 bytes (10 MB) copied, 0.427085 s, 24.6 MB/s sanity-quota test_0: @@@@@@ FAIL: SLOW IO for quota_usr (user): 426 KB/sec "dd" reports 24.6MB/s, the script calculates 426 KB/s? It would be useful to improve the error message to include stime, etime, and delta. As a side note, it is better to use the bash built-in "$SECONDS" variable to get stime and then calculate delta=$((SECONDS - stime)) directly to avoid two extra forks to call date. |