Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.14.0
-
None
-
ZFS
-
3
Description
sanity-quota test_61 fails with 'write failed, expect succeed'. This test first failed with this error message on 05 May 2020 at https://testing.whamcloud.com/test_sets/f918d217-9487-479e-8294-16936426fe25 for Lustre 2.13.53.162. The failures are mostly seen on ZFS: 200 of the 218 total failures for this test with this error are ZFS failures. So far, we are only seeing this failure on master (future 2.14.0).
Looking at a recent failure with no other sanity-quota test failures, at https://testing.whamcloud.com/test_sets/8ced1c69-2747-41c9-a580-8a3c5fcdf857, we see that sanity-quota test 61 fails with ‘Disk quota exceeded’ after increasing the default quota:
set to use default quota
set default quota
get default quota
Disk default usr quota:
     Filesystem   bquota   blimit   bgrace   iquota   ilimit   igrace
    /mnt/lustre    20480    20480        0        0        0       10
Test not out of quota
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0] [count=10] [oflag=sync]
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 1.71261 s, 6.1 MB/s
Test out of quota
CMD: trevis-201vm4 lctl set_param -n os[cd]*.*MDT*.force_sync=1
CMD: trevis-201vm3 lctl set_param -n osd*.*OS*.force_sync=1
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0] [count=40] [oflag=sync]
dd: error writing '/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0': Disk quota exceeded
20+0 records in
19+0 records out
19922944 bytes (20 MB, 19 MiB) copied, 3.91893 s, 5.1 MB/s
Increase default quota
CMD: trevis-201vm4 lctl set_param -n os[cd]*.*MDT*.force_sync=1
CMD: trevis-201vm3 lctl set_param -n osd*.*OS*.force_sync=1
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0] [count=40] [oflag=sync]
dd: error writing '/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0': Disk quota exceeded
1+0 records in
0+0 records out
0 bytes copied, 0.00246058 s, 0.0 kB/s
CMD: trevis-201vm4 /usr/sbin/lctl get_param -n version 2>/dev/null
CMD: trevis-201vm4 zpool get all
 sanity-quota test_61: @@@@@@ FAIL: write failed, expect succeed
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
  = /usr/lib64/lustre/tests/sanity-quota.sh:159:quota_error()
  = /usr/lib64/lustre/tests/sanity-quota.sh:3996:test_default_quota()
  = /usr/lib64/lustre/tests/sanity-quota.sh:4051:test_61()
Looking at the test, we increase the default quota, cancel the OSC and MDC locks, and sync data. We then run dd again and get the quota exceeded error, which is reported from line 3996 of sanity-quota.sh:
3986 	log "Increase default quota"
3987 	# increase default quota
3988 	$LFS setquota $qdtype $qs $((LIMIT*3)) $qh $((LIMIT*3)) $DIR ||
3989 		error "set default quota failed"
3990 
3991 	cancel_lru_locks osc
3992 	cancel_lru_locks mdc
3993 	sync; sync_all_data || true
3994 	if [ $qpool == "data" ]; then
3995 		$RUNAS $DD of=$TESTFILE count=$((LIMIT*2 >> 10)) oflag=sync ||
3996 			quota_error $qtype $qid "write failed, expect succeed"
3997 	else
3998 		$RUNAS createmany -m $TESTFILE $((LIMIT*2)) ||
3999 			quota_error $qtype $qid "create failed, expect succeed"
4000 
4001 		unlinkmany $TESTFILE $((LIMIT*2))
4002 	fi
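The arithmetic behind "expect succeed" can be checked with a short sketch. It assumes LIMIT is 20480 KB, which is inferred from the blimit value in the log above rather than read from the script, and it assumes $DD writes 1 MiB blocks, as the bs=1M argument in the log suggests. The variable names other than LIMIT are made up for illustration.

```shell
#!/bin/bash
# Assumption: LIMIT is in KB and equals the blimit shown in the log (20480).
LIMIT=20480

# New default block limit handed to "lfs setquota" (LIMIT*3 in the test).
new_limit_kb=$((LIMIT * 3))

# dd count used by the test; with bs=1M this is a size in MiB.
dd_count_mib=$((LIMIT * 2 >> 10))

# Same write size converted back to KB for comparison with the limit.
dd_size_kb=$((dd_count_mib * 1024))

echo "new default limit: ${new_limit_kb} KB"
echo "dd write size:     ${dd_count_mib} MiB (${dd_size_kb} KB)"

if [ "$dd_size_kb" -lt "$new_limit_kb" ]; then
	echo "write is below the raised limit; 'Disk quota exceeded' is unexpected"
fi
```

Under these assumptions the write is 40 MiB against a 60 MiB limit, which matches count=40 in the log and is why the test treats a 'Disk quota exceeded' error at this step as a failure.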
Logs for other failures are at:
https://testing.whamcloud.com/test_sets/470f88c5-e555-4352-bd2f-ddb2f281e7b6
https://testing.whamcloud.com/test_sets/a973395d-3213-4b34-ae57-45155f98ee26