[LU-11390] sanity-quota test_61: test timeout Created: 18/Sep/18 Updated: 30/Nov/18 Resolved: 13/Nov/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4b207cce-b97d-11e8-9df3-52540065bddc

test_61 failed with the following error:

Timeout occurred after 1369 mins, last suite running was sanity-quota, restarting cluster to continue tests

cannot find error msg

test log:

set to use default quota
set default quota
get default quota
Disk default grp quota:
Filesystem bquota blimit bgrace iquota ilimit igrace
/mnt/lustre 20480 20480 0 0 0 604800
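
The table above shows a 20480 KB (20 MB) default group block limit, which matches the later behavior in the log: a 10 MB write succeeds and a 40 MB write hits "Disk quota exceeded". As a rough sketch, a default group quota on a 2.12+ client could be set and queried along the following lines; the exact option spelling for default quotas has varied between Lustre releases, so treat these flags as assumptions and check `lfs help setquota` on your version:

```
# Sketch only: requires a mounted Lustre client at /mnt/lustre.
# Set a 20 MB default block hard limit for all groups (2.12+ default-quota feature).
lfs setquota -G -B 20M /mnt/lustre

# Display the default group quota; output resembles the table in this log.
lfs quota -G /mnt/lustre
```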
Test not out of quota
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
[dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0] [count=10] [oflag=sync]
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 343.958 s, 30.5 kB/s
Test out of quota
CMD: trevis-38vm4 lctl set_param -n os[cd]*.*MDT*.force_sync=1
CMD: trevis-38vm3 lctl set_param -n osd*.*OS*.force_sync=1
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
[dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0] [count=40] [oflag=sync]
dd: error writing '/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0': Disk quota exceeded
19+0 records in
18+0 records out
18874368 bytes (19 MB) copied, 636.705 s, 29.6 kB/s
Increase default quota
CMD: trevis-38vm4 lctl set_param -n os[cd]*.*MDT*.force_sync=1
CMD: trevis-38vm3 lctl set_param -n osd*.*OS*.force_sync=1
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
[dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0] [count=40] [oflag=sync]
40+0 records in
40+0 records out
41943040 bytes (42 MB) copied, 124.211 s, 338 kB/s
Set quota to override default quota
CMD: trevis-38vm4 lctl set_param -n os[cd]*.*MDT*.force_sync=1
CMD: trevis-38vm3 lctl set_param -n osd*.*OS*.force_sync=1
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
[dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0] [count=40] [oflag=sync]
|
| Comments |
| Comment by Peter Jones [ 19/Sep/18 ] |
|
Hongchao, can you please investigate? Thanks, Peter |
| Comment by Hongchao Zhang [ 26/Sep/18 ] |
|
This issue is caused by a wrong over-quota flag being sent back to the OSC, which causes the I/O to be synchronized for each page:

static int qsd_op_begin0(const struct lu_env *env, struct qsd_qtype_info *qqi,
                         struct lquota_id_info *qid, long long space,
                         int *flags)
{
        ...
        if (flags != NULL) {
out_flags:
                LASSERT(qid->lqi_is_blk);
                if (rc != 0) {
                        *flags |= lquota_over_fl(qqi->qqi_qtype);
                } else {
                        __u64 usage;

                        lqe_read_lock(lqe);
                        usage  = lqe->lqe_usage;
                        usage += lqe->lqe_pending_write;
                        usage += lqe->lqe_waiting_write;
                        usage += qqi->qqi_qsd->qsd_sync_threshold;

                        qtype_flag = lquota_over_fl(qqi->qqi_qtype);
                        /* if we should notify client to start sync write */
                        if (usage >= lqe->lqe_granted - lqe->lqe_pending_rel)
                                *flags |= qtype_flag;
                        else
                                *flags &= ~qtype_flag;
                        lqe_read_unlock(lqe);
                }
        }
        ...

After this, each subsequent write at the OSC is synchronized page by page, which in turn keeps the over-quota flag set. |
| Comment by Gerrit Updater [ 26/Sep/18 ] |
|
Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33238 |
| Comment by Gerrit Updater [ 13/Nov/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33238/ |
| Comment by Peter Jones [ 13/Nov/18 ] |
|
Landed for 2.12 |