Details
-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
Lustre 2.4.3, Lustre 2.5.3
-
None
-
We are running Lustre 2.5.3 on all our servers, with ZFS 0.6.3 on the OSSs and ldiskfs/ext4 on the MDS. All 18 servers are running CentOS 6.5.
The client nodes are running Lustre 2.4.3 on CentOS 6.6.
Description
We have a quota problem on one of our OSTs.
Here are the error logs:
LustreError: 11-0: lustre1-MDT0000-lwp-OST0008: Communicating with 10.225.8.3@o2ib, operation ldlm_enqueue failed with -3.
LustreError: 12476:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -3, flags:0x9 qsd:lustre1-OST0008 qtype:grp id:10011 enforced:1 granted:1276244380 pending:0 waiting:128 req:1 usage:1276244415 qunit:0 qtune:0 edquot:0
LustreError: 12476:0:(qsd_handler.c:767:qsd_op_begin0()) $$$ ID isn't enforced on master, it probably due to a legeal race, if this message is showing up constantly, there could be some inconsistence between master & slave, and quota reintegration needs be re-triggered. qsd:lustre1-OST0008 qtype:grp id:10011 enforced:1 granted:1276244380 pending:0 waiting:0 req:0 usage:1276244415 qunit:0 qtune:0 edquot:0
The errors occur only on this OST, and only for that group ID.
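For anyone comparing these messages across OSTs, the key:value fields at the end of the DQACQ line can be pulled out with a short script. This is only a sketch: the function name `parse_qsd_fields` and the field list are assumptions taken from the log line quoted above, not from any Lustre tool, and the script makes no claim about the unit of `granted` or `usage`.

```python
import errno
import re

# DQACQ line copied from the log above (joined into one string for the example).
LINE = (
    "LustreError: 12476:0:(qsd_handler.c:340:qsd_req_completion()) $$$ "
    "DQACQ failed with -3, flags:0x9 qsd:lustre1-OST0008 qtype:grp id:10011 "
    "enforced:1 granted:1276244380 pending:0 waiting:128 req:1 "
    "usage:1276244415 qunit:0 qtune:0 edquot:0"
)

# Field names as they appear in the quoted message; this list is an assumption
# based on that one line, not an exhaustive description of the log format.
FIELD_RE = re.compile(
    r"\b(qsd|qtype|id|enforced|granted|pending|waiting|req|usage|qunit|qtune|edquot):(\S+)"
)


def parse_qsd_fields(line):
    """Return the key:value fields of a qsd log line as a dict (hypothetical helper)."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        # Keep purely numeric values as integers, everything else as text.
        fields[key] = int(value) if value.isdigit() else value
    return fields


if __name__ == "__main__":
    fields = parse_qsd_fields(LINE)
    # How far usage exceeds what the slave was granted, in the log's own unit.
    print("id", fields["id"], "usage - granted =", fields["usage"] - fields["granted"])
    # The log reports -3; look up the symbolic name of errno 3 on this platform.
    print("errno 3 is", errno.errorcode.get(3))
```

Run against the line above, the script shows that usage for group 10011 is slightly higher than the granted amount on this OST, which matches the slave asking the master for more quota and the request failing with -3.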
We set the quotas with these commands:
lfs setquota -g $gid --block-softlimit 40t --block-hardlimit 40t /lustre1
lfs setquota -u $uid --inode-softlimit 1000000 --inode-hardlimit 1000000 /lustre1
For group 10011, we disabled the quotas one or two days before the errors started, using:
lfs setquota -g 10011 --block-softlimit 0 --block-hardlimit 0 /lustre1
What does "quota reintegration needs be re-triggered" mean? I guess it means running "lfs quotacheck" on the filesystem, right?
Thanks
JS
Issue Links
- is related to LU-4404 sanity-quota test_0: FAIL: SLOW IO for quota_usr (user): 50 KB/sec (Closed)