[LU-4857] (qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -22 Created: 03/Apr/14 Updated: 23/Jun/16 Resolved: 23/Jun/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | javed shaikh (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Environment: | CentOS 6.4 / Kernel 2.6.32-358.18.1.el6_lustre.x86_64 |
| Severity: | 3 |
| Rank (Obsolete): | 13395 |
| Description |
|
Hi, I'm seeing the following messages from a 2.4.2 OSS every few minutes:

2014-04-02T16:32:48+11:00 lemming17 kernel: LustreError: 17701:0:(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:short-OST011f qtype:grp id:6644 enforced:1 granted:1048576 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0

Surprisingly, the errors are spewed by only two OSSes out of the lot, and only for specific OSTs. At the same time the MDS is throwing the following:

2014-04-02T16:32:36+11:00 gerbil5 kernel: LustreError: 17470:0:(qmt_handler.c:431:qmt_dqacq0()) $$$ Release too much! uuid:short-MDT0000-lwp-OST011f_UUID release:1048576 granted:0, total:354880716 qmt:short-QMT0000 pool:0-dt id:6644 enforced:1 hard:2516582400 soft:1258291200 granted:354880716 time:0 qunit:1048576 edquot:0 may_rel:0 revoke:0

These errors appeared after all the servers were rebooted afresh as part of a maintenance cycle (some other LBUGs were fixed). Any pointers as to what could be causing it? |
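For context, the limits and per-target grants for the group id mentioned in these messages can be cross-checked from any client with lfs quota; a minimal sketch, assuming a hypothetical client mount point /mnt/short for this filesystem:

# overall usage and limits for group 6644 (the id from the MDS message)
lfs quota -g 6644 /mnt/short
# -v additionally lists the amount granted to each individual OST
lfs quota -v -g 6644 /mnt/short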
| Comments |
| Comment by Niu Yawei (Inactive) [ 03/Apr/14 ] |
|
This looks like the quota slave index files on the master and slave are somehow out of sync. 1. Did the error messages show up continuously, or did they disappear after a while? If forcing reintegration can't resolve the problem, could you collect a Lustre log with D_QUOTA and D_TRACE enabled? (Execute the instructions on the OSS to collect the log; a sketch of the typical commands follows this comment.)
|
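A minimal sketch of what forcing reintegration and collecting such a log typically looks like on the OSS, assuming the standard lctl interface and the osd-ldiskfs quota_slave force_reint tunable available in Lustre 2.4 (these exact commands are not quoted from this ticket):

# force a quota reintegration for the affected target (assumed tunable)
lctl set_param osd-ldiskfs.short-OST011f.quota_slave.force_reint=1

# enable D_QUOTA and D_TRACE debugging, clear the old buffer,
# wait for the DQACQ error to reappear, then dump the debug log
lctl set_param debug="+quota +trace"
lctl clear
lctl dk /tmp/lustre-quota-debug.log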
| Comment by javed shaikh (Inactive) [ 04/Apr/14 ] |
|
1. The error messages recurred roughly every 10 minutes.

$ cat /proc/fs/lustre/osd-ldiskfs/short-OST0115/quota_slave/info
target name:    short-OST0115
pool ID:        0
type:           dt
quota enabled:  g
conn to master: setup
space acct:     ug
user uptodate:  glb[0],slv[0],reint[0]
group uptodate: glb[1],slv[1],reint[0]

I'll observe for a day and then get back. Thanks. |
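The same state can be queried for every target on an OSS in one go; a minimal sketch, assuming the lctl get_param wildcard syntax:

# dump quota_slave state for all OSTs on this OSS; glb/slv[0] means the
# global or slave index copy is not up to date, reint[1] would indicate
# a reintegration currently in progress
lctl get_param osd-ldiskfs.*.quota_slave.info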
| Comment by javed shaikh (Inactive) [ 07/Apr/14 ] |
|
update. |
| Comment by javed shaikh (Inactive) [ 07/Apr/14 ] |
|
This ticket can be closed. Thanks. |
| Comment by Niu Yawei (Inactive) [ 08/Apr/14 ] |
|
Hi Javed, are there any error messages on the MDT or OST before this started? I'd like to see why the reintegration failed.
| Comment by javed shaikh (Inactive) [ 08/Apr/14 ] |
|
Just checked for one of the affected OSTs: no error messages, only messages related to client evictions and orphan object deletions. |
| Comment by javed shaikh (Inactive) [ 14/Apr/14 ] |
|
Just got this from another OSS:

2014-04-10T17:56:09+10:00 lemming2 kernel: Lustre: short-OST008e: haven't heard from client 4ab84a25-da95-49af-a512-e406eb4af84f (at 10.9.3.2@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8820371ff400, cur 1397116569 expire 1397116419 last 1397116342
2014-04-10T17:56:09+10:00 lemming2 kernel: Lustre: Skipped 20 previous similar messages
2014-04-11T09:20:49+10:00 lemming2 kernel: LustreError: 25971:0:(ldlm_resource.c:1165:ldlm_resource_get()) short-OST008e: lvbo_init failed for resource 0xb65a43:0x0: rc = -2
2014-04-11T10:08:10+10:00 lemming2 kernel: LustreError: 25593:0:(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:short-OST008e qtype:grp id:4093 enforced:1 granted:65536 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0

Though there's one more OSS where the DQACQ error occurred on an OST without prior errors on that OST. |
| Comment by Niu Yawei (Inactive) [ 23/Jun/16 ] |
|
dup of |