[LU-6629] sanity-benchmark test_bonnie: DQACQ failed with -22 Created: 21/May/15 Updated: 30/Jan/17 Resolved: 18/Nov/16

| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | zfs | ||
| Environment: | lustre-master build #3029 |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/bcb260f2-fe71-11e4-a865-5254006e85c2.

The sub-test test_bonnie failed with the following error:

test failed to respond and timed out

this may be a dup of

04:45:24:Lustre: DEBUG MARKER: == sanity-benchmark test bonnie: bonnie++ == 04:14:01 (1431922441)
04:45:24:Lustre: DEBUG MARKER: /usr/sbin/lctl mark min OST has 1969152kB available, using 3844624kB file size
04:45:24:Lustre: DEBUG MARKER: min OST has 1969152kB available, using 3844624kB file size
04:45:24:LNet: Service thread pid 3517 completed after 77.68s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
04:45:24:LNet: Skipped 8 previous similar messages
04:45:24:LNet: Service thread pid 3556 completed after 84.32s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
04:45:24:LustreError: 7365:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:lustre-OST0004 qtype:grp id:500 enforced:1 granted:1048576 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0
04:45:24:LustreError: 7365:0:(qsd_handler.c:340:qsd_req_completion()) Skipped 12 previous similar messages
04:45:24:LustreError: 7364:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:lustre-OST0004 qtype:grp id:500 enforced:1 granted:1048576 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0
04:45:24:LustreError: 7364:0:(qsd_handler.c:340:qsd_req_completion()) Skipped 11 previous similar messages
04:45:24:LustreError: 7364:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:lustre-OST0004 qtype:grp id:500 enforced:1 granted:1048576 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0
04:45:24:LustreError: 7364:0:(qsd_handler.c:340:qsd_req_completion()) Skipped 11 previous similar messages
04:45:24:LNet: Service thread pid 29001 completed after 45.78s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
04:45:24:LNet: Skipped 11 previous similar messages
05:14:26:********** Timeout by autotest system **********
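Editor's note: the "-22" in the DQACQ messages is a negated Linux errno, i.e. -EINVAL ("Invalid argument"), so the quota master is rejecting the slave's request as invalid rather than failing on resources. A trivial user-space check (standard errno decoding, not Lustre code) confirms the mapping:

	#include <errno.h>
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		/* Lustre logs kernel error codes negated: -22 == -EINVAL */
		printf("errno 22 = %s (EINVAL = %d)\n", strerror(22), EINVAL);
		return 0;
	}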
| Comments |
| Comment by Andreas Dilger [ 22/May/15 ] |
I also see in the MDS logs:

04:15:58:LustreError: 14209:0:(qmt_handler.c:420:qmt_dqacq0()) $$$ Release too much! uuid:lustre-MDT0000-lwp-OST0004_UUID release:1048576 granted:0, total:4194304 qmt:lustre-QMT0000 pool:0-dt id:500 enforced:1 hard:8533324 soft:8126976 granted:4194304 time:0 qunit:1048576 edquot:0
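Editor's note: this appears to be the master-side counterpart of the slave errors above: OST0004 asked to release 1048576 (one qunit) while the master's record for that connection showed granted:0, and qmt_dqacq0() rejected the request, which the slave then logged as "DQACQ failed with -22". A minimal sketch of such a sanity check, with hypothetical names (quota_entry, handle_release); the real qmt_dqacq0() logic is considerably more involved:

	#include <errno.h>

	/* Hypothetical per-ID quota state on the master (illustrative only). */
	struct quota_entry {
		unsigned long long qe_granted;	/* space currently granted to a slave */
	};

	/* Sketch of the master-side release path: a slave releasing more
	 * than it was granted is rejected with -EINVAL (-22). */
	static int handle_release(struct quota_entry *qe, unsigned long long count)
	{
		if (count > qe->qe_granted)
			return -EINVAL;		/* "Release too much!" */
		qe->qe_granted -= count;
		return 0;
	}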
| Comment by Niu Yawei (Inactive) [ 27/May/15 ] |
It looks like two slaves (OST4 & OST5) are out of sync with the master. I can't tell from the log how this happened, but I don't think it is the cause of the "too many service threads, or there were not enough hardware resources" messages.
| Comment by Niu Yawei (Inactive) [ 16/Jun/15 ] |
There was a defect that could lead to a quota slave reconnecting without invalidating its global locks, which could leave the quota slave and master out of sync in the end. I think this has been fixed by commit 4f53536d002c13886210b672b657795baa067144.
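Editor's note: a rough sketch of the invariant described here, using entirely hypothetical names (qsd_state, invalidate_global_locks, qsd_reconnect; this is not the actual Lustre quota-slave API). On reconnect the slave has to drop its cached global quota locks so that grant state is re-fetched from the master; otherwise the two sides can diverge as seen in the logs:

	/* Illustrative only: hypothetical quota-slave connection state. */
	struct qsd_state {
		int connected;
		int global_locks_valid;	/* cached global/per-ID quota locks */
	};

	/* Drop cached locks so the next acquire re-syncs with the master. */
	static void invalidate_global_locks(struct qsd_state *qsd)
	{
		qsd->global_locks_valid = 0;
	}

	static void qsd_reconnect(struct qsd_state *qsd)
	{
		/* The defect: skipping this invalidation on reconnect left
		 * the slave holding stale grant state, so the counters
		 * diverged (granted:1048576 on the slave vs. granted:0 on
		 * the master in the logs above). */
		invalidate_global_locks(qsd);
		qsd->connected = 1;
	}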
| Comment by Niu Yawei (Inactive) [ 23/Jul/15 ] |
If the "04:15:58:LustreError: 14209:0:(qmt_handler.c:420:qmt_dqacq0()) $$$ Release too much!" error message is no longer seen on master, I think we can close this ticket. It should have been fixed by the following change in commit 4f53536d002c13886210b672b657795baa067144:

+	/* Note: lw_client is needed in MDS-MDS failover during update log
+	 * processing, so we needs to allow lw_client to be connected at
+	 * anytime, instead of only the initial connection */
+	lw_client = (data->ocd_connect_flags & OBD_CONNECT_LIGHTWEIGHT) != 0;
+
 	if (lustre_msg_get_op_flags(req->rq_reqmsg) & MSG_CONNECT_INITIAL) {
 		mds_conn = (data->ocd_connect_flags & OBD_CONNECT_MDS) != 0;
-		lw_client = (data->ocd_connect_flags &
-			     OBD_CONNECT_LIGHTWEIGHT) != 0;
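Editor's note: the patch hoists the lw_client computation out of the MSG_CONNECT_INITIAL branch, so a lightweight connection is recognized on every connect request rather than only the first one. The quota slave talks to the quota master over such a lightweight connection (the lustre-MDT0000-lwp-OST0004_UUID seen in the MDS log above), which is presumably how a mis-handled LWP reconnect could leave slave and master quota state out of sync. After the patch the handler reads roughly as follows (reconstructed from the diff; surrounding code omitted):

	/* Recognize lightweight clients on every connect request, not
	 * only the initial one (needed for LWP reconnect). */
	lw_client = (data->ocd_connect_flags & OBD_CONNECT_LIGHTWEIGHT) != 0;

	if (lustre_msg_get_op_flags(req->rq_reqmsg) & MSG_CONNECT_INITIAL) {
		mds_conn = (data->ocd_connect_flags & OBD_CONNECT_MDS) != 0;
		/* ... rest of the initial-connect handling ... */
	}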
| Comment by Niu Yawei (Inactive) [ 18/Nov/16 ] |
Patch landed.