[LU-340] system hang when running sanity-quota on RHEL5-x86_64-OFED Created: 17/May/11 Updated: 01/Apr/13 Resolved: 01/Apr/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0, Lustre 2.1.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
lustre-master/RHEL5-x86_64/#120/ofa build |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 6100 | ||||||||
| Description |
|
The system hangs when running sanity-quota on the RHEL5-x86_64-ofa build. Please see the attachment for all the logs. |
| Comments |
| Comment by Peter Jones [ 18/May/11 ] |
|
Niu, please look into this quota issue when you get a chance. Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 19/May/11 ] |
|
From the log we can see that all pdflush threads on the client were waiting on a page lock, while the dd thread was holding that page lock to do synchronous I/O. Because something is wrong with group quota, the synchronous I/O could not finish in time, which stalled the pdflush threads. What confuses me is that there were lots of "dqacq/dqrel failed! (rc:-5)" errors while setting group quota, yet setting user quota succeeded and the user quota limit tests also passed. It looks like there are only two cases in which dqacq_handler() returns -EIO: one is OBD_FAIL_OBD_DQACQ being set, and the other is the ll_sb_has_quota_active() check failing. Hi Sarah, is it repeatable? What is the value of /proc/fs/lustre/fail_loc on the MDS? Thanks. |
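The two -EIO cases named in the comment above can be written down as a small decision function. This is only a toy model of that reasoning, not Lustre source: the function name `dqacq_handler_rc`, its two boolean parameters, and the order of the checks are all assumptions made for illustration.

```python
import errno


def dqacq_handler_rc(fail_dqacq_injected, quota_active):
    """Toy model of the two -EIO paths described in the comment.

    fail_dqacq_injected: hypothetical flag standing in for
        fail_loc being set to OBD_FAIL_OBD_DQACQ.
    quota_active: hypothetical flag standing in for the result of
        the ll_sb_has_quota_active() check for the quota type.
    The order of the two checks is an assumption.
    """
    if fail_dqacq_injected:
        # Fault injection forces the failure regardless of quota state.
        return -errno.EIO
    if not quota_active:
        # Quota of the requested type is not active on the backing fs.
        return -errno.EIO
    return 0


if __name__ == "__main__":
    for injected in (False, True):
        for active in (False, True):
            print(injected, active, dqacq_handler_rc(injected, active))
```

Under this model, user quota succeeding while group quota fails with rc:-5 points at the second branch only if fail_loc is confirmed to be unset, which is why the comment asks for its value.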
| Comment by Sarah Liu [ 19/May/11 ] |
yes, it can be reproduced. |
| Comment by Niu Yawei (Inactive) [ 19/May/11 ] |
|
Is D_QUOTA enabled? Can we get the debug log on the MDS? |
| Comment by Sarah Liu [ 20/May/11 ] |
no. I can give you debug log tomorrow. please tell me the debug mask |
| Comment by Niu Yawei (Inactive) [ 20/May/11 ] |
|
I think the default + D_QUOTA will be fine, thank you, Sarah. |
| Comment by Niu Yawei (Inactive) [ 22/May/11 ] |
|
Thank you, Sarah. I think the debug_log confirmed that dqacq_handler failed either because group quota was not enabled or because fail_loc was set. Could you try the following commands on client-5 to see what happens? (quotacheck, then set group quota): |
| Comment by Sarah Liu [ 24/May/11 ] |
|
[root@client-15 ~]# lfs quotacheck -ug /mnt/lustre/ |
| Comment by Niu Yawei (Inactive) [ 26/May/11 ] |
|
When I logged on to the system, I found that "lfs quotaon -ug" could not turn on the local fs group quota on the MDS, even though the command executed successfully and there were no abnormal messages in the debug log. The local fs group quota could be enabled with "lfs quotaon -g", and after "lfs quotaon -g" was executed the system returned to normal: group quota could again be enabled and disabled with "lfs quotaon/off -ug". This bug appeared only on the ofa build server, so I suspect it is related to the ofa build. I will continue the investigation when I have time and spare nodes. |
| Comment by Jian Yu [ 29/Aug/11 ] |
|
Lustre Clients: Lustre Servers:
sanity-quota test 1 hung: https://maloo.whamcloud.com/test_sets/842c0928-cfc6-11e0-8d02-52540025f9af
Dmesg on MDS (fat-amd-1-ib) showed:
Lustre: DEBUG MARKER: == test 1: Block hard limit (normal use and out of quota) === == 01:51:35
Lustre: DEBUG MARKER: User quota (limit: 95511 kbytes)
Lustre: DEBUG MARKER: Write ...
Lustre: DEBUG MARKER: Done
Lustre: DEBUG MARKER: Write out of block quota ...
Lustre: DEBUG MARKER: --------------------------------------
Lustre: DEBUG MARKER: Group quota (limit: 95511 kbytes)
LustreError: 8250:0:(ldlm_lib.c:2341:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-5)
LustreError: 8251:0:(ldlm_lib.c:2341:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-5)
LustreError: 6520:0:(quota_context.c:708:dqacq_completion()) acquire qunit got error! (rc:-5)
LustreError: 6520:0:(quota_master.c:1263:mds_init_slave_blimits()) error mds adjust local block quota! (rc:-5)
LustreError: 6520:0:(quota_master.c:1442:mds_set_dqblk()) init slave blimits failed! (rc:-5)
<~snip~> |
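For triage, the LustreError lines in the dmesg excerpt above can be summarized by reporting function and return code. The following is a minimal sketch, assuming every error line follows the same `pid:N:(file:line:func()) message (rc:N)` shape seen in this excerpt; the helper names `parse_lustre_errors` and `count_by_function` are hypothetical and not part of any Lustre tool.

```python
import errno
import re
from collections import Counter

# Matches console lines shaped like the ones quoted in this ticket, e.g.
# LustreError: 8250:0:(ldlm_lib.c:2341:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-5)
LUSTRE_ERROR = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<msg>.*?) \(rc:(?P<rc>-?\d+)\)"
)

# Error lines copied from the dmesg excerpt in this ticket.
SAMPLE = """\
LustreError: 8250:0:(ldlm_lib.c:2341:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-5)
LustreError: 8251:0:(ldlm_lib.c:2341:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-5)
LustreError: 6520:0:(quota_context.c:708:dqacq_completion()) acquire qunit got error! (rc:-5)
LustreError: 6520:0:(quota_master.c:1263:mds_init_slave_blimits()) error mds adjust local block quota! (rc:-5)
LustreError: 6520:0:(quota_master.c:1442:mds_set_dqblk()) init slave blimits failed! (rc:-5)
"""


def parse_lustre_errors(text):
    """Return one dict per LustreError line found in text."""
    records = []
    for m in LUSTRE_ERROR.finditer(text):
        rc = int(m.group("rc"))
        records.append({
            "pid": int(m.group("pid")),
            "file": m.group("file"),
            "line": int(m.group("line")),
            "func": m.group("func"),
            "msg": m.group("msg"),
            "rc": rc,
            # Translate the numeric code to its symbolic errno name.
            "errno_name": errno.errorcode.get(abs(rc), "UNKNOWN"),
        })
    return records


def count_by_function(records):
    """Count errors per (function, errno name) pair."""
    return Counter((r["func"], r["errno_name"]) for r in records)


if __name__ == "__main__":
    for (func, name), n in count_by_function(parse_lustre_errors(SAMPLE)).items():
        print(f"{func}: {name} x{n}")
```

Grouping by function makes the propagation path easier to see: the same return code surfaces in the dqacq callback, in dqacq_completion(), and in the mds_set_dqblk() path.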
| Comment by Jian Yu [ 30/Aug/11 ] |
|
Lustre Branch: master The same failure occurred while running sanity-quota test: https://maloo.whamcloud.com/test_sets/4115f084-d2de-11e0-8d02-52540025f9af |
| Comment by Jian Yu [ 16/Feb/12 ] |
|
Lustre Tag: v2_1_1_0_RC2 The same issue occurred: https://maloo.whamcloud.com/test_sets/f95cf180-584c-11e1-9df1-5254004bbbd3 |
| Comment by Niu Yawei (Inactive) [ 01/Apr/13 ] |
|
Fixed in |