[LU-2284] Test failure on test suite sanity-quota, subtest test_7d Created: 05/Nov/12 Updated: 09/Dec/12 Resolved: 09/Dec/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | NFBlocker | ||
| Environment: |
SLES11 SP2 client |
||
| Severity: | 3 |
| Rank (Obsolete): | 5468 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. It relates to the following test suite run: https://maloo.whamcloud.com/test_sets/ba0c6718-2709-11e2-b04c-52540035b04c. The sub-test test_7d failed with the following error:
== sanity-quota test 7d: Quota reintegration (Transfer index in multiple bulks) ====================== 13:56:52 (1352066212)
Waiting for local destroys to complete
Creating test directory
CMD: fat-intel-3vm7 lctl set_param fail_val=0
fail_val=0
CMD: fat-intel-3vm7 lctl set_param fail_loc=0
fail_loc=0
CMD: fat-intel-3vm8 lctl set_param fail_val=0
fail_val=0
CMD: fat-intel-3vm8 lctl set_param fail_loc=0
fail_loc=0
CMD: fat-intel-3vm8 lctl set_param fail_val=0
fail_val=0
CMD: fat-intel-3vm8 lctl set_param fail_loc=0
fail_loc=0
CMD: fat-intel-3vm8 lctl set_param fail_val=0
fail_val=0
CMD: fat-intel-3vm8 lctl set_param fail_loc=0
fail_loc=0
CMD: fat-intel-3vm8 lctl set_param fail_val=0
fail_val=0
CMD: fat-intel-3vm8 lctl set_param fail_loc=0
fail_loc=0
CMD: fat-intel-3vm8 lctl set_param fail_val=0
fail_val=0
CMD: fat-intel-3vm8 lctl set_param fail_loc=0
fail_loc=0
CMD: fat-intel-3vm8 lctl set_param fail_val=0
fail_val=0
CMD: fat-intel-3vm8 lctl set_param fail_loc=0
fail_loc=0
CMD: fat-intel-3vm8 lctl set_param fail_val=0
fail_val=0
CMD: fat-intel-3vm8 lctl set_param fail_loc=0
fail_loc=0
CMD: fat-intel-3vm7 /usr/sbin/lctl conf_param lustre.quota.ost=none
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
Waiting 90 secs for update
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
Updated after 6s: wanted 'none' got 'none'
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0001.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0002.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0003.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0004.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0005.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0006.quota_slave.enabled
setquota failed: Invalid argument
CMD: fat-intel-3vm7 lctl set_param fail_val=0
fail_val=0
CMD: fat-intel-3vm7 lctl set_param fail_loc=0x608
fail_loc=0x608
CMD: fat-intel-3vm7 /usr/sbin/lctl conf_param lustre.quota.ost=u
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
Waiting 90 secs for update
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
Updated after 6s: wanted 'u' got 'u'
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0001.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0002.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0003.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0004.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0005.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0006.quota_slave.enabled
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.info |
grep user | awk '{ print \$3 }'
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0001.quota_slave.info |
grep user | awk '{ print \$3 }'
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0002.quota_slave.info |
grep user | awk '{ print \$3 }'
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0003.quota_slave.info |
grep user | awk '{ print \$3 }'
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0004.quota_slave.info |
grep user | awk '{ print \$3 }'
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0005.quota_slave.info |
grep user | awk '{ print \$3 }'
CMD: fat-intel-3vm8 /usr/sbin/lctl dl
CMD: fat-intel-3vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0006.quota_slave.info |
grep user | awk '{ print \$3 }'
CMD: fat-intel-3vm7 lctl set_param fail_val=0
fail_val=0
CMD: fat-intel-3vm7 lctl set_param fail_loc=0
fail_loc=0
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
[dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d0.sanity-quota/d7/f.sanity-quota.7d] [count=21] [oflag=sync]
dd: writing `/mnt/lustre/d0.sanity-quota/d7/f.sanity-quota.7d': Disk quota exceeded
20+0 records in
19+0 records out
19922944 bytes (20 MB) copied, 47.8415 s, 416 kB/s
running as uid/gid/euid/egid 60001/60001/60001/60001, groups:
[dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d0.sanity-quota/d7/f.sanity-quota.7d-1] [count=21] [oflag=sync]
21+0 records in
21+0 records out
22020096 bytes (22 MB) copied, 2.24614 s, 9.8 MB/s
Disk quotas for user quota_2usr (uid 60001):
Filesystem kbytes quota limit grace files quota limit grace
/mnt/lustre 21504 14450480 15173004 - 1 1048576 1101004 -
lustre-MDT0000_UUID
0 - 0 - 1 - 262144 -
lustre-OST0000_UUID
0 - 0 - - - - -
lustre-OST0001_UUID
0 - 0 - - - - -
lustre-OST0002_UUID
0 - 20480 - - - - -
lustre-OST0003_UUID
21504 - 1048576 - - - - -
lustre-OST0004_UUID
0 - 0 - - - - -
lustre-OST0005_UUID
0 - 0 - - - - -
lustre-OST0006_UUID
0 - 0 - - - - -
Files for user (quota_2usr):
File: `/mnt/lustre/d0.sanity-quota/d7/f.sanity-quota.7d-1'
Size: 22020096 Blocks: 43008 IO Block: 2097152 regular file
Device: 2c54f966h/743766374d Inode: 144117285630509373 Links: 1
Access: (0644/-rw-r--r--) Uid: (60001/quota_2usr) Gid: (60001/quota_2usr)
Access: 2012-11-04 13:58:04.000000000 -0800
Modify: 2012-11-04 13:58:07.000000000 -0800
Change: 2012-11-04 13:58:07.000000000 -0800
Birth: -
sanity-quota test_7d: @@@@@@ FAIL: quota_2usr write success, expect EDQUOT
|
| Comments |
| Comment by Jodi Levi (Inactive) [ 06/Nov/12 ] |
|
Can you please have a look at this one? |
| Comment by Niu Yawei (Inactive) [ 06/Nov/12 ] |
|
The setquota for TSTUSR2 failed with "setquota failed: Invalid argument"; that is why the quota_2usr write did not get EDQUOT. Unfortunately, I did not find any helpful information in the MDS log. Sarah, if this can be reproduced, could you reproduce it and collect a debug log from the MDS (with D_TRACE & D_QUOTA enabled)? That should help show why the setquota failed with -EINVAL. Thanks. |
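For reference, a minimal sketch of collecting such a debug log on the MDS, using the standard `lctl` debug workflow. The buffer size and output path are illustrative, and the commands are echoed through a wrapper so the sketch is inert; drop the `echo` to run them on a real MDS:

```shell
#!/bin/sh
# Hedged sketch: enable D_TRACE & D_QUOTA on the MDS, reproduce the
# failure, then dump the kernel debug buffer to a file.
run() { echo "$@"; }   # echo instead of executing; remove for real use

run lctl set_param debug=+trace+quota    # add D_TRACE and D_QUOTA
run lctl set_param debug_mb=512          # enlarge debug buffer (illustrative size)
# ... reproduce the setquota failure here ...
run lctl dk /tmp/lu-2284-mds-debug.log   # dump debug_kernel to a file
```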
| Comment by Sarah Liu [ 09/Nov/12 ] |
|
I tried twice but cannot reproduce it. |
| Comment by Peter Jones [ 09/Nov/12 ] |
|
Sarah, how frequently is this test failing when it is run by autotest? |
| Comment by Sarah Liu [ 09/Nov/12 ] |
Twice; here is another failure: https://maloo.whamcloud.com/test_sets/c03ea79c-2a13-11e2-94cd-52540035b04c |
| Comment by Niu Yawei (Inactive) [ 11/Nov/12 ] |
|
http://review.whamcloud.com/4510 is a debug patch that enables D_TRACE & D_QUOTA before the setquota in test_7d; hopefully we can reproduce the failure in autotest. |
| Comment by Andreas Dilger [ 23/Nov/12 ] |
|
This is reported as failing 11 times in the past week, according to:
https://maloo.whamcloud.com/test_sets/query?utf8=%E2%9C%93&test_set[test_set_script_id]=&test_set[status]=&test_set[query_bugs]=LU-2284&test_session[test_host]=&test_session[test_group]=&test_session[user_id]=&test_session[query_date]=&test_session[query_recent_period]=&test_node[os_type_id]=&test_node[distribution_type_id]=&test_node[architecture_type_id]=&test_node[file_system_type_id]=&test_node[lustre_branch_id]=&test_node_network[network_type_id]=&commit=Update+results
The most recent of which is:
Do any of these failures have the debug information you need to resolve this problem? |
| Comment by Niu Yawei (Inactive) [ 27/Nov/12 ] |
This one is based on e464b9bbac8e0e900777a4e8a655e156fb901c5a, which doesn't include the debug patch. |
| Comment by Niu Yawei (Inactive) [ 28/Nov/12 ] |
|
I found a few failures with the debug patch, for example: https://maloo.whamcloud.com/test_sets/dc835e9c-3711-11e2-a300-52540035b04c
00040000:00000001:1.0:1353843577.744099:0:24544:0:(qmt_entry.c:399:qmt_validate_limits()) Process entered
00040000:00000001:1.0:1353843577.744100:0:24544:0:(qmt_entry.c:403:qmt_validate_limits()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00040000:00000001:1.0:1353843577.744101:0:24544:0:(qmt_handler.c:241:qmt_setquota()) Process leaving via out (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
The setquota for quota_2usr failed because it was trying to set invalid limits (soft > hard). However, test_7d only specifies the hard limit, which means the soft limit was read from old settings. |
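To make the failure mode concrete, here is a minimal sketch of the soft-vs-hard rule that produced -EINVAL (-22). This is an illustration only, not the actual qmt_validate_limits() code; the numbers are taken from the limits seen in this ticket:

```shell
#!/bin/sh
# Hedged sketch of the limit validation: a non-zero hard limit that is
# lower than the soft limit is invalid. Not the real qmt_validate_limits().
validate_limits() {
    soft=$1; hard=$2
    if [ "$hard" -ne 0 ] && [ "$soft" -gt "$hard" ]; then
        return 22   # maps to -EINVAL (-22) in the kernel
    fi
    return 0
}

# test_7d sets only a hard limit; a stale soft limit (e.g. 14450480 blocks
# from an earlier test) can exceed the new hard limit, so setquota fails:
validate_limits 14450480 20480 || echo "setquota failed: Invalid argument"
```

With a clean soft limit of 0 the same call would succeed, which is why the failure only appears when limits are left over from a previous test.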
| Comment by Niu Yawei (Inactive) [ 28/Nov/12 ] |
|
http://review.whamcloud.com/4695 D_QUOTA needs to be re-enabled after each MDS/OST restart; otherwise we lose the D_QUOTA log. |
| Comment by Peter Jones [ 30/Nov/12 ] |
|
Landed for 2.4 |
| Comment by Niu Yawei (Inactive) [ 02/Dec/12 ] |
|
The landed patch is just another debug patch; I hope future test runs will show why the soft limit for quota_2usr wasn't 0. |
| Comment by Niu Yawei (Inactive) [ 02/Dec/12 ] |
|
The log shows the limits for quota_2usr (60001) were not zero (for both block & inode limits):
inode limits:
00040000:04000000:0.0:1354380042.909110:0:30239:0:(qmt_handler.c:173:qmt_getquota()) $$$ getquota qmt:lustre-QMT0000 pool:0-md id:60001 enforced:1 hard:1101004 soft:1048576 granted:0 time:0 qunit:262144 edquot:0 may_rel:0 revoke:0
block limits:
00040000:00000001:0.0:1354380042.909123:0:30239:0:(lquota_entry.c:289:lqe_read()) Process entered
00040000:04000000:0.0:1354380042.909124:0:30239:0:(lquota_entry.c:296:lqe_read()) $$$ read qmt:lustre-QMT0000 pool:0-dt id:60001 enforced:0 hard:0 soft:0 granted:0 time:0 qunit:0 edquot:0 may_rel:0 revoke:0
00040000:00000001:0.0:1354380042.909126:0:30239:0:(qmt_entry.c:67:qmt_lqe_read()) Process entered
00040000:00000001:0.0:1354380042.909127:0:30239:0:(lquota_disk.c:579:lquota_disk_read()) Process entered
00000004:00000001:0.0:1354380042.909129:0:30239:0:(osd_handler.c:3034:osd_index_iam_lookup()) Process entered
00000004:00000001:0.0:1354380042.909140:0:30239:0:(osd_handler.c:3078:osd_index_iam_lookup()) Process leaving (rc=1 : 1 : 1)
00040000:00000001:0.0:1354380042.909141:0:30239:0:(lquota_disk.c:590:lquota_disk_read()) Process leaving (rc=0 : 0 : 0)
00040000:04000000:0.0:1354380042.909142:0:30239:0:(qmt_entry.c:100:qmt_lqe_read()) $$$ read qmt:lustre-QMT0000 pool:0-dt id:60001 enforced:1 hard:15173004 soft:14450480 granted:0 time:0 qunit:0 edquot:0 may_rel:0 revoke:0
00040000:00000001:0.0:1354380042.909144:0:30239:0:(qmt_entry.c:101:qmt_lqe_read()) Process leaving (rc=0 : 0 : 0)
I really don't know where those limits came from. Another thing worth mentioning: some test results report the backend filesystem as zfs, but the debug log shows it is actually an ldiskfs backend (osd-ldiskfs symbols in the debug log). That means all of these test failures were on the ldiskfs backend; I suspect there is some defect lurking in iam? |
| Comment by Andreas Dilger [ 03/Dec/12 ] |
|
The original failure for this patch was:
but all of the recent failures are reporting a different error:
Is this original problem fixed? |
| Comment by Niu Yawei (Inactive) [ 03/Dec/12 ] |
|
Hi, Andreas. They are the same failure; the original test script didn't fail the test on a setquota error, and we changed this in the first debug patch. |
| Comment by Niu Yawei (Inactive) [ 05/Dec/12 ] |
|
http://review.whamcloud.com/4752 The log shows the limits for quota_2usr were already non-zero before test_7d; unfortunately, we can't see when they became non-zero since the log was truncated. To catch more of the log, this debug patch verifies the limits at test start and at the end of each test. |
| Comment by Niu Yawei (Inactive) [ 07/Dec/12 ] |
|
I see where the limits came from now: they are left over from other tests that have quota enabled (see setup_quota of t-f, i.e. test-framework.sh), so we should simply clear all the limits before testing: |
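A hedged sketch of such a cleanup, assuming the standard `lfs setquota` flags (-b/-B for block soft/hard, -i/-I for inode soft/hard). The user names and mount point are illustrative, the real fix belongs in setup_quota of test-framework.sh, and the command is echoed rather than executed so the sketch is inert:

```shell
#!/bin/sh
# Hedged sketch: zero out all quota limits for the test users before
# the suite runs, so no soft limit survives from an earlier test.
MOUNT=${MOUNT:-/mnt/lustre}   # illustrative mount point
reset_quota_limits() {
    for u in quota_usr quota_2usr; do
        # echo keeps the sketch inert; drop it to run against Lustre
        echo lfs setquota -u "$u" -b 0 -B 0 -i 0 -I 0 "$MOUNT"
    done
}
reset_quota_limits
```

Clearing both the soft and hard limits avoids the soft > hard combination that made setquota fail with -EINVAL in this ticket.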
| Comment by Niu Yawei (Inactive) [ 09/Dec/12 ] |
|
Patch landed for 2.4. |