[LU-4306] Test failure sanity-quota test_4a: failed create before timer expired Created: 25/Nov/13 Updated: 01/Feb/16 Resolved: 24/Nov/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0, Lustre 2.8.0, Lustre 2.5.4 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Maloo | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 11796 | ||||||||
| Description |
|
This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com> This issue relates to the following test suite run: The sub-test test_4a failed with the following error:
Info required for matching: sanity-quota 4a |
| Comments |
| Comment by nasf (Inactive) [ 12/Feb/14 ] |
|
Another failure instance: https://maloo.whamcloud.com/test_sets/24552ba6-93d7-11e3-854a-52540035b04c |
| Comment by Jian Yu [ 27/Mar/14 ] |
|
More instance on master branch: |
| Comment by Nathaniel Clark [ 21/Apr/14 ] |
|
Another instance on master review-dne-part-2: |
| Comment by Cliff White (Inactive) [ 25/Sep/14 ] |
|
Another instance on master: |
| Comment by Jian Yu [ 23/Oct/14 ] |
|
More instances on Lustre b2_5 branch: |
| Comment by Oleg Drokin [ 02/Nov/14 ] |
|
Another one on master: https://testing.hpdd.intel.com/test_sets/ddac071c-62b4-11e4-b9a7-5254006e85c2 |
| Comment by nasf (Inactive) [ 07/Nov/14 ] |
|
Another failure instance on b2_5: |
| Comment by Jian Yu [ 11/Nov/14 ] |
|
More instances on Lustre b2_5 branch: |
| Comment by Bob Glossman (Inactive) [ 30/Dec/14 ] |
|
another on master: |
| Comment by Peter Jones [ 09/Feb/15 ] |
|
Niu It seems that this issue crops up occasionally so it would be good for you to look into it when you have a chance Peter |
| Comment by Frank Zago (Inactive) [ 09/Feb/15 ] |
|
https://testing.hpdd.intel.com/test_sessions/b32b138e-af1c-11e4-98a3-5254006e85c2 |
| Comment by Andreas Dilger [ 09/Feb/15 ] |
|
This failed 12 times over the past two weeks, out of about 690 test runs. It has failed across multiple different configs (review, review-dne-part-2, review-zfs). |
| Comment by Niu Yawei (Inactive) [ 10/Feb/15 ] |
|
I checked one of the log, seems the grace time is too short (5 seconds), so sometimes it has already expired before first create. See this one: 00040000:04000000:1.0:1392173914.897101:0:591:0:(qmt_entry.c:364:qmt_slv_write()) $$$ write slv [0x200000005:0x1017:0x0] granted:11 qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:0 may_rel:0 revoke:4300823235 00040000:04000000:1.0:1392173914.897121:0:591:0:(qmt_entry.c:258:qmt_glb_write()) $$$ write glb qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:0 may_rel:0 revoke:4300823235 00040000:04000000:1.0:1392173914.897134:0:591:0:(qmt_handler.c:560:qmt_dqacq0()) $$$ dqacq ends count:1 ver:4 rc:0 qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:0 may_rel:0 revoke:4300823235 Current time is 1392173914, the test script write 11 files to exceed softlimit, then set the expiration time 5 seconds later 1392173919. 00040000:04000000:1.0:1392173919.551696:0:589:0:(qmt_handler.c:390:qmt_dqacq0()) $$$ dqacq starts uuid:lustre-MDT0000-lwp-MDT0000_UUID flags:0x1 wanted:1 usage:0 qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:0 may_rel:0 revoke:4300823235 00040000:04000000:1.0:1392173919.551703:0:589:0:(qmt_entry.c:311:qmt_slv_read()) $$$ read slv [0x200000005:0x1017:0x0] qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:0 may_rel:0 revoke:4300823235 00040000:04000000:1.0:1392173919.551726:0:589:0:(qmt_entry.c:330:qmt_slv_read()) $$$ successful slv read 11 qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:0 may_rel:0 revoke:4300823235 00040000:04000000:1.0:1392173919.551732:0:589:0:(qmt_entry.c:473:qmt_adjust_edquot()) $$$ changing edquot flag qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:1 may_rel:0 revoke:4300823235 00040000:04000000:1.0:1392173919.551777:0:589:0:(qmt_handler.c:560:qmt_dqacq0()) $$$ dqacq ends count:0 ver:0 rc:-122 qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:1 may_rel:0 revoke:4300823235 The "Create before timer goes off" arrived on 1392173919, it's too late, grace has expired. I think we just need to bump the grace time in test script to make it more tolerance on timing. |
| Comment by Gerrit Updater [ 10/Feb/15 ] |
|
Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/13704 |
| Comment by Niu Yawei (Inactive) [ 10/Feb/15 ] |
|
patch to master: http://review.whamcloud.com/13704 |
| Comment by Jian Yu [ 27/Feb/15 ] |
|
The same failure occurred on Lustre b2_5 branch: |
| Comment by Gerrit Updater [ 03/Mar/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13704/ |
| Comment by James Nunez (Inactive) [ 22/May/15 ] |
|
Here are two new failures on master with the new grace time: |
| Comment by Peter Jones [ 25/May/15 ] |
|
Landed for 2.8 |
| Comment by Niu Yawei (Inactive) [ 02/Jun/15 ] |
09:35:20:Create files to exceed soft limit 09:35:20:running as uid/gid/euid/egid 60000/60000/60000/60000, groups: 09:35:20: [createmany] [-m] [/mnt/lustre/d4a.sanity-quota/f4a.sanity-quota-0_] [11] 09:35:20:total: 11 creates in 0.02 seconds: 553.32 creates/second 09:35:31:Create file before timer goes off 09:35:31:running as uid/gid/euid/egid 60000/60000/60000/60000, groups: 09:35:31: [touch] [/mnt/lustre/d4a.sanity-quota/f4a.sanity-quota-0_before] 09:35:31:touch: cannot touch `/mnt/lustre/d4a.sanity-quota/f4a.sanity-quota-0_before': Disk quota exceeded Look at one of the failure on master, creating 11 files finished quickly, however "sync; sleep1; sync;" took 11 seconds, which made the grace time expired. I'll look closer to see why sync took such a long time. |
| Comment by James Nunez (Inactive) [ 03/Jun/15 ] |
|
I think we're still seeing this issue or something that looks like it: |
| Comment by nasf (Inactive) [ 06/Jun/15 ] |
|
Hit it on b2_7_fe: on b_ieel3_0: |
| Comment by Sarah Liu [ 01/Jul/15 ] |
|
hit this issue in interop testing between master server and 2.7.0 client: https://testing.hpdd.intel.com/test_sets/0f689a1a-120d-11e5-a29c-5254006e85c2 |
| Comment by Gerrit Updater [ 03/Jul/15 ] |
|
Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/15483 |
| Comment by Gerrit Updater [ 10/Jul/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15483/ |
| Comment by Peter Jones [ 10/Jul/15 ] |
|
Landed for 2.8 |
| Comment by Saurabh Tandan (Inactive) [ 28/Oct/15 ] |
|
Encountered another instance for tag 2.7.62 for interop EL7 Server/2.7.0 Client: https://testing.hpdd.intel.com/test_sets/70b97be0-7b3a-11e5-8b50-5254006e85c2 |
| Comment by Peter Jones [ 24/Nov/15 ] |
|
Saurabh Could you please open a new ticket to track the interop issue Thanks Peter |
| Comment by Saurabh Tandan (Inactive) [ 25/Nov/15 ] |
|
|