[LU-4306] Test failure sanity-quota test_4a: failed create before timer expired Created: 25/Nov/13  Updated: 01/Feb/16  Resolved: 24/Nov/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.8.0, Lustre 2.5.4
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Major
Reporter: Maloo Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-7477 Interop master<->2.7.0 : sanity-quota... Resolved
Severity: 3
Rank (Obsolete): 11796

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run:
http://maloo.whamcloud.com/test_sets/132128a6-4dd9-11e3-a167-52540035b04c
http://maloo.whamcloud.com/test_sets/b7421fca-5096-11e3-b42b-52540035b04c

The sub-test test_4a failed with the following error:

failed create before timer expired, but expect success

Info required for matching: sanity-quota 4a



 Comments   
Comment by nasf (Inactive) [ 12/Feb/14 ]

Another failure instance:

https://maloo.whamcloud.com/test_sets/24552ba6-93d7-11e3-854a-52540035b04c

Comment by Jian Yu [ 27/Mar/14 ]

More instance on master branch:
https://maloo.whamcloud.com/test_sets/843d939a-b575-11e3-8664-52540035b04c

Comment by Nathaniel Clark [ 21/Apr/14 ]

Another instance on master review-dne-part-2:
https://maloo.whamcloud.com/test_sets/b0cd4760-c21e-11e3-8030-52540035b04c

Comment by Cliff White (Inactive) [ 25/Sep/14 ]

Another instance on master:
https://testing.hpdd.intel.com/test_sets/3916406a-4495-11e4-943e-5254006e85c2

Comment by Jian Yu [ 23/Oct/14 ]

More instances on Lustre b2_5 branch:
https://testing.hpdd.intel.com/test_sets/6a32ffe0-5a92-11e4-969e-5254006e85c2
https://testing.hpdd.intel.com/test_sets/933cf168-5d74-11e4-89c8-5254006e85c2

Comment by Oleg Drokin [ 02/Nov/14 ]

Another one on master: https://testing.hpdd.intel.com/test_sets/ddac071c-62b4-11e4-b9a7-5254006e85c2

Comment by nasf (Inactive) [ 07/Nov/14 ]

Another failure instance on b2_5:
https://testing.hpdd.intel.com/test_sets/4f235fd2-65c9-11e4-91f3-5254006e85c2

Comment by Jian Yu [ 11/Nov/14 ]

More instances on Lustre b2_5 branch:
https://testing.hpdd.intel.com/test_sets/a36550b2-68a5-11e4-a63a-5254006e85c2
https://testing.hpdd.intel.com/test_sets/4f235fd2-65c9-11e4-91f3-5254006e85c2
https://testing.hpdd.intel.com/test_sets/6116285c-625c-11e4-bd8b-5254006e85c2
https://testing.hpdd.intel.com/test_sets/7c5e513e-609f-11e4-b952-5254006e85c2
https://testing.hpdd.intel.com/test_sets/b0bf01e6-48c9-11e4-be93-5254006e85c2

Comment by Bob Glossman (Inactive) [ 30/Dec/14 ]

another on master:
https://testing.hpdd.intel.com/test_sets/eb85e9a4-8fe4-11e4-92df-5254006e85c2

Comment by Peter Jones [ 09/Feb/15 ]

Niu

It seems that this issue crops up occasionally so it would be good for you to look into it when you have a chance

Peter

Comment by Frank Zago (Inactive) [ 09/Feb/15 ]

https://testing.hpdd.intel.com/test_sessions/b32b138e-af1c-11e4-98a3-5254006e85c2

Comment by Andreas Dilger [ 09/Feb/15 ]

This failed 12 times over the past two weeks, out of about 690 test runs. It has failed across multiple different configs (review, review-dne-part-2, review-zfs).

Comment by Niu Yawei (Inactive) [ 10/Feb/15 ]

I checked one of the log, seems the grace time is too short (5 seconds), so sometimes it has already expired before first create.

See this one:

00040000:04000000:1.0:1392173914.897101:0:591:0:(qmt_entry.c:364:qmt_slv_write()) $$$ write slv [0x200000005:0x1017:0x0] granted:11 qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:0 may_rel:0 revoke:4300823235
00040000:04000000:1.0:1392173914.897121:0:591:0:(qmt_entry.c:258:qmt_glb_write()) $$$ write glb qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:0 may_rel:0 revoke:4300823235
00040000:04000000:1.0:1392173914.897134:0:591:0:(qmt_handler.c:560:qmt_dqacq0()) $$$ dqacq ends count:1 ver:4 rc:0 qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:0 may_rel:0 revoke:4300823235

Current time is 1392173914, the test script write 11 files to exceed softlimit, then set the expiration time 5 seconds later 1392173919.

00040000:04000000:1.0:1392173919.551696:0:589:0:(qmt_handler.c:390:qmt_dqacq0()) $$$ dqacq starts uuid:lustre-MDT0000-lwp-MDT0000_UUID flags:0x1 wanted:1 usage:0 qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:0 may_rel:0 revoke:4300823235
00040000:04000000:1.0:1392173919.551703:0:589:0:(qmt_entry.c:311:qmt_slv_read()) $$$ read slv [0x200000005:0x1017:0x0] qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:0 may_rel:0 revoke:4300823235
00040000:04000000:1.0:1392173919.551726:0:589:0:(qmt_entry.c:330:qmt_slv_read()) $$$ successful slv read 11 qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:0 may_rel:0 revoke:4300823235
00040000:04000000:1.0:1392173919.551732:0:589:0:(qmt_entry.c:473:qmt_adjust_edquot()) $$$ changing edquot flag qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:1 may_rel:0 revoke:4300823235
00040000:04000000:1.0:1392173919.551777:0:589:0:(qmt_handler.c:560:qmt_dqacq0()) $$$ dqacq ends count:0 ver:0 rc:-122 qmt:lustre-QMT0000 pool:0-md id:60000 enforced:1 hard:0 soft:10 granted:11 time:1392173919 qunit:1024 edquot:1 may_rel:0 revoke:4300823235

The "Create before timer goes off" arrived on 1392173919, it's too late, grace has expired.

I think we just need to bump the grace time in test script to make it more tolerance on timing.

Comment by Gerrit Updater [ 10/Feb/15 ]

Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/13704
Subject: LU-4306 test: bump grace time in test_4a of s-q
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0e53e705c72fa8b6db9be997c075a9429d0b4f37

Comment by Niu Yawei (Inactive) [ 10/Feb/15 ]

patch to master: http://review.whamcloud.com/13704

Comment by Jian Yu [ 27/Feb/15 ]

The same failure occurred on Lustre b2_5 branch:
https://testing.hpdd.intel.com/test_sets/ffcc11bc-be1b-11e4-82d1-5254006e85c2

Comment by Gerrit Updater [ 03/Mar/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13704/
Subject: LU-4306 test: bump grace time in test_4a of s-q
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5641279101af968b88e9929c4b9d79f9bb15a6ac

Comment by James Nunez (Inactive) [ 22/May/15 ]

Here are two new failures on master with the new grace time:
review-dne-part-2 - 2015-05-22 09:30:20 - https://testing.hpdd.intel.com/test_sets/78093098-00a1-11e5-9650-5254006e85c2
review-dne-part-2 - 2015-05-22 06:54:44 - https://testing.hpdd.intel.com/test_sets/4b94c754-008b-11e5-9650-5254006e85c2

Comment by Peter Jones [ 25/May/15 ]

Landed for 2.8

Comment by Niu Yawei (Inactive) [ 02/Jun/15 ]
09:35:20:Create files to exceed soft limit
09:35:20:running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
09:35:20: [createmany] [-m] [/mnt/lustre/d4a.sanity-quota/f4a.sanity-quota-0_] [11]
09:35:20:total: 11 creates in 0.02 seconds: 553.32 creates/second
09:35:31:Create file before timer goes off
09:35:31:running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
09:35:31: [touch] [/mnt/lustre/d4a.sanity-quota/f4a.sanity-quota-0_before]
09:35:31:touch: cannot touch `/mnt/lustre/d4a.sanity-quota/f4a.sanity-quota-0_before': Disk quota exceeded

Look at one of the failure on master, creating 11 files finished quickly, however "sync; sleep1; sync;" took 11 seconds, which made the grace time expired. I'll look closer to see why sync took such a long time.

Comment by James Nunez (Inactive) [ 03/Jun/15 ]

I think we're still seeing this issue or something that looks like it:
2015-06-02 03:56:00 - https://testing.hpdd.intel.com/test_sets/4d2dad90-0916-11e5-9307-5254006e85c2

Comment by nasf (Inactive) [ 06/Jun/15 ]

Hit it on b2_7_fe:
https://testing.hpdd.intel.com/test_sets/82f6b070-0b5f-11e5-b33d-5254006e85c2

on b_ieel3_0:
https://testing.hpdd.intel.com/test_sets/795365ba-0b03-11e5-982e-5254006e85c2

Comment by Sarah Liu [ 01/Jul/15 ]

hit this issue in interop testing between master server and 2.7.0 client:

https://testing.hpdd.intel.com/test_sets/0f689a1a-120d-11e5-a29c-5254006e85c2

Comment by Gerrit Updater [ 03/Jul/15 ]

Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/15483
Subject: LU-4306 tests: improve test_4a of s-q
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f45b3300ca04f213575020f892cce6ba1b2e577f

Comment by Gerrit Updater [ 10/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15483/
Subject: LU-4306 tests: improve test_4a of s-q
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a838fe8b43a8e3afb35d4854745b4876726f82c5

Comment by Peter Jones [ 10/Jul/15 ]

Landed for 2.8

Comment by Saurabh Tandan (Inactive) [ 28/Oct/15 ]

Encountered another instance for tag 2.7.62 for interop EL7 Server/2.7.0 Client: https://testing.hpdd.intel.com/test_sets/70b97be0-7b3a-11e5-8b50-5254006e85c2

Comment by Peter Jones [ 24/Nov/15 ]

Saurabh

Could you please open a new ticket to track the interop issue

Thanks

Peter

Comment by Saurabh Tandan (Inactive) [ 25/Nov/15 ]

LU-7477 has been created to track the interop issue.

Generated at Sat Feb 10 01:41:32 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.