[LU-1209] sanity.sh subtest test_133d failed with "samedir_rename_size count error" Created: 13/Mar/12 Updated: 16/Apr/14 Resolved: 16/Apr/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.1, Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.3.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Maloo | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | yuc2 |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 4079 |
| Description |
|
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/025d7aac-6ce3-11e1-9174-5254004bbbd3. This subtest is reporting 14% failure over the past 100 runs, so there must be some race or unexpected result causing the failure. The sub-test test_133d failed with the following error:
Info required for matching: sanity 133d |
| Comments |
| Comment by Andreas Dilger [ 13/Mar/12 ] |
|
Di, can you please take a look into why this is failing? It would also be good to search in Maloo (Results->Search->Subtests) for other cases of this test 133d failure, mark them with this bug, and add the test URLs here for future reference. |
| Comment by Andreas Dilger [ 13/Mar/12 ] |
|
It seems this may relate to a test script interop issue. I noticed that all of the test failures are taking about twice as long as the passes, but this may also relate to improvements from the 2.2 pdirops. |
| Comment by Di Wang [ 14/Mar/12 ] |
|
I checked the Maloo results; it seems the others are related to 1193. But for this one, the client and server are running the same version. Unfortunately, the log is not enough for me to figure out the reason, so I will add more info to the test. I also made a patch that checks the Lustre version to determine whether the server is capable of running some tests, as you suggested in 1193. Please check: http://review.whamcloud.com/#change,2309 |
| Comment by Oleg Drokin [ 27/Apr/12 ] |
|
Another occurrence, with logs available, in https://maloo.whamcloud.com/test_sets/c00eebb6-9039-11e1-98a1-525400d2bfa6. It seems that there test also failed that was the root error, but I cannot see whether the original report was about the same issue, since it has no logs. In any case it might be related. |
| Comment by Di Wang [ 28/Apr/12 ] |
|
Oleg, could you please land http://review.whamcloud.com/#change,2309 ? So I can have more info here. Thanks. |
| Comment by Build Master (Inactive) [ 29/Apr/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Ian Colle (Inactive) [ 03/Jul/12 ] |
|
Happened again: https://maloo.whamcloud.com/test_sets/63204c18-c4d4-11e1-af06-52540035b04c |
| Comment by Andreas Dilger [ 07/Jul/12 ] |
|
Again: https://maloo.whamcloud.com/test_sets/2d51efd8-c7fe-11e1-ba35-52540035b04c It is currently reporting a 29% failure rate in Maloo. |
| Comment by Andreas Dilger [ 07/Jul/12 ] |
== sanity test 133d: Verifying rename_stats ====== 19:02:48 (1341626568)
mdt.lustre-MDT0000.rename_stats
mdt.lustre-MDT0000.rename_stats=clear
total: 512 creates in 0.91 seconds: 565.37 creates/second
source rename dir size: 32K
target rename dir size: 4K
mdt.lustre-MDT0000.rename_stats=
rename_stats:
- snapshot_time: 1341626570.213120
- same_dir
64KB: { sample: 1, pct: 100, cum_pct: 100 }
/usr/lib64/lustre/tests/sanity.sh: line 7552: [: : integer expression expected
sanity test_133d: @@@@@@ FAIL: samedir_rename_size error
|
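A minimal sketch of one way the `[: : integer expression expected` message in the log above can arise, assuming the test looks up the sample count for the bucket it expects and that bucket is absent from the output. The helper name `bucket_samples` and the 32KB lookup are hypothetical and are not taken from sanity.sh; the sample text is copied from the log above.

```shell
# Hypothetical helper (not from sanity.sh): read rename_stats-style text on
# stdin and print the sample count for one bucket label, or nothing if the
# bucket is absent.
bucket_samples() {
    awk -v label="$1:" '$1 == label { gsub(",", "", $4); print $4 }'
}

# Sample text copied from the failing run.
stats='rename_stats:
- snapshot_time: 1341626570.213120
- same_dir
      64KB: { sample: 1, pct: 100, cum_pct: 100 }'

# Assumption: the test expected a 32KB bucket because the directory was 32K.
count=$(printf '%s\n' "$stats" | bucket_samples 32KB)

# With an empty $count, [ "$count" -eq 1 ] is an invalid integer test.
# Checking for emptiness first turns that into an explicit diagnosis.
if [ -z "$count" ]; then
    echo "no 32KB bucket in rename_stats"
elif [ "$count" -eq 1 ]; then
    echo "one rename recorded in the 32KB bucket"
fi
```

Run as written, this takes the first branch, because the captured output only contains a 64KB bucket.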
| Comment by Di Wang [ 07/Jul/12 ] |
|
Hmm, the dir size is 32K, but somehow it recorded the rename under the 64K size. It seems the dir size is not consistent during the whole process. |
| Comment by Di Wang [ 07/Jul/12 ] |
|
Ah, we should get the dir size after the rename, since the rename will change the dir size. Here is the fix: http://review.whamcloud.com/#change,3298 |
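The ordering described in the comment above can be sketched as follows. This is an illustration on a local directory, not the change in http://review.whamcloud.com/#change,3298; `dir_size` and `rename_then_measure` are hypothetical names, and `ls -ldn` stands in for however the test actually reads the directory size.

```shell
# Hypothetical helper: directory size in bytes, as reported by ls -ld.
dir_size() {
    ls -ldn "$1" | awk '{ print $5 }'
}

# Illustrative ordering only: do the same-directory rename first,
# then sample the directory size, so the size reflects the rename.
rename_then_measure() {
    mv "$1/src" "$1/dst"
    dir_size "$1"
}

# Usage on a scratch directory.
tmp=$(mktemp -d)
touch "$tmp/src"
size=$(rename_then_measure "$tmp")
echo "directory size after rename: $size bytes"
rm -rf "$tmp"
```

Sampling the size before the `mv` instead would compare the statistics against a value that the rename itself may already have changed, which is the inconsistency the comment describes.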
| Comment by Jodi Levi (Inactive) [ 27/Sep/12 ] |
|
Please reopen ticket if additional work is needed. |
| Comment by Sarah Liu [ 08/Apr/13 ] |
|
Hit this issue again when upgrading from 1.8.9 to 2.4 and then adding one new MDT: https://maloo.whamcloud.com/test_sets/a02cc9b2-9ec5-11e2-975f-52540035b04c |
| Comment by Li Wei (Inactive) [ 10/Apr/13 ] |
|
https://maloo.whamcloud.com/test_sets/ecc68362-a14f-11e2-b1c3-52540035b04c |
| Comment by Jian Yu [ 13/Dec/13 ] |
The above patch exists on Lustre b2_4 branch build #67. However, the test still failed: |
| Comment by Nathaniel Clark [ 03/Mar/14 ] |
|
Hit issue on review-zfs on master (pre 2.6): |
| Comment by James Nunez (Inactive) [ 15/Apr/14 ] |
|
Hit this on review-ldiskfs https://maloo.whamcloud.com/test_sets/ecfe391a-c41c-11e3-a793-52540035b04c |
| Comment by Bob Glossman (Inactive) [ 15/Apr/14 ] |
|
another, in review_dne_part-1 |
| Comment by Andreas Dilger [ 16/Apr/14 ] |
|
The failures recently reported against this bug are actually caused by http://review.whamcloud.com/7803 landing (incorrectly allowed by TEI-1508). Oleg has submitted http://review.whamcloud.com/9978 to fix the regression. I've been marking all related failures with |