[LU-71] metabench failures Created: 09/Feb/11 Updated: 19/Mar/11 Resolved: 19/Mar/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.0.0, Lustre 2.1.0 |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Oleg Drokin | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Bugzilla ID: | 20581 |
| Rank (Obsolete): | 4247 |
| Description |
|
As I understand it, the original Bugzilla bug covered two separate issues, but it now mostly revolves around a hash collision issue. |
| Comments |
| Comment by nasf (Inactive) [ 01/Mar/11 ] |
|
The new patch needs to be verified on Hyperion before it lands on lustre-2.1: |
| Comment by Cliff White (Inactive) [ 07/Mar/11 ] |
|
Patch is under test on Hyperion - hyperion-sanity results:
000: Table of 824 tasks with up to 4097 system calls
000: Table of 824 tasks with up to 65 system calls
IOR file-per-process independent single-shared-file single-shared-file independent
These results are comparable to previous 2.1 runs
Mar 6 00:00:13 ehyperion571 mrshd[19987]: root@ehyperion0 as root: cmd='rdistd -S' |
| Comment by Cliff White (Inactive) [ 08/Mar/11 ] |
|
Okay the second pass of the test has also failed metabench, with the same failure - this may be a cause for concern.
metabench -w /p/l_wham/white215/hyperion.14374/metabench -k -c 16384 -C -z 2097152 785.6744 2669.24 152 0.0464 3272.76
[03/07/2011 22:28:48] Leaving time_file_creation with proc_id = 823
Client errors:
MDS hyperion720
No errors on any OSTs. |
| Comment by nasf (Inactive) [ 08/Mar/11 ] |
|
Thanks, Cliff. From the test results we can say that the patch works as expected. The failure case you attached is from a test without my patch, right?
> Client errors:
It corresponds to the original Lustre code without my patch. |
| Comment by Cliff White (Inactive) [ 08/Mar/11 ] |
|
No, the failure case I attached is with your patch – |
| Comment by Cliff White (Inactive) [ 08/Mar/11 ] |
|
I took the RPMs from build 347 – |
| Comment by nasf (Inactive) [ 08/Mar/11 ] |
|
Very strange. According to the error message, the line corresponding to "(dir.c:316:ll_get_dir_page()) dir page locate:" is: page = ll_dir_page_locate(dir, &lhash, &start, &end); In the original master without the patch, that code is at line 316. In the patched master, it is at line 340. I have checked the source code for build 347, which you used: http://build.whamcloud.com/job/reviews-centos5/347/ So would you please log in to ehyperion571 and check the kernel version to confirm? Thanks! |
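The line-number check described in this comment can be sketched in a few lines of Python. This is only an illustration: the regular expression assumes the "(file:line:function())" prefix seen in the quoted message, the names parse_location, classify_build and KNOWN_LINES are hypothetical, and the 316/340 mapping is taken solely from the line numbers stated in this comment.

```python
import re

# Assumed shape of the source-location prefix, based only on the message
# quoted in this ticket: "(file:line:function())".
LOCATION = re.compile(r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)")

# Line of the ll_dir_page_locate() call in dir.c, as stated in the comment:
# 316 in the original master, 340 in the patched master.
KNOWN_LINES = {316: "unpatched master", 340: "patched master"}


def parse_location(message):
    """Return (file, line, function) from a log message, or None if absent."""
    match = LOCATION.search(message)
    if match is None:
        return None
    return match.group("file"), int(match.group("line")), match.group("func")


def classify_build(message):
    """Guess which source tree produced the message from its line number."""
    location = parse_location(message)
    if location is None:
        return "unknown"
    _, line, _ = location
    return KNOWN_LINES.get(line, "unknown")


msg = "(dir.c:316:ll_get_dir_page()) dir page locate:"
print(parse_location(msg))
print(classify_build(msg))
```

A line number that matches neither tree is reported as "unknown" rather than guessed, since any other change to dir.c would also shift the call.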
| Comment by Cliff White (Inactive) [ 10/Mar/11 ] |
|
There was a mistake in the kernel build. NERSC Time: 4735.13 |
| Comment by nasf (Inactive) [ 10/Mar/11 ] |
|
Thanks, Cliff. This bug has blocked us for a long time, so this is really helpful. |
| Comment by nasf (Inactive) [ 19/Mar/11 ] |
|
The patch has been merged into the lustre-2.1 candidate. |