[LU-1625] Test failure on test suite parallel-scale-nfsv4, subtest test_metabench Created: 12/Jul/12 Updated: 22/Feb/13 Resolved: 19/Aug/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.1.3, Lustre 1.8.8 |
| Fix Version/s: | Lustre 2.3.0, Lustre 2.1.3, Lustre 1.8.9 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Keith Mannthey (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 4488 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com> This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/4a115426-cba8-11e1-8847-52540035b04c. The sub-test test_metabench failed with the following error:
From the log, this test ran for more than 35 minutes before it was ended. I checked several passing runs; it usually takes less than 1800s, so the test may simply have been killed by the system. |
| Comments |
| Comment by Peter Jones [ 13/Jul/12 ] |
|
Yangsheng, could you please look into this one? Thanks, Peter |
| Comment by Yang Sheng [ 20/Jul/12 ] |
|
This looks like just a test timeout. There is about 1 hour between compilebench and metabench:
Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test compilebench: compilebench == 12:49:59 (1342036199)
Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test metabench: metabench == 13:53:31 (1342040011)
Was SLOW=yes set? |
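The gap between the two debug markers can be checked from the epoch timestamps they carry. A minimal sketch in shell; the variable names are illustrative and not part of the test framework:

```shell
#!/bin/bash
# Epoch timestamps copied from the two DEBUG MARKER lines above.
compilebench_start=1342036199
metabench_start=1342040011

# Seconds between the start of compilebench and the start of metabench
# (variable names are illustrative).
elapsed=$((metabench_start - compilebench_start))
printf 'elapsed: %ds (%dm %ds)\n' "$elapsed" $((elapsed / 60)) $((elapsed % 60))
# prints: elapsed: 3812s (63m 32s)
```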
| Comment by Sarah Liu [ 20/Jul/12 ] |
|
I think yes, the full test suite run by Autotest should set SLOW=yes as usual. |
| Comment by Yang Sheng [ 24/Jul/12 ] |
|
So it looks like compilebench works normally and only metabench was killed by the timeout. The logs give little information, though. I'll search for other failed instances to investigate. |
| Comment by Yang Sheng [ 25/Jul/12 ] |
|
I suspect this issue is related to an NFS problem. From the stack trace, the nfsv4-svc thread is always running at the same location. I'll check this further. |
| Comment by Peter Jones [ 08/Aug/12 ] |
|
Bobijam will help with this one |
| Comment by Zhenyu Xu [ 08/Aug/12 ] |
|
Sarah, what is the timeout rule for autotest? In https://maloo.whamcloud.com/test_sessions/3b113b66-e157-11e1-b541-52540035b04c, I saw that parallel-scale-nfsv3 can run for 9499 seconds, while parallel-scale-nfsv4 timed out at 3600 seconds. |
| Comment by Sarah Liu [ 09/Aug/12 ] |
|
Bobi, I think the timeout for each test is set to 3600s; 9499s was the total for 5 tests. |
| Comment by Andreas Dilger [ 10/Aug/12 ] |
|
Minh, could you please just change the default compilebench numbers to "2" and "2" for parallel-scale-nfs.sh? This is a trivial change that reduces the testing time rather than making it take longer, and I don't think the benefit of testing NFS for such a long time is matched by the number of users who use NFS. I think this is simply the following:
test_compilebench() {
    # Default both compilebench parameters to 2 unless the caller already set them.
    export cbench_IDIRS=${cbench_IDIRS:-2}
    export cbench_RUNS=${cbench_RUNS:-2}
    run_compilebench
}
run_test compilebench "compilebench"
If there are other sub-parts of parallel-scale-nfsv4 that are taking a long time, I think they can also be shortened in a similar manner. |
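The `${VAR:-2}` expansion in the snippet above only supplies a default, so a caller can still request a longer run. A small self-contained sketch of that behaviour, assuming a stand-in `run_compilebench` that just echoes its settings (the real helper lives in the test framework and is not reproduced here):

```shell
#!/bin/bash
# Hypothetical stand-in for the real run_compilebench helper; it only
# reports the values it would use.
run_compilebench() {
    echo "IDIRS=$cbench_IDIRS RUNS=$cbench_RUNS"
}

test_compilebench() {
    # ${VAR:-2} keeps a value already set by the caller, else falls back to 2.
    export cbench_IDIRS=${cbench_IDIRS:-2}
    export cbench_RUNS=${cbench_RUNS:-2}
    run_compilebench
}

unset cbench_IDIRS cbench_RUNS
default_out=$(test_compilebench)                  # nothing preset: both default to 2
override_out=$(cbench_IDIRS=5 test_compilebench)  # caller overrides one value
echo "$default_out"    # IDIRS=2 RUNS=2
echo "$override_out"   # IDIRS=5 RUNS=2
```

Each call runs in a command substitution, so the exported defaults do not leak from one call into the next.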
| Comment by Minh Diep [ 10/Aug/12 ] |
|
patch to reduce parallel-scale for nfs |
| Comment by Jian Yu [ 13/Aug/12 ] |
|
RHEL6.3/x86_64 (2.1.3 Server + 1.8.8-wc1 Client): |
| Comment by Peter Jones [ 15/Aug/12 ] |
|
Patch landed for 2.1.3 and 2.3. If there are still issues with Minh's changes in place, please reopen. |
| Comment by Keith Mannthey (Inactive) [ 16/Aug/12 ] |
|
One extra change is needed. I missed part of the required patch. http://review.whamcloud.com/3701 has been pushed to fix this issue. |
| Comment by Peter Jones [ 19/Aug/12 ] |
|
Extra tweak landed too |
| Comment by Emoly Liu [ 03/Jan/13 ] |
|
Patch for b1_8 is at http://review.whamcloud.com/4949 |