LU-1625: Test failure on test suite parallel-scale-nfsv4, subtest test_metabench

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/4a115426-cba8-11e1-8847-52540035b04c.

      The sub-test test_metabench failed with the following error:

      test failed to respond and timed out

      From the log, this test ran for more than 35 minutes before it was ended. I checked several passing runs; they usually take less than 1800s, so the test may simply have been killed by the system.
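
      For reference, 35 minutes works out to 2100 s, already above the usual bound:

          echo $((35 * 60))    # prints 2100; passing runs take less than 1800 s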

    Activity

            mdiep Minh Diep added a comment -

            Patch to reduce the parallel-scale run time for NFS:
            http://review.whamcloud.com/#change,3596


            Minh, could you please just change the default compilebench numbers to "2" and "2" for parallel-scale-nfs.sh? This is a trivial change that reduces the testing time rather than making it take longer, and I don't think the benefit of testing NFS for such a long time is matched by the number of users who use NFS.

            I think this is simply the following:

            test_compilebench() {
                # Default to 2 initial directories and 2 runs unless the
                # caller overrides cbench_IDIRS/cbench_RUNS in the environment.
                export cbench_IDIRS=${cbench_IDIRS:-2}
                export cbench_RUNS=${cbench_RUNS:-2}

                run_compilebench
            }
            run_test compilebench "compilebench"
            

            If there are other sub-parts of parallel-scale-nfsv4 that are taking a long time, I think they can also be shortened in a similar manner.
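
            As a hedged illustration of that suggestion applied to metabench
            (the variable names mbench_NFILES and mbench_THREADS are
            assumptions about the defaults run_metabench reads, and may
            differ in the actual test framework):

            test_metabench() {
                # Shrink the workload unless the caller overrides it in the
                # environment, mirroring the compilebench change above.
                export mbench_NFILES=${mbench_NFILES:-10000}
                export mbench_THREADS=${mbench_THREADS:-4}

                run_metabench
            }
            run_test metabench "metabench"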

            sarah Sarah Liu added a comment -

            Bobi, I think for each test the timeout is set to 3600s; the 9499s was the total for the 5 tests.
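
            That is consistent with the numbers: 9499 s spread over 5 tests
            averages well under the 3600 s per-test cap:

            echo $((9499 / 5))    # prints 1899 (average seconds per test)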

            bobijam Zhenyu Xu added a comment -

            Sarah,

            What is the timeout rule for autotest? As in https://maloo.whamcloud.com/test_sessions/3b113b66-e157-11e1-b541-52540035b04c, I saw parallel-scale-nfsv3 run for 9499 seconds, while parallel-scale-nfsv4 timed out at 3600 seconds.

            pjones Peter Jones added a comment -

            Bobijam will help with this one

            ys Yang Sheng added a comment -

            I suspect this issue relates to some NFS problem. From the stack trace, the nfsv4-svc thread is always running at the same location. I'll check that further.
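
            One way to confirm a thread stuck at one location (an
            illustrative sketch, not a command from this ticket; needs root
            and /proc/<pid>/stack support in the kernel):

            # Sample the kernel stack of the first nfsd thread repeatedly;
            # identical samples suggest it is spinning in the same place.
            pid=$(pidof nfsd | awk '{print $1}')
            for i in 1 2 3; do
                cat /proc/$pid/stack
                echo ---
                sleep 5
            done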

            ys Yang Sheng added a comment -

            So it looks like compilebench works normally; only metabench was killed by the timeout. But there is little information in the logs. I'll try to find other failed instances to investigate.

            sarah Sarah Liu added a comment -

            I think yes, the full test suite run by Autotest should set SLOW=yes as usual.
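
            For context, a minimal sketch of how SLOW typically gates a
            long-running subtest in the test scripts (illustrative, not the
            exact framework code; test_example is a hypothetical name):

            test_example() {
                # Skip the heavy workload unless SLOW=yes was requested.
                [ "$SLOW" = "no" ] && skip "skipping SLOW test" && return 0

                # ... long-running workload here ...
            }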

            ys Yang Sheng added a comment -

            Looks like this is just a test timeout. There is about 1 hour between compilebench and metabench:

            Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test compilebench: compilebench == 12:49:59 (1342036199)
            Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test metabench: metabench == 13:53:31 (1342040011)
            
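            The gap computed from the epoch timestamps in those markers:

            echo $((1342040011 - 1342036199))    # prints 3812, about 63.5 minutes
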

            Has SLOW=yes been set?

            pjones Peter Jones added a comment -

            Yangsheng

            Could you please look into this one?

            Thanks

            Peter


            People

              keith Keith Mannthey (Inactive)
              maloo Maloo