LU-1625: Test failure on test suite parallel-scale-nfsv4, subtest test_metabench

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/4a115426-cba8-11e1-8847-52540035b04c.

      The sub-test test_metabench failed with the following error:

      test failed to respond and timed out

      From the log, this test ran for more than 35 minutes before it was ended. I checked several passing runs; they usually take less than 1800s, so the test may simply have been killed by the system.
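
      For reference, 35 minutes works out to 2100 s, already above the usual bound:

          echo $((35 * 60))    # prints 2100; passing runs take less than 1800 s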

    Activity

            mdiep Minh Diep added a comment -

            Patch to reduce the parallel-scale run time for NFS:
            http://review.whamcloud.com/#change,3596


            Minh, could you please just change the default compilebench numbers to "2" and "2" for parallel-scale-nfs.sh? This is a trivial change that reduces the testing time rather than making it take longer, and I don't think the benefit of testing NFS for such a long time is matched by the number of users who use NFS.

            I think this is simply the following:

            test_compilebench() {
                # Default to 2 initial directories and 2 runs unless the
                # caller overrides cbench_IDIRS/cbench_RUNS in the environment.
                export cbench_IDIRS=${cbench_IDIRS:-2}
                export cbench_RUNS=${cbench_RUNS:-2}

                run_compilebench
            }
            run_test compilebench "compilebench"
            

            If there are other sub-parts of parallel-scale-nfsv4 that are taking a long time, I think they can also be shortened in a similar manner.
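
            As a hedged illustration of that suggestion applied to metabench
            (the variable names mbench_NFILES and mbench_THREADS are
            assumptions about the defaults run_metabench reads, and may
            differ in the actual test framework):

            test_metabench() {
                # Shrink the workload unless the caller overrides it in the
                # environment, mirroring the compilebench change above.
                export mbench_NFILES=${mbench_NFILES:-10000}
                export mbench_THREADS=${mbench_THREADS:-4}

                run_metabench
            }
            run_test metabench "metabench"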

            sarah Sarah Liu added a comment -

            Bobi, I think for each test the timeout is set to 3600s; the 9499s was the total for the 5 tests.
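
            That is consistent with the numbers: 9499 s spread over 5 tests
            averages well under the 3600 s per-test cap:

            echo $((9499 / 5))    # prints 1899 (average seconds per test)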

            bobijam Zhenyu Xu added a comment -

            Sarah,

            What is the timeout rule for autotest? As in https://maloo.whamcloud.com/test_sessions/3b113b66-e157-11e1-b541-52540035b04c, I saw parallel-scale-nfsv3 run for 9499 seconds, while parallel-scale-nfsv4 timed out at 3600 seconds.

            pjones Peter Jones added a comment -

            Bobijam will help with this one

            ys Yang Sheng added a comment -

            I suspect this issue relates to some NFS problem. From the stack trace, the nfsv4-svc thread is always running at the same location. I'll check that further.
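
            One way to confirm a thread stuck at one location (an
            illustrative sketch, not a command from this ticket; needs root
            and /proc/<pid>/stack support in the kernel):

            # Sample the kernel stack of the first nfsd thread repeatedly;
            # identical samples suggest it is spinning in the same place.
            pid=$(pidof nfsd | awk '{print $1}')
            for i in 1 2 3; do
                cat /proc/$pid/stack
                echo ---
                sleep 5
            done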

            ys Yang Sheng added a comment -

            So it looks like compilebench works normally; only metabench was killed by the timeout. But there is little information in the logs. I'll try to find other failed instances to investigate.

            sarah Sarah Liu added a comment -

            I think yes, the full test suite run by Autotest should set SLOW=yes as usual.
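
            For context, a minimal sketch of how SLOW typically gates a
            long-running subtest in the test scripts (illustrative, not the
            exact framework code; test_example is a hypothetical name):

            test_example() {
                # Skip the heavy workload unless SLOW=yes was requested.
                [ "$SLOW" = "no" ] && skip "skipping SLOW test" && return 0

                # ... long-running workload here ...
            }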

            ys Yang Sheng added a comment -

            Looks like this is just a test timeout. There is about 1 hour between compilebench and metabench:

            Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test compilebench: compilebench == 12:49:59 (1342036199)
            Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test metabench: metabench == 13:53:31 (1342040011)
            
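            The gap computed from the epoch timestamps in those markers:

            echo $((1342040011 - 1342036199))    # prints 3812, about 63.5 minutes
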

            Has SLOW=yes been set?

            pjones Peter Jones added a comment -

            Yangsheng

            Could you please look into this one?

            Thanks

            Peter


            People

              keith Keith Mannthey (Inactive)
              maloo Maloo