  1. Lustre
  2. LU-1625

Test failure on test suite parallel-scale-nfsv4, subtest test_metabench

Details

    • 3
    • 4488

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/4a115426-cba8-11e1-8847-52540035b04c.

      The sub-test test_metabench failed with the following error:

      test failed to respond and timed out

      From the log, this test ran for more than 35 minutes before it was ended. I checked several passing runs, and it usually takes less than 1800s, so the test may simply have been killed by the system.

          Activity

            emoly.liu Emoly Liu added a comment -

            Patch for b1_8 is at http://review.whamcloud.com/4949

            pjones Peter Jones added a comment -

            Extra tweak landed too


            keith Keith Mannthey (Inactive) added a comment -

            One extra change is needed. I missed part of the needed patch. http://review.whamcloud.com/3701 has been pushed to fix this issue.
            pjones Peter Jones added a comment -

            Patch landed for 2.1.3 and 2.3. If there are still issues with Minh's changes in place, please reopen.

            yujian Jian Yu added a comment -

            RHEL6.3/x86_64 (2.1.3 Server + 1.8.8-wc1 Client):
            https://maloo.whamcloud.com/test_sets/7422aff4-e42a-11e1-b6d3-52540035b04c

            mdiep Minh Diep added a comment -

            patch to reduce parallel-scale for nfs
            http://review.whamcloud.com/#change,3596


            adilger Andreas Dilger added a comment -

            Minh, could you please just change the default compilebench numbers to "2" and "2" for parallel-scale-nfs.sh? This is a trivial change that reduces the testing time rather than making it take longer, and I don't think the benefit of testing NFS for such a long time is matched by the number of users who use NFS.

            I think this is simply the following:

            test_compilebench() {
                export cbench_IDIRS=${cbench_IDIRS:-2}
                export cbench_RUNS=${cbench_RUNS:-2}

                run_compilebench
            }
            run_test compilebench "compilebench"

            If there are other sub-parts of parallel-scale-nfsv4 that take a long time, I think they can also be shortened in a similar manner.
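
            The suggested defaults rely on the shell's ${VAR:-default} expansion, which substitutes the default only when the variable is unset or empty. The following is a minimal sketch of that behaviour, not the actual test script: run_compilebench here is a hypothetical stub that only prints the values it would use, and the override values 10 and 5 are arbitrary.

```shell
#!/bin/bash
# Hypothetical stand-in for the real run_compilebench helper: it only
# reports the values it would run with.
run_compilebench() {
    echo "IDIRS=$cbench_IDIRS RUNS=$cbench_RUNS"
}

test_compilebench() {
    # ${VAR:-2} expands to 2 only when VAR is unset or empty.
    export cbench_IDIRS=${cbench_IDIRS:-2}
    export cbench_RUNS=${cbench_RUNS:-2}

    run_compilebench
}

# No overrides: both values fall back to the reduced default of 2.
default_out=$(unset cbench_IDIRS cbench_RUNS; test_compilebench)

# Caller override: a longer run can still be requested explicitly.
override_out=$(cbench_IDIRS=10 cbench_RUNS=5; test_compilebench)

echo "$default_out"
echo "$override_out"
```

            Because the defaults are applied only as a fallback, lowering them shortens the default run without preventing anyone from setting cbench_IDIRS and cbench_RUNS to larger values.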
            sarah Sarah Liu added a comment -

            Bobi, I think the timeout is set to 3600s for each test; 9499s was the total for 5 tests.
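
            The arithmetic behind this can be checked with a short sketch. The figures are the ones quoted in this thread; the variable names are illustrative and nothing here is read from the autotest configuration.

```shell
#!/bin/bash
# Figures quoted in the comments; the per-test limit is as described there,
# not taken from autotest itself.
per_test_timeout=3600   # seconds allowed for each subtest
suite_total=9499        # seconds parallel-scale-nfsv3 ran in total
num_tests=5             # subtests in that run

# Integer average time per subtest.
avg_per_test=$((suite_total / num_tests))

# Upper bound on suite duration if every subtest used its full limit.
max_suite=$((per_test_timeout * num_tests))

echo "average per test: ${avg_per_test}s"
echo "maximum suite time: ${max_suite}s"
```

            Under these assumptions a suite can legitimately run well past 3600s in total as long as no single subtest exceeds the per-test limit.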

            bobijam Zhenyu Xu added a comment -

            Sarah,

            What is the timeout rule for autotest? In https://maloo.whamcloud.com/test_sessions/3b113b66-e157-11e1-b541-52540035b04c, I saw that parallel-scale-nfsv3 can run for 9499 seconds, while parallel-scale-nfsv4 timed out at 3600 seconds.

            pjones Peter Jones added a comment -

            Bobijam will help with this one

            ys Yang Sheng added a comment -

            I suspect this issue relates to an NFS problem. In the stack traces, the nfsv4-svc thread is always running at the same location. I'll look into that further.


            People

              keith Keith Mannthey (Inactive)
              maloo Maloo