Details

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run:
      http://maloo.whamcloud.com/test_sets/7f09a2f6-dd9d-11e3-9262-52540035b04c
      https://maloo.whamcloud.com/test_sets/99ea9712-dc88-11e3-9450-52540035b04c

      The sub-test test_47 failed with the following error:

      test failed to respond and timed out

      Info required for matching: conf-sanity 47

          Activity

            [LU-5079] conf-sanity test_47 timeout

            adilger Andreas Dilger added a comment -

            Alexander, could you please take a look at LU-5900, LU-5901, LU-5902 failures? This looks to be caused by the http://review.whamcloud.com/11213 patch backported to b2_5 (http://review.whamcloud.com/12365). Is this something that is just causing the tests to fail that won't affect real users, or is there something bad in b2_5 that will cause recovery problems for users also?
            yujian Jian Yu added a comment -

            The patches caused regression failures LU-5900, LU-5901 and LU-5902 on master and b2_5 branches.

            pjones Peter Jones added a comment -

            It has landed to master now


            adilger Andreas Dilger added a comment -

            Reopen this bug until the replay-vbr test patch has been landed to master, otherwise it will not be tracked properly.
            yujian Jian Yu added a comment -

            Here is the patch for master branch to speed up replay-vbr test 7*: http://review.whamcloud.com/12490
            And here is the test result: https://testing.hpdd.intel.com/test_sets/0ae4ab72-5fcb-11e4-895a-5254006e85c2

            With the above patch, total run time for replay-vbr test 7* was reduced from 18796s to 4742s.

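As a quick sanity check on the run times quoted above, the reduction can be worked out directly. This is only an illustrative sketch: the variable names are made up here and are not part of the Lustre test framework or of the patch itself.

```python
# Run times for replay-vbr test 7*, as reported in the comment above.
before_s = 18796  # seconds, before the speed-up patch
after_s = 4742    # seconds, with http://review.whamcloud.com/12490 applied

saved_s = before_s - after_s               # absolute time saved
reduction_pct = 100 * saved_s / before_s   # relative reduction in percent
speedup = before_s / after_s               # how many times faster

print(f"saved {saved_s}s ({reduction_pct:.1f}% less, {speedup:.2f}x faster)")
```

In other words, the patched run takes roughly a quarter of the original time.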

            jlevi Jodi Levi (Inactive) added a comment -

            Patch landed to Master.
            yujian Jian Yu added a comment -

            patch http://review.whamcloud.com/#/c/11213/

            I ran the replay-vbr test manually and found that it eventually passed, taking about 8 hours:
            https://testing.hpdd.intel.com/test_sets/18fb6cc0-5eb1-11e4-a2a3-5254006e85c2

            Among the sub-tests, test 7e took 3876s, which exceeded the 3600s timeout value set by the autotest system. That is why the test was stopped in autotest runs.

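The timeout comparison described above can be sketched as follows. The 3600s limit and the 3876s run time come from the comment; the function and variable names are hypothetical and do not correspond to anything in autotest.

```python
# Figures taken from the comment above; all names here are hypothetical.
AUTOTEST_TIMEOUT_S = 3600   # timeout value set by the autotest system
test_7e_runtime_s = 3876    # replay-vbr test 7e in the manual run


def overshoot(runtime_s, timeout_s=AUTOTEST_TIMEOUT_S):
    """Return how many seconds a run exceeds the limit (0 if it fits)."""
    return max(0, runtime_s - timeout_s)


overshoot_s = overshoot(test_7e_runtime_s)
print(f"test 7e exceeds the timeout by {overshoot_s}s")
```

A run that finishes within the limit yields 0, so the same check can be applied to each sub-test's run time.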

            morrone Christopher Morrone (Inactive) added a comment -

            I tried http://review.whamcloud.com/12365 on top of LLNL's 2.5.3-1chaos tag and saw lots of problems. I don't know whether they were actually caused by it. The experience is recorded in LU-5805.
            yujian Jian Yu added a comment -

            patch http://review.whamcloud.com/#/c/11213/

            The above patch introduced regression failure in replay-vbr test 7e.

            On master branch:
            https://testing.hpdd.intel.com/test_sets/92fe44e0-5b80-11e4-a35f-5254006e85c2

            On b2_5 branch:
            https://testing.hpdd.intel.com/test_sets/4d02235e-59c2-11e4-aa32-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/25213bfe-59c2-11e4-aa32-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/3238224a-59bc-11e4-816e-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/166cc778-59bc-11e4-816e-5254006e85c2
            yujian Jian Yu added a comment -

            Here is the back-ported patch for Lustre b2_5 branch: http://review.whamcloud.com/12365


            simmonsja James A Simmons added a comment -

            This appears to be related to the LU-5077 problems. Since the patch for LU-4578 is in the b2_5 branch, I believe this might be the source of our recovery problems. I was seeing recovery issues on 2.5 in my test bed until I applied the patch for this ticket here.

            People

              yujian Jian Yu
              maloo Maloo
              Votes: 0
              Watchers: 16
