
[LU-4349] conf-sanity test_47: test failed to respond and timed out

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.6.0, Lustre 2.5.4
    • Fix Version/s: Lustre 2.6.0
    • Labels: None
    • Environment: lustre-master build # 1791 RHEL6 zfs
    • Severity: 3
    • Rank: 11914

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/1d5f9fd8-5c6a-11e3-9d08-52540035b04c.

      The sub-test test_47 failed with the following error:

      test failed to respond and timed out

      Info required for matching: conf-sanity 47

      This test ran for 3600s and failed with a timeout; I cannot find any error logs. In a passing
      conf-sanity run on ZFS, the same sub-test takes about 134s to complete:

      https://maloo.whamcloud.com/sub_tests/b9351066-5d53-11e3-956b-52540035b04c

          Activity

            bogl Bob Glossman (Inactive) added a comment -

            May be the same issue, seen in b2_5: https://maloo.whamcloud.com/test_sets/550c0fde-a521-11e3-9fee-52540035b04c

            sarah Sarah Liu added a comment -

            Hit this issue again during tag-2.5.56 testing in DNE mode

            https://maloo.whamcloud.com/test_sets/92cc87f8-a02c-11e3-947c-52540035b04c


            tappro Mikhail Pershin added a comment -

            Andreas, while this happened in the same test, the failure in LU-4413 looked different from the one in LU-4349: recovery did not get stuck; the hang happened only afterwards. So I doubt it is actually related; it looks more like a coincidence.

            adilger Andreas Dilger added a comment -

            If this was fixed by landing the LU-4413 patch, then it should be linked to that bug.

            yong.fan nasf (Inactive) added a comment -

            Please reopen this ticket if the issue is hit again in the future.

            yong.fan nasf (Inactive) added a comment -

            Hm… with the patch http://review.whamcloud.com/8997 (for LU-4413) landed on master, I have not seen the conf-sanity failure again. So I prefer to close this bug.

            yong.fan nasf (Inactive) added a comment -

            Here is a summary of recent conf-sanity test_47 failures. I am not sure they are really the same as LU-4349, but it is true that we are still having trouble with conf-sanity failures. If they are not the same, I will open a new ticket.

            https://maloo.whamcloud.com/sub_tests/query?utf8=✓&test_set%5Btest_set_script_id%5D=7f66aa20-3db2-11e0-80c0-52540025f9af&sub_test%5Bsub_test_script_id%5D=8392c2be-3db2-11e0-80c0-52540025f9af&sub_test%5Bstatus%5D=TIMEOUT&sub_test%5Bquery_bugs%5D=&test_session%5Btest_host%5D=&test_session%5Btest_group%5D=&test_session%5Buser_id%5D=&test_session%5Bquery_date%5D=&test_session%5Bquery_recent_period%5D=604800&test_node%5Bos_type_id%5D=&test_node%5Bdistribution_type_id%5D=&test_node%5Barchitecture_type_id%5D=&test_node%5Bfile_system_type_id%5D=&test_node%5Blustre_branch_id%5D=&test_node_network%5Bnetwork_type_id%5D=&commit=Update+results
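
            The long query URL above can also be read from its non-empty parameters. A minimal sketch, assuming Python 3's standard urllib: the two script IDs, the TIMEOUT status, and the 604800s (7-day) window are taken directly from the URL quoted above, and the empty parameters are simply omitted.

            from urllib.parse import urlencode

            # Parameters copied from the Maloo query URL quoted above; the two script
            # IDs select the conf-sanity test set and the test_47 sub-test, and the
            # recent-period filter limits results to the last 604800 seconds (7 days).
            params = {
                "utf8": "✓",
                "test_set[test_set_script_id]": "7f66aa20-3db2-11e0-80c0-52540025f9af",
                "sub_test[sub_test_script_id]": "8392c2be-3db2-11e0-80c0-52540025f9af",
                "sub_test[status]": "TIMEOUT",
                "test_session[query_recent_period]": "604800",
                "commit": "Update results",
            }

            # urlencode() percent-escapes the brackets and the check mark, producing a
            # query string equivalent to the URL pasted in the comment above.
            print("https://maloo.whamcloud.com/sub_tests/query?" + urlencode(params))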


            tappro Mikhail Pershin added a comment -

            I don't see evidence that this is the same failure. In the original bug, recovery was stuck, with a console message like this:
            kernel: Lustre: lustre-OST0000: Denying connection for new client lustre-MDT0000-mdtlov_UUID (at 10.10.18.247@tcp), waiting for all 5 known clients (3 recovered, 1 in progress, and 0 evicted) to recover in 0:12

            In your log there are no such messages; in fact, there is nothing useful in the log at all. If there are further test failures with recovery stuck, let me know. I'd wait for more instances of this bug on master before saying for sure that it still exists; until then, this failure looks more related to the patch that was applied.
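
            For triaging future timeouts against this signature, a minimal sketch like the one below can be used; it simply scans saved console/dmesg logs for the "Denying connection ... waiting for all N known clients" recovery message quoted above (the script, including the scan() helper name, is only an illustration of the check, not something shipped with Lustre).

            import re
            import sys

            # Matches the recovery-stuck console message quoted above, e.g.
            #   Lustre: lustre-OST0000: Denying connection for new client ... waiting for
            #   all 5 known clients (3 recovered, 1 in progress, and 0 evicted) to recover in 0:12
            RECOVERY_STUCK = re.compile(
                r"Denying connection for new client .* "
                r"waiting for all (\d+) known clients "
                r"\((\d+) recovered, (\d+) in progress, and (\d+) evicted\)"
            )

            def scan(path):
                """Report every recovery-stuck message found in one console log."""
                hits = 0
                with open(path, errors="replace") as log:
                    for line in log:
                        match = RECOVERY_STUCK.search(line)
                        if match:
                            hits += 1
                            known, recovered, in_progress, evicted = match.groups()
                            print(f"{path}: {known} known clients, {recovered} recovered, "
                                  f"{in_progress} in progress, {evicted} evicted")
                if hits == 0:
                    print(f"{path}: no recovery-stuck messages found; likely a different failure")

            if __name__ == "__main__":
                for path in sys.argv[1:]:
                    scan(path)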


            yong.fan nasf (Inactive) added a comment -

            I think I hit the same failure with patch #8785 applied.

            https://maloo.whamcloud.com/test_sets/fdfb4102-8355-11e3-bedf-52540035b04c

            yujian Jian Yu added a comment -

            By searching on Maloo, I found that after patch #8785 landed on the master branch 3 days ago, conf-sanity test_47 passed in every run (more than 60 test runs, except for a few failures on patches that had not been rebased). So the issue is fixed. Let's close this ticket.


            jlevi Jodi Levi (Inactive) added a comment -

            Yu Jian,
            Since change #8785 has landed, can you verify this is fixed and either close the ticket or add a comment?
            Thank you!


            People

              Assignee: tappro Mikhail Pershin
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 12
