Restart test-group from next test when a test hangs

Details

    • Improvement
    • Resolution: Low Priority
    • Major
    • None
    • None
    • 5
    • 7
    • Test Infrastructure
    • 5338
    • DCO-2016_Jun20_Jul10

    Description

      As an engineer, when autotest restarts after a hang on a particular test, it should resume from the next test rather than starting over completely, so that getting through the entire test suite takes less time.

      QUESTION: In TT-926, we ask that when a test fails we stop testing and not continue to the next test. Is this contradicting that request?

      Attachments

        Activity

          [LU-8899] Restart test-group from next test when a test hangs
          adilger Andreas Dilger made changes -
          Resolution New: Low Priority [ 10100 ]
          Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
          maolson Mark A Olson (Inactive) made changes -
          Labels Original: groomed-lustre-test groomed-lustre-testing triaged New: groomed-lustre-test triaged
          maolson Mark A Olson (Inactive) made changes -
          Labels Original: groomed-lustre-testing triaged New: groomed-lustre-test groomed-lustre-testing triaged

          colmstea Charlie Olmstead added a comment - The --start-at option only applies to test suites listed in the command and does not apply to the "-g GROUP Test group file (Overrides tests listed on command line)" option which Autotest uses. Can this be changed to apply when -g is used? It would need to factor in the value of the -S option as well.
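The behavior requested above can be sketched as follows. This is only an illustration of the desired semantics, not auster's implementation: the function name suites_from_start and the idea that a group resolves to an ordered list of suite names are assumptions, and the interaction with the -S option is not modeled.

```python
from typing import List, Optional


def suites_from_start(group_suites: List[str], start_at: Optional[str]) -> List[str]:
    """Return the suites of a test group that should still run.

    group_suites: ordered suite names as they would come from a group file
                  (hypothetical representation, not auster's real format).
    start_at:     suite to resume from, or None to run the whole group.
    """
    if start_at is None:
        # No resume point requested: run every suite in the group.
        return list(group_suites)
    if start_at not in group_suites:
        raise ValueError(f"suite {start_at!r} is not part of the group")
    # Keep the resume suite and everything listed after it.
    return group_suites[group_suites.index(start_at):]


if __name__ == "__main__":
    group = ["sanity", "sanityn", "replay-single"]
    print(suites_from_start(group, "sanityn"))
```

With a helper like this, suites listed before the resume point are dropped, which is what applying --start-at to a -g group would need to achieve.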
          yujian Jian Yu added a comment -

          The Lustre TF is responsible for writing the results.yml file; it should also be responsible for knowing where it should resume from when this flag is given.

          Writing the results.yml file is handled by init_logging(), which each test suite (e.g., sanity.sh, sanityn.sh, etc.) calls.
          Knowing where to resume is handled by the "--start-at" option for auster.

          colmstea Charlie Olmstead added a comment - I understand that the Lustre test-framework isn't responsible for rebooting/re-provisioning the nodes. Once a test hangs, Autotest would re-provision the group of nodes and then call auster with a flag to restart the tests from where it left off. The Lustre TF is responsible for writing the results.yml file; it should also be responsible for knowing where it should resume from when this flag is given.
          yujian Jian Yu added a comment -

          Hi Charlie,
          After a subtest hangs, the test nodes usually need to be rebooted or re-provisioned by the autotest system, which the Lustre test framework cannot do. Autotest knows which subtest hung, and can simply start running the next subtest by invoking auster with the "--start-at" option.
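
The resume step described above can be sketched as follows. This is a minimal illustration under assumptions: the function name next_subtest and the ordered list of subtest names are hypothetical, and how the resulting name is passed to "--start-at" is left to the caller.

```python
from typing import List, Optional


def next_subtest(ordered_subtests: List[str], hung: str) -> Optional[str]:
    """Return the subtest that follows the hung one, or None if it was the last.

    ordered_subtests: subtest names in execution order (hypothetical input).
    hung:             the subtest that autotest detected as hanging.
    """
    idx = ordered_subtests.index(hung)  # raises ValueError if unknown
    if idx + 1 < len(ordered_subtests):
        return ordered_subtests[idx + 1]
    # The hung subtest was the final one; nothing is left to resume.
    return None


if __name__ == "__main__":
    subtests = ["1a", "1b", "2"]
    print(next_subtest(subtests, "1b"))
```

Returning None for the last subtest lets the caller skip the restart entirely instead of invoking auster with nothing left to run.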
          colmstea Charlie Olmstead made changes -
          Labels Original: AT_Improvement_For_Operations groomed-lustre-testing triaged New: groomed-lustre-testing triaged
          colmstea Charlie Olmstead made changes -
          Component/s Original: Autotest [ 10290 ]
          Key Original: DCO-144 New: LU-8899
          Workflow Original: TEI [ 28643 ] New: Sub-task Blocking [ 49287 ]
          Project Original: DataCenter Ops [ 10410 ] New: Lustre [ 10000 ]
          colmstea Charlie Olmstead made changes -
          Summary Original: Restart autotest from next test when a test hangs New: Restart test-group from next test when a test hangs

          People

            colmstea Charlie Olmstead
            chris Chris Gearing (Inactive)
            Votes:
            3
            Watchers:
            7

            Dates

              Created:
              Updated:
              Resolved: