Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-6206

conf-sanity test_53a: Insane OST thread counts

Details

    • Bug
    • Resolution: Incomplete
    • Minor
    • None
    • None
    • None
    • 3
    • 17360

    Description

      This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/29448bbe-ac56-11e4-b832-5254006e85c2.

      The sub-test test_53a failed with the following error:

      $'Assertion 27 failed: (($tmax2 < $tmin)) (expanded: ((4 < 3)))
      Insane OST thread counts'
      

      Suspect this may be due to the very recent landing of LU-1214 ptlrpc: start minimum service threads.
      It made changes in the close neighborhood of the failed tests.

      Info required for matching: conf-sanity 53a

      Attachments

        Issue Links

          Activity

            [LU-6206] conf-sanity test_53a: Insane OST thread counts

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/13823
            Subject: LU-6206 ptlrpc: start minimum service threads
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 34ec2c93de940ecca8dcf2aa6c760cf8c1b133cc

            gerrit Gerrit Updater added a comment - Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/13823 Subject: LU-6206 ptlrpc: start minimum service threads Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 34ec2c93de940ecca8dcf2aa6c760cf8c1b133cc

            PS: we are also investigating how/why Maloo marked this test failure with Verified +1 when it clearly failed the review-dne-part-1 test.

            We know that review-zfs is currently optional so the presence of a failure in that test wouldn't itself cause Maloo to mark the overall result -1, but the other failure should have. It should be noted that (excluding the hit from this issue) we have resolved a major blocker for ZFS testing (LU-5242) and are expecting to change review-zfs test results to be enforced in the near future now that we expect it to pass regularly.

            adilger Andreas Dilger added a comment - PS: we are also investigating how/why Maloo marked this test failure with Verified +1 when it clearly failed the review-dne-part-1 test. We know that review-zfs is currently optional so the presence of a failure in that test wouldn't itself cause Maloo to mark the overall result -1, but the other failure should have. It should be noted that (excluding the hit from this issue) we have resolved a major blocker for ZFS testing ( LU-5242 ) and are expecting to change review-zfs test results to be enforced in the near future now that we expect it to pass regularly.

            Thanks James.

            adilger Andreas Dilger added a comment - Thanks James.

            Yes Bob it was that one line in the test script that was breaking everything. I will push a new version of LU-1214 with that fix. The lesson to learn here is never trust Maloo claiming to pass but to always look at the logs.

            simmonsja James A Simmons added a comment - Yes Bob it was that one line in the test script that was breaking everything. I will push a new version of LU-1214 with that fix. The lesson to learn here is never trust Maloo claiming to pass but to always look at the logs.

            Patches that are currently based on a tree with that patch in it should rebase to after the reverted patch. We are also looking to disable these tests temporarily so that any patches that haven't hit this failure yet can pass.

            adilger Andreas Dilger added a comment - Patches that are currently based on a tree with that patch in it should rebase to after the reverted patch. We are also looking to disable these tests temporarily so that any patches that haven't hit this failure yet can pass.

            We've reverted the original patch from master.

            adilger Andreas Dilger added a comment - We've reverted the original patch from master.
            jlevi Jodi Levi (Inactive) added a comment - We have reverted http://review.whamcloud.com/#/c/2876

            I think you are looking for something like:

            Test-Parameters: fortestonly testlist=conf-sanity,conf-sanity,conf-sanity envdefintions=ONLY=53a

            in the commit header.
            see https://wiki.hpdd.intel.com/display/PUB/Changing+Test+Parameters+with+Gerrit+Commit+Messages

            When ready to land probably need to edit the commit header & remove Test-parameters.

            bogl Bob Glossman (Inactive) added a comment - I think you are looking for something like: Test-Parameters: fortestonly testlist=conf-sanity,conf-sanity,conf-sanity envdefintions=ONLY=53a in the commit header. see https://wiki.hpdd.intel.com/display/PUB/Changing+Test+Parameters+with+Gerrit+Commit+Messages When ready to land probably need to edit the commit header & remove Test-parameters.

            Yep, just confirmed it is the conf-sanity test changes that break things. Let see if your above change is all that is needed. BTW what is the string I should add to the patch to only run this test?

            simmonsja James A Simmons added a comment - Yep, just confirmed it is the conf-sanity test changes that break things. Let see if your above change is all that is needed. BTW what is the string I should add to the patch to only run this test?
            bogl Bob Glossman (Inactive) added a comment - - edited

            I suspect this line in conf-sanity.sh:

            lassert 27 "$msg" '(($tmax2 < $tmin))' || return $?

            It looks entirely logic reversed to me. think it should be:

            lassert 27 "$msg" '(($tmin < $tmax2))' || return $?

            Of course there could be additional flaws introduced by the bad commit covered up by always hitting this one first.

            bogl Bob Glossman (Inactive) added a comment - - edited I suspect this line in conf-sanity.sh: lassert 27 "$msg" '(($tmax2 < $tmin))' || return $? It looks entirely logic reversed to me. think it should be: lassert 27 "$msg" '(($tmin < $tmax2))' || return $? Of course there could be additional flaws introduced by the bad commit covered up by always hitting this one first.

            Looking into why it fails now.

            simmonsja James A Simmons added a comment - Looking into why it fails now.

            People

              adilger Andreas Dilger
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: