[LU-6206] conf-sanity test_53a: Insane OST thread counts Created: 04/Feb/15  Updated: 03/Jun/16  Resolved: 03/Jun/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Andreas Dilger
Resolution: Incomplete Votes: 0
Labels: None

Issue Links:
Related
is related to LU-1214 PTLRPC related modules cleanup Resolved
is related to LU-6563 conf-sanity test_53b: FAIL: Assertion... Resolved
Severity: 3
Rank (Obsolete): 17360

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/29448bbe-ac56-11e4-b832-5254006e85c2.

The sub-test test_53a failed with the following error:

$'Assertion 27 failed: (($tmax2 < $tmin)) (expanded: ((4 < 3)))
Insane OST thread counts'

Suspect this may be due to the very recent landing of LU-1214 ptlrpc: start minimum service threads.
It made changes in the close neighborhood of the failed tests.

Info required for matching: conf-sanity 53a



 Comments   
Comment by Bob Glossman (Inactive) [ 04/Feb/15 ]

More failures, all reported in the last day:
https://testing.hpdd.intel.com/test_sets/c3cdcb1c-ac30-11e4-b832-5254006e85c2
https://testing.hpdd.intel.com/test_sets/e5314afc-ac3c-11e4-bc6f-5254006e85c2
https://testing.hpdd.intel.com/test_sets/b1220374-ac3b-11e4-b832-5254006e85c2

Comment by Jodi Levi (Inactive) [ 04/Feb/15 ]

Patch landed yesterday in LU-1214: http://review.whamcloud.com/#/c/2876/

Comment by Bob Glossman (Inactive) [ 04/Feb/15 ]

Looking at the event history of Patch Set 17 in http://review.whamcloud.com/#/c/2876, I see FAILs in conf-sanity. I don't think it should have gotten +maloo; I suspect some TEI fault there. It shouldn't have landed.

Comment by James A Simmons [ 04/Feb/15 ]

Looking into why it fails now.

Comment by Bob Glossman (Inactive) [ 04/Feb/15 ]

I suspect this line in conf-sanity.sh:

lassert 27 "$msg" '(($tmax2 < $tmin))' || return $?

The logic looks entirely reversed to me. I think it should be:

lassert 27 "$msg" '(($tmin < $tmax2))' || return $?

Of course there could be additional flaws introduced by the bad commit covered up by always hitting this one first.
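
To illustrate the suspected reversal, here is a minimal sketch using the values from the failure message (tmin=3, tmax2=4). The assert_less helper is hypothetical and is only a stand-in for illustration; it is not the lassert function from conf-sanity.sh, and the sketch assumes the intended invariant is that tmin is below tmax2.

```shell
#!/bin/sh
# Hypothetical helper (NOT the real lassert from conf-sanity.sh):
# prints "<label>: pass" when $2 < $3, otherwise "<label>: FAIL".
assert_less() {
    if [ "$2" -lt "$3" ]; then
        echo "$1: pass"
    else
        echo "$1: FAIL"
    fi
}

# Values taken from the expanded assertion in the failure message: ((4 < 3))
tmin=3
tmax2=4

# Comparison as written in the test: tmax2 < tmin
assert_less "original" "$tmax2" "$tmin"

# Comparison with the operands swapped: tmin < tmax2
assert_less "proposed" "$tmin" "$tmax2"
```

With these values the original comparison fails and the swapped one passes, which matches the reported "Assertion 27 failed" output.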

Comment by James A Simmons [ 04/Feb/15 ]

Yep, just confirmed it is the conf-sanity test changes that break things. Let's see if your change above is all that is needed. BTW, what is the string I should add to the patch to run only this test?

Comment by Bob Glossman (Inactive) [ 04/Feb/15 ]

I think you are looking for something like:

Test-Parameters: fortestonly testlist=conf-sanity,conf-sanity,conf-sanity envdefinitions=ONLY=53a

in the commit header.
see https://wiki.hpdd.intel.com/display/PUB/Changing+Test+Parameters+with+Gerrit+Commit+Messages

When ready to land, you will probably need to edit the commit header and remove the Test-Parameters line.

Comment by Jodi Levi (Inactive) [ 04/Feb/15 ]

We have reverted http://review.whamcloud.com/#/c/2876

Comment by Andreas Dilger [ 04/Feb/15 ]

We've reverted the original patch from master.

Comment by Andreas Dilger [ 04/Feb/15 ]

Patches that are currently based on a tree with that patch in it should rebase to after the reverted patch. We are also looking to disable these tests temporarily so that any patches that haven't hit this failure yet can pass.

Comment by James A Simmons [ 04/Feb/15 ]

Yes Bob, it was that one line in the test script that was breaking everything. I will push a new version of LU-1214 with that fix. The lesson to learn here is never to trust Maloo claiming a pass, but always to look at the logs.

Comment by Andreas Dilger [ 04/Feb/15 ]

Thanks James.

Comment by Andreas Dilger [ 04/Feb/15 ]

PS: we are also investigating how/why Maloo marked this test failure with Verified +1 when it clearly failed the review-dne-part-1 test.

We know that review-zfs is currently optional so the presence of a failure in that test wouldn't itself cause Maloo to mark the overall result -1, but the other failure should have. It should be noted that (excluding the hit from this issue) we have resolved a major blocker for ZFS testing (LU-5242) and are expecting to change review-zfs test results to be enforced in the near future now that we expect it to pass regularly.

Comment by Gerrit Updater [ 20/Feb/15 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/13823
Subject: LU-6206 ptlrpc: start minimum service threads
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 34ec2c93de940ecca8dcf2aa6c760cf8c1b133cc

Comment by Gerrit Updater [ 01/May/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13823/
Subject: LU-6206 ptlrpc: start minimum service threads
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cafa669a43062c5097d40803b9ba14e66edbae25

Comment by Gerrit Updater [ 05/May/15 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/14682
Subject: Revert "LU-6206 ptlrpc: start minimum service threads"
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 241fd8ca40271665c399d79ded71d0a8d28a247a

Comment by Gerrit Updater [ 06/May/15 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/14682/
Subject: Revert "LU-6206 ptlrpc: start minimum service threads"
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e15535ff63bf90b93fa33393ed5d7f7f85813895

Comment by Andreas Dilger [ 03/Jun/16 ]

This test is no longer failing. Will leave cleanups to some future date.
