[LU-6563] conf-sanity test_53b: FAIL: Assertion 25 failed: (($tstarted >= $tmin2)) (expanded: ((7 >= 8))) Created: 04/May/15  Updated: 06/May/15  Resolved: 06/May/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Jian Yu
Assignee: WC Triage
Resolution: Fixed
Votes: 0
Labels: dne
Environment:

Lustre build: https://build.hpdd.intel.com/job/lustre-master/3009
MDSCOUNT=2


Issue Links:
Related
is related to LU-6206 conf-sanity test_53a: Insane OST thre... Resolved
Severity: 3

 Description   

conf-sanity test 53b failed as follows under DNE configuration:

CMD: shadow-21vm8 /usr/sbin/lctl set_param mds.MDS.mdt.threads_min=8
mds.MDS.mdt.threads_min=8
CMD: shadow-21vm8 /usr/sbin/lctl set_param mds.MDS.mdt.threads_max=142
mds.MDS.mdt.threads_max=142
CMD: shadow-21vm8 lctl get_param -n mds.MDS.mdt.threads_min
CMD: shadow-21vm8 lctl get_param -n mds.MDS.mdt.threads_max
checking (($tmin2 == ($tmin + $nthrs))) (((8 == (6 + 2))))...
checking (($tmax2 == ($tmax - $nthrs))) (((142 == (144 - 2))))...
CMD: shadow-21vm8 lctl get_param -n mds.MDS.mdt.threads_started
checking (($tstarted >= $tmin2)) (((7 >= 8)))...
 conf-sanity test_53b: @@@@@@ FAIL: Assertion 25 failed: (($tstarted >= $tmin2)) (expanded: ((7 >= 8)))

Maloo report: https://testing.hpdd.intel.com/test_sets/2a054c68-f25c-11e4-9f61-5254006e85c2
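
For reference, the three checks in the log can be replayed offline with the values it reports. This is only a sketch: check_expr is a hypothetical helper written for this illustration, not the assertion function used by conf-sanity, and the variable values are copied from the log lines above.

```shell
#!/bin/bash
# Values copied from the log above (tmin/tmax are the originals,
# tmin2/tmax2 the values read back after set_param).
tmin=6; tmax=144; nthrs=2
tmin2=8; tmax2=142; tstarted=7

# Hypothetical helper (not from conf-sanity): evaluate an arithmetic
# expression given as a string and report whether it holds.
check_expr() {
    local expr=$1
    if (( $expr )); then
        echo "PASS: (($expr))"
        return 0
    fi
    echo "FAIL: (($expr))"
    return 1
}

check_expr 'tmin2 == (tmin + nthrs)'
check_expr 'tmax2 == (tmax - nthrs)'
# This is the check that failed: only 7 threads had started.
check_expr 'tstarted >= tmin2' || echo "threads_started is below threads_min"
```

The first two checks only confirm that the tunables were set; the third depends on the service having actually started the extra thread by the time it is sampled.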



 Comments   
Comment by Jian Yu [ 04/May/15 ]

The failure has started occurring consistently on master branch since 2015-05-02:
https://testing.hpdd.intel.com/sub_tests/query?utf8=%E2%9C%93&test_set[test_set_script_id]=7f66aa20-3db2-11e0-80c0-52540025f9af&sub_test[sub_test_script_id]=286c0182-40a7-11e0-8bad-52540025f9af&sub_test[status]=FAIL&sub_test[query_bugs]=&test_session[test_host]=&test_session[test_group]=&test_session[user_id]=&test_session[query_date]=&test_session[query_recent_period]=&test_node[os_type_id]=&test_node[distribution_type_id]=&test_node[architecture_type_id]=&test_node[file_system_type_id]=&test_node[lustre_branch_id]=24a6947e-04a9-11e1-bb5f-52540025f9af&test_node_network[network_type_id]=&commit=Update+results

Comment by Andreas Dilger [ 05/May/15 ]

Hi Yu Jian, please don't use URL shortening services like tinyurl.com. That link may not stick around forever, and it isn't possible to see what it is actually linking to without following the link. Please just use the full URL.

Comment by Andreas Dilger [ 05/May/15 ]

Looks like the patch http://review.whamcloud.com/13823 is causing test failures. I will submit a reversion patch.

Comment by Andreas Dilger [ 05/May/15 ]

Patch to revert the change:
http://review.whamcloud.com/14682

Comment by James A Simmons [ 05/May/15 ]

Ugh, conf-sanity 53X failing again. Such touchy code.

Comment by Andreas Dilger [ 06/May/15 ]

I think the problem is that the test increases threads_min but doesn't necessarily do anything to trigger the threads to start. The test probably needs to do something like "touch" before sleeping, to ensure the service thread is triggered and checks the ptlrpc_threads_enough() condition. The service thread probably handles some RPCs naturally, via ping, DLM lock callback, or similar, some of the time, but not consistently, which is why it is failing intermittently.
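
A minimal sketch of the idea above, assuming the test can generate some request traffic and then poll threads_started instead of sampling it once. The function wait_threads_started and its arguments are hypothetical names for this illustration and are not part of conf-sanity or test-framework; the caller supplies both the command that reads the counter and the command that generates traffic.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Lustre test framework).
# Repeatedly runs a traffic-generating command, then reads the started
# thread count, until the count reaches the expected minimum or the
# timeout expires. Prints the last value read.
wait_threads_started() {
    local tmin=$1         # expected minimum number of started threads
    local timeout=$2      # maximum number of seconds to keep retrying
    local get_cmd=$3      # command that prints the current thread count
    local trigger_cmd=$4  # command that generates some RPC activity
    local elapsed=0
    local tstarted=0

    while (( elapsed <= timeout )); do
        # Give the service a request to handle so it can re-evaluate
        # whether enough threads are running.
        $trigger_cmd >/dev/null 2>&1
        tstarted=$($get_cmd)
        if (( tstarted >= tmin )); then
            echo "$tstarted"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$tstarted"
    return 1
}
```

In a real run, get_cmd might wrap the "lctl get_param -n mds.MDS.mdt.threads_started" call shown in the log, and trigger_cmd might be a "touch" of a file on a client mount; both are assumptions here, and whether a single metadata operation is enough to start a new thread would need to be confirmed on a live system.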

Comment by Andreas Dilger [ 06/May/15 ]

Have landed patch to revert this.
