[LU-3352] conf-sanity test_73: failover nids haven't changed Created: 15/May/13  Updated: 13/Jul/15  Resolved: 10/Jul/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.1, Lustre 2.5.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-3006 failover nids added after format don'... Resolved
is related to LU-3244 Test failure on test suite conf-sanit... Resolved
Severity: 3
Rank (Obsolete): 8287

 Description   

This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/e5d0ffde-bd82-11e2-a548-52540035b04c.

The sub-test test_73 failed with the following error in the test output:

failover nids haven't changed

Info required for matching: conf-sanity 73



 Comments   
Comment by Andreas Dilger [ 15/May/13 ]

This might be related to the recent changes in http://review.whamcloud.com/5982 and http://review.whamcloud.com/6216 that change the way that the target and MGS are notified about configuration changes.

Comment by Peter Jones [ 16/May/13 ]

Emoly

Could you please look into this one?

Thanks

Peter

Comment by Keith Mannthey (Inactive) [ 24/May/13 ]

Another one here. I marked the test run as well. https://maloo.whamcloud.com/test_sets/2b2181d0-bd2e-11e2-a548-52540035b04c

Comment by Nathaniel Clark [ 02/Jun/13 ]

I think this may be infiniband only (similar to LU-2200).

Other instances:
https://maloo.whamcloud.com/test_sets/e5d0ffde-bd82-11e2-a548-52540035b04c
https://maloo.whamcloud.com/test_sets/792ed836-c225-11e2-a892-52540035b04c

Comment by Emoly Liu [ 03/Jun/13 ]

"I think this may be infiniband only (similar to LU-2200)."

Seems so. I will check it. Thanks.

Comment by Emoly Liu [ 03/Jun/13 ]

The maloo search result showed that conf-sanity test_73 had never passed with networktype=o2ib since it was created.

Comment by Emoly Liu [ 05/Jun/13 ]

This problem is probably caused by the following line in the test script:

do_facet ost1 "$TUNEFS --failnode=1.2.3.4@tcp $(ostdevname 1)"

Removing "@tcp" should make it work on IB networks.
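
As an illustration of the hard-coded network type discussed above, the sketch below builds the failover NID from a variable instead of fixing it to "@tcp". It is only a sketch and not the change made in the patch: make_failover_nid is a hypothetical helper, and NETTYPE is assumed to hold the LNet network type of the test run (for example tcp or o2ib).

```shell
#!/bin/bash
# Hypothetical helper (not part of the Lustre test framework):
# join an address and an LNet network type into a NID string.
make_failover_nid() {
    # $1: address, $2: network type (assumed default: tcp)
    printf '%s@%s\n' "$1" "${2:-tcp}"
}

# NETTYPE is an assumed variable naming the network type under test.
NETTYPE=${NETTYPE:-tcp}
failnid=$(make_failover_nid 1.2.3.4 "$NETTYPE")

# Print the option that would be handed to tunefs in place of the literal NID.
echo "--failnode=$failnid"
```

With NETTYPE=o2ib the option carries 1.2.3.4@o2ib, so a later check of the failover NIDs would compare against the same network type the servers are actually using.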

Comment by Emoly Liu [ 05/Jun/13 ]

http://review.whamcloud.com/6550

Comment by Keith Mannthey (Inactive) [ 05/Jun/13 ]

Should we remove all instances of @tcp from the test scripts?

Comment by Emoly Liu [ 06/Jun/13 ]

OK, I will check if there are other improper NID settings in our test scripts.

Comment by Bob Glossman (Inactive) [ 18/Jun/13 ]

another instance:
https://maloo.whamcloud.com/test_sets/ce5f8ca0-d7bb-11e2-b179-52540035b04c

Comment by Keith Mannthey (Inactive) [ 18/Jun/13 ]

The patch for test_73 is Acked and waiting for Gatekeeper.

Comment by Jodi Levi (Inactive) [ 09/Jul/13 ]

Can this ticket be closed now that the patch has landed to Master?

Comment by Emoly Liu [ 10/Jul/13 ]

patch landed

Comment by Jian Yu [ 12/Jul/13 ]

http://review.whamcloud.com/6550

The patch also needs to be cherry-picked to the Lustre b2_4 branch to resolve the interop failure with old servers.

Comment by Jian Yu [ 15/Aug/13 ]

The patch was cherry-picked to the Lustre b2_4 branch.
