[LU-10490] replay-single test suite fails with no subtest failures Created: 10/Jan/18  Updated: 16/Dec/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0, Lustre 2.10.2, Lustre 2.12.0, Lustre 2.10.3, Lustre 2.12.3
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: James Nunez (Inactive)
Assignee: Mikhail Pershin
Resolution: Unresolved
Votes: 0
Labels: failing_tests

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When running replay-single, every subtest either passes or is skipped, yet the test suite is marked as FAIL. The suite_log shows no failures during cleanup, and the node console logs show nothing of interest.

One notable detail is that we fail over an MDS after all tests have been run. From the suite_log:

== replay-single test complete, duration 15106 sec == 11:07:13 (1515496033)
replay-single: SKIP: test_90 not functional with FAILURE_MODE=HARD, affected: ost1,ost2,ost3,ost4,ost5,ost6,ost7
CMD: trevis-10vm8 /usr/sbin/lctl dl
Failing mds1 on trevis-10vm8
+ pm -h powerman --off trevis-10vm8
Command completed successfully
reboot facets: mds1
+ pm -h powerman --on trevis-10vm8
Command completed successfully
Failover mds1 to trevis-10vm7
…
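
For anyone comparing runs, the completion line in the excerpt above can be decoded with a short script. This is only a minimal sketch: the regular expression is an assumption inferred from that single line, not a documented suite_log format, and the helper name parse_completion is hypothetical.

```python
import re
from datetime import datetime, timezone

# Completion line copied verbatim from the suite_log excerpt above.
LINE = "== replay-single test complete, duration 15106 sec == 11:07:13 (1515496033)"

# Assumed pattern, inferred from the one line above (not a documented format).
PATTERN = re.compile(
    r"== (?P<suite>\S+) test complete, duration (?P<secs>\d+) sec =="
    r" (?P<clock>\d{2}:\d{2}:\d{2}) \((?P<epoch>\d+)\)"
)


def parse_completion(line):
    """Return suite name, readable duration and UTC finish time, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Split the duration in seconds into hours, minutes and seconds.
    hours, rem = divmod(int(m.group("secs")), 3600)
    minutes, seconds = divmod(rem, 60)
    # The value in parentheses is treated as a Unix epoch timestamp.
    finished = datetime.fromtimestamp(int(m.group("epoch")), tz=timezone.utc)
    return {
        "suite": m.group("suite"),
        "duration": f"{hours}h{minutes:02d}m{seconds:02d}s",
        "finished_utc": finished.strftime("%Y-%m-%d %H:%M:%S"),
    }


info = parse_completion(LINE)
print(info)
```

For this line the sketch reports a duration of 4h11m46s and a finish time of 2018-01-09 11:07:13 UTC, which is consistent with the ticket being created on 10/Jan/18.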

This failure occurs only in failover test sessions, on master (2.11), b2_10, and other branches. I have found examples of this failure as far back as I can look in Maloo. The failure started on or before January 11, 2017; logs for that early failure are at https://testing.hpdd.intel.com/test_sets/3024ce30-d81e-11e6-8cf3-5254006e85c2.

Logs for the latest example of this failure are at https://testing.hpdd.intel.com/test_sets/e5a1e6ee-f52d-11e7-8c23-52540065bddc
Other logs for this failure are at:
https://testing.hpdd.intel.com/test_sets/f13e1392-f60e-11e7-94c7-52540065bddc
https://testing.hpdd.intel.com/test_sets/579b4ba0-f3cb-11e7-a169-52540065bddc
https://testing.hpdd.intel.com/test_sets/da02996a-e690-11e7-8027-52540065bddc
https://testing.hpdd.intel.com/test_sets/ad79a576-a862-11e7-bb19-5254006e85c2



 Comments   
Comment by Colin Faber [ 28/Sep/22 ]

Hi tappro 

Can you take a look at this one as well?

Thank you!

Generated at Sat Feb 10 02:35:34 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.