Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.11.0, Lustre 2.10.2, Lustre 2.12.0, Lustre 2.10.3, Lustre 2.12.3
3
9223372036854775807
Description
When replay-single runs, every test either passes or is skipped, yet the test suite is marked as FAIL. The suite_log shows no failures during cleanup, and the node console logs contain nothing of interest.
One notable detail is that we fail over an MDS after all tests have completed. From the suite_log:
== replay-single test complete, duration 15106 sec == 11:07:13 (1515496033)
replay-single: SKIP: test_90 not functional with FAILURE_MODE=HARD, affected: ost1,ost2,ost3,ost4,ost5,ost6,ost7
CMD: trevis-10vm8 /usr/sbin/lctl dl
Failing mds1 on trevis-10vm8
+ pm -h powerman --off trevis-10vm8
Command completed successfully
reboot facets: mds1
+ pm -h powerman --on trevis-10vm8
Command completed successfully
Failover mds1 to trevis-10vm7
…
This failure occurs only in failover test sessions and affects master (2.11), b2_10, and other branches. I have found examples of it as far back as I have looked in Maloo; it started on or before January 11, 2017, and logs for that early failure are at https://testing.hpdd.intel.com/test_sets/3024ce30-d81e-11e6-8cf3-5254006e85c2.
Logs for the latest example of this failure are at https://testing.hpdd.intel.com/test_sets/e5a1e6ee-f52d-11e7-8c23-52540065bddc
Other logs for this failure are at:
https://testing.hpdd.intel.com/test_sets/f13e1392-f60e-11e7-94c7-52540065bddc
https://testing.hpdd.intel.com/test_sets/579b4ba0-f3cb-11e7-a169-52540065bddc
https://testing.hpdd.intel.com/test_sets/da02996a-e690-11e7-8027-52540065bddc
https://testing.hpdd.intel.com/test_sets/ad79a576-a862-11e7-bb19-5254006e85c2