[LU-13167] sanity-flr test_33b: read mirror too slow without ost2, from 1 to 4 Created: 22/Jan/20 Updated: 06/Jul/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0, Lustre 2.12.3, Lustre 2.12.5, Lustre 2.12.6, Lustre 2.12.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Jian Yu |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
This issue was created by maloo for liuying <emoly.liu@intel.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/69b56d06-3c3b-11ea-b0f4-52540065bddc

test_33b failed with the following error:

onyx-66vm8: == rpc test complete, duration -o sec ================================================================ 07:30:01 (1579591801)
onyx-66vm8: onyx-66vm8.onyx.whamcloud.com: executing _wait_recovery_complete *.lustre-OST0001.recovery_status 1475
onyx-66vm8: *.lustre-OST0001.recovery_status status: COMPLETE
sanity-flr test_33b: @@@@@@ FAIL: read mirror too slow without ost2, from 1 to 4

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
| Comments |
| Comment by Jian Yu [ 30/Jan/20 ] |
|
The failure also occurred on the Lustre b2_12 branch: |
| Comment by Artem Blagodarenko (Inactive) [ 22/Jan/22 ] |
|
+1 https://testing.whamcloud.com/test_sets/e8b442a3-09c2-45a4-9860-348fc1ea9015 |
| Comment by Andreas Dilger [ 21/Apr/23 ] |
|
This test fails intermittently, but steadily over time. I suspect that, as with many tests that measure performance while running in a VM, inconsistencies in system performance are causing this. However, I also don't want to disable the test completely or ignore all errors (with error_not_in_vm), because we very rarely run these tests on real hardware, so there would be no way to detect if the test started failing regularly.

Ideally, the test framework could monitor the failure rate (e.g. 1/30 failures on a branch) and then consider it a real failure if a patch failed more often than that. It might be possible to work something into test-framework.sh so that "error_not_in_vm" took some failure rate 1/R, and if the subtest failed it would automatically set ONLY_REPEAT for the current test and re-run it R times to ensure it wasn't failing more frequently. That would still have some chance of failing or missing issues, but would usually avoid the need for gratuitous test restarts, and be more likely to catch changes in intermittent failures than developers are... |
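
The re-run idea in the comment above can be sketched roughly as follows. This is only an illustration in Python, not test-framework.sh code: `rerun_verdict`, `prob_at_least`, and the `allowed_failures` threshold are hypothetical names and parameters introduced here, and the comment does not say how many failures among the R re-runs should count as "more frequent".

```python
import math
from typing import Callable


def rerun_verdict(run_subtest: Callable[[], bool], repeat: int,
                  allowed_failures: int = 0) -> str:
    """Decide how to treat a subtest that has just failed once.

    run_subtest: hypothetical callable, returns True when the subtest passes.
    repeat: R, the number of re-runs (the inverse of the tolerated rate 1/R).
    allowed_failures: assumed threshold, how many re-run failures are still
        treated as VM noise rather than a regression.
    """
    failures = 0
    for _ in range(repeat):
        if not run_subtest():
            failures += 1
            if failures > allowed_failures:
                # Failing more often than tolerated: report a real failure.
                return "FAIL"
    # Within the tolerated rate: treat the original failure as noise.
    return "IGNORE"


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of at least k failures in n independent runs,
    each failing with probability p (binomial tail)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))
```

Assuming independent runs at a baseline failure rate of 1/30, `prob_at_least(1, 30, 1/30)` is roughly 0.64, so a threshold of zero extra failures would still flag many noise-only cases. That is one way to quantify the "some chance of failing or missing issues" caveat, and it suggests the threshold would need tuning.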