[LU-13514] conf-sanity test_32a: Timeout occurred after 143 mins Created: 04/May/20  Updated: 07/Dec/23  Resolved: 29/Jan/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0, Lustre 2.12.6
Fix Version/s: Lustre 2.12.6, Lustre 2.15.0

Type: Bug
Priority: Major
Reporter: Maloo
Assignee: Yang Sheng
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11643 create disk images for Lustre 2.10 an... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for Chris Horn <hornc@cray.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4960ea6e-2914-4b3d-a77d-e0e5a0a4c9a6

test_32a failed with the following error:

Timeout occurred after 143 mins, last suite running was conf-sanity

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
conf-sanity test_32a - Timeout occurred after 143 mins, last suite running was conf-sanity



Comments
Comment by Chris Horn [ 05/May/20 ]

+1 on master https://testing.whamcloud.com/test_sets/c0c4e2b5-ac3b-4739-b5fe-e317575a774a

Comment by Sebastien Buisson [ 14/May/20 ]

I got multiple occurrences of this problem after rebasing my patches on top of the master branch, e.g.:
https://testing.whamcloud.com/test_sets/0952e3ff-a20e-4461-afba-6042dd35d042

Comment by Chris Horn [ 14/May/20 ]

+1 on master https://testing.whamcloud.com/test_sessions/cc02ea29-8465-4511-ac15-7690c947d11e

Comment by Sebastien Buisson [ 14/May/20 ]

It seems that no recent testing managed to pass conf-sanity because of this problem.
Raising ticket priority.

Comment by Gerrit Updater [ 15/May/20 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/38615
Subject: LU-13514 tests: test conf-sanity test_32a - base
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: de0d8d26b4ab6deb4618f04c0d82f6489d9b8ab9

Comment by Gerrit Updater [ 15/May/20 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/38616
Subject: LU-13514 tests: test conf-sanity test_32a - 1
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 631c2ad1096597af7b8a99db0fef5e55283246c3

Comment by Gerrit Updater [ 15/May/20 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/38617
Subject: LU-13514 tests: test conf-sanity test_32a - 2
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 90693299e21be79621a4891313b3c1a712ed434a

Comment by Sebastien Buisson [ 15/May/20 ]

The patches above show that conf-sanity test_32a fails with the tip of the master branch (results of https://review.whamcloud.com/38615), and passes when we revert commit 6b979daaff "LU-11643 tests: add new images and tests for upgrade tests" (results of https://review.whamcloud.com/3861).

So I think commit 6b979daaff "LU-11643 tests: add new images and tests for upgrade tests" needs to be reverted for now, as it blocks testing of all new patches.

Comment by Andreas Dilger [ 15/May/20 ]

Sebastien, do you know what is broken in the tests, and why/how that patch passed testing before it landed? Is it just because the testing is now slower and timing out, or is there a code defect (hang)?

Comment by Sebastien Buisson [ 15/May/20 ]

Well, commit 6b979daaff "LU-11643 tests: add new images and tests for upgrade tests" explicitly adds stuff that conf-sanity test_32c goes through. Two more patches landed after this one, but one adds a new test script and the other changes the file lustre/osd-zfs/osd_scrub.c, so they cannot be responsible for the failure in the review-dne-part-3 test group, which runs on ldiskfs.

The explanation I see for the test failure in master now is that patch https://review.whamcloud.com/35049 (6b979daaff "LU-11643 tests: add new images and tests for upgrade tests") was tested on a branch that was too old. I can see that patchset 16 was based on commit a83c820f89 "LU-12312 lnet: handle no discovery flag", which dates back to April 23rd.
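
The staleness argument can be made concrete with simple date arithmetic. This is only a rough sketch: it assumes the April 23rd base commit date falls in 2020, and it uses this ticket's creation date (04/May/20) as a stand-in for when the failures surfaced. Both variable names are illustrative.

```python
from datetime import date

# Assumed year: the comment gives only "April 23rd".
base_commit_date = date(2020, 4, 23)
# Taken from "Created: 04/May/20" in the ticket header.
ticket_created = date(2020, 5, 4)

# Number of days of master history the tested patch set may have missed.
staleness_days = (ticket_created - base_commit_date).days
print(staleness_days)
```

A gap of this size means any commit that landed on master in between was not exercised together with the patch before it merged.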

Comment by Andreas Dilger [ 19/May/20 ]

I see that conf-sanity test_32a is still failing with this same error even for a patch based on the latest master commit v2_13_53-165-gebaf3b1b9980 "LU-11643 tests: revert new images and tests for upgrade patch":
https://testing.whamcloud.com/test_sets/a4c8f3d3-b8e5-4e7e-9192-2b0bc22279b4

Comment by Arshad Hussain [ 30/May/20 ]

+1 on Master: https://testing.whamcloud.com/sub_tests/e3575182-057a-4057-965e-fd0c293e939b

Comment by Emoly Liu [ 01/Jun/20 ]

more on master: 
https://testing.whamcloud.com/test_sets/b52e5506-8e0a-48c7-8c97-3666e4df460e
https://testing.whamcloud.com/test_sets/3c7736eb-a501-445a-ad51-4e4d2e004212

Comment by Chris Horn [ 01/Jun/20 ]

+1 on master: https://testing.whamcloud.com/test_sessions/807df025-8fd7-45d1-8961-b7fbaf84cdc2

Comment by Sebastien Buisson [ 02/Jun/20 ]

It seems that almost no patch managed to pass conf-sanity test_32a in the last couple of days.

Comment by Andreas Dilger [ 17/Jun/20 ]

Looking at the test results, it seems that review-dne-part-3 (ldiskfs) is the only session that is timing out, and never review-dne-zfs-part-3, so the failure must be related to one of the ldiskfs test images.
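
The comparison above amounts to tallying timeouts per test session group. A minimal sketch, assuming results are available as (session name, status) pairs; the `timeouts_by_session` helper and the sample records are hypothetical and are not real Maloo data:

```python
from collections import Counter

def timeouts_by_session(results):
    """Count timed-out runs per test session group."""
    return Counter(name for name, status in results if status == "TIMEOUT")

# Hypothetical sample records for illustration only, not real Maloo data.
sample = [
    ("review-dne-part-3", "TIMEOUT"),
    ("review-dne-zfs-part-3", "PASS"),
    ("review-dne-part-3", "TIMEOUT"),
    ("review-dne-zfs-part-3", "PASS"),
]

counts = timeouts_by_session(sample)
# A Counter returns 0 for groups that never timed out.
print(counts["review-dne-part-3"], counts["review-dne-zfs-part-3"])
```

If only the ldiskfs group shows a non-zero count, the backend-specific test images are the natural place to look next.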

Comment by Gerrit Updater [ 19/Jun/20 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39108
Subject: LU-13514 tests: stop running conf-sanity test 32a ldiskfs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 440654a0768c4bebec20fa99571318e2bf429ccc

Comment by Gerrit Updater [ 19/Jun/20 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39109
Subject: LU-13514 tests: remove upgrade images for conf-sanity
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b33b1a85c843be4ffffd181605d8ae2ac07c3bac

Comment by Gerrit Updater [ 10/Jul/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39109/
Subject: LU-13514 tests: remove upgrade images for conf-sanity
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d574a55778f035691bd3bed621cfcdb8200a9785

Comment by Gerrit Updater [ 30/Oct/20 ]

Yang Sheng (ys@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40492
Subject: LU-13514 tests: remove upgrade images for conf-sanity
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: d50c28405f107249b65623be6580e20b311e0bef

Comment by Gerrit Updater [ 07/Nov/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40492/
Subject: LU-13514 tests: remove upgrade images for conf-sanity
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 1d4c0620b41bf2bee63cf3eb40a6c78fa503645c

Comment by Peter Jones [ 07/Nov/20 ]

Is turning off this test a solution or the equivalent of adding something to the always accept list?

Comment by James Nunez (Inactive) [ 16/Nov/20 ]

For master (future 2.14.0), over the past 4 weeks, the only conf-sanity test 32a hangs/timeouts are in interop testing:
2.13.0 server/2.13.56.40 clients - https://testing.whamcloud.com/test_sets/b032d2d1-f0bf-4952-91ca-060a69f7d1ab
2.12.5 servers/2.13.56.45 clients - https://testing.whamcloud.com/test_sets/896bf228-69d7-4417-b163-58a75fe97c23

For the b2_12 branch, we see this test hang in non-interop testing:
2.12.5.82 server/client - https://testing.whamcloud.com/test_sets/f8ec6569-d632-41c9-8d6b-21dd1ac9c635
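
The interop/non-interop split in the runs listed above can be sketched as a simple version comparison. This assumes that "interop" just means the server and client version strings differ; the `is_interop` helper is illustrative and not part of the Lustre test framework.

```python
def is_interop(server_version, client_version):
    """Return True when server and client run different Lustre versions."""
    return server_version != client_version

# (server, client) version pairs quoted in the comment above.
runs = [
    ("2.13.0", "2.13.56.40"),
    ("2.12.5", "2.13.56.45"),
    ("2.12.5.82", "2.12.5.82"),
]

labels = ["interop" if is_interop(s, c) else "non-interop" for s, c in runs]
print(labels)
```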

Comment by Gerrit Updater [ 20/Nov/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40537/
Subject: LU-13514 tests: replace nid in conf-sanity test_32
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 327c8b77694bb0796f168df26e0c543d9610691e

Comment by Andreas Dilger [ 02/Mar/21 ]

Yang Sheng, is patch https://review.whamcloud.com/40537 "LU-13514 tests: replace nid in conf-sanity test_32" needed on master?

Comment by Gerrit Updater [ 28/Jan/22 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46354
Subject: LU-13514 tests: replace nid in conf-sanity test_32
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 98e3e1018f57293061d4d9dd515507577c4e2cae

Comment by Gerrit Updater [ 29/Jan/22 ]

"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/46354/
Subject: LU-13514 tests: replace nid in conf-sanity test_32
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4d2efa87abda3c177ca5a2178cd5b2eafa7f99af
