[LU-8526] replay-single test_90: @@@@@@ FAIL: wrong stripe: all, uuid: lustre-OST0000_UUID Created: 23/Aug/16  Updated: 15/Feb/17  Resolved: 15/Feb/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Major
Reporter: Maloo Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-4039 Failure on test suite replay-single t... Resolved
is related to LU-8544 recovery-double-scale test_pairwise_f... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for nasf <fan.yong@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/03cc3718-68ea-11e6-b2e2-5254006e85c2.



 Comments   
Comment by nasf (Inactive) [ 23/Aug/16 ]

Looks similar to LU-4039.

Comment by Nathaniel Clark [ 24/Aug/16 ]

lustre-reviews, review-zfs-part-2
https://testing.hpdd.intel.com/test_sets/1011bee2-698c-11e6-909b-5254006e85c2

Comment by Jian Yu [ 24/Aug/16 ]

+1 on master branch:
https://testing.hpdd.intel.com/test_sets/97b87550-69b6-11e6-9258-5254006e85c2

Comment by nasf (Inactive) [ 24/Aug/16 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/ac32a648-6981-11e6-baeb-5254006e85c2

Comment by nasf (Inactive) [ 14/Sep/16 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/1e365b2a-79e2-11e6-8a8c-5254006e85c2

Comment by Niu Yawei (Inactive) [ 27/Sep/16 ]

Another failure: https://testing.hpdd.intel.com/test_sets/98e04392-8412-11e6-a81d-5254006e85c2

Comment by James Nunez (Inactive) [ 29/Sep/16 ]

I've increased the priority of this ticket because the master branch has recently been hitting this error several times a day.

Comment by Andreas Dilger [ 29/Sep/16 ]

It may be that patch http://review.whamcloud.com/22459 "LU-8544 test: using lfs df in client_up" will fix this problem. As stated in LU-4039, I think this was introduced by patch http://review.whamcloud.com/19195 "LU-7759 llite: handle inactive OSTs better in statfs". The changes in LU-4039 did not fix the problem, and re-enabling the test on 2016-08-10 clearly caused test_90 to start failing again.

Comment by Andreas Dilger [ 30/Sep/16 ]

Close as a duplicate of LU-8544, which has now landed.

Comment by Bruno Faccini (Inactive) [ 11/Oct/16 ]

Andreas, don't you think we should also strengthen replay-single/test_90 itself by adding a check like "clients_up" or "wait_osts_up" (presently only available in conf-sanity) instead of the current "wait_osc_import_state mds ost FULL" (the original change introduced for LU-4039)?

Comment by Andreas Dilger [ 12/Oct/16 ]

Bruno, I'm not against improving the tests if you think that will avoid failures. That said, please balance time spent against the chance of this actually causing problems in the future.

Comment by Gerrit Updater [ 13/Oct/16 ]

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/23148
Subject: LU-8526 tests: ensure all OSTs active for allocations
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2ab63209f8a72ef2f8c61a27671d6ce96c2c18fd

Comment by Bruno Faccini (Inactive) [ 13/Oct/16 ]

http://review.whamcloud.com/23148 implements what I think is the specific fix to ensure that replay-single/test_90 runs successfully and, in particular, that the random selection of OSTs during setstripe is effective. Ineffective random selection was the main cause of the failures tracked by this ticket and by LU-4039.
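
For readers who want a feel for the kind of check discussed above, here is a minimal sketch of "wait until every OST is active before allocating". It is not the code from patch 23148. The helper names count_active_osts and wait_all_osts_active are hypothetical, and the sketch assumes that "lfs df" lists each active OST on a line containing "[OST:<index>]" while inactive OSTs have no such tag.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; not taken from the Lustre tree.

# Read "lfs df"-style output on stdin and print how many lines carry an
# "[OST:<index>]" tag. Assumption: only active OSTs are listed that way.
count_active_osts() {
    # grep -c prints 0 and exits non-zero when nothing matches, so mask that.
    grep -c '\[OST:[0-9]*\]' || true
}

# wait_all_osts_active EXPECTED TIMEOUT_SECONDS CMD [ARGS...]
# Run CMD once per second until it reports at least EXPECTED active OSTs,
# or fail once TIMEOUT_SECONDS have elapsed.
wait_all_osts_active() {
    local expected=$1 timeout=$2
    shift 2
    local waited=0 active
    while :; do
        active=$("$@" 2>/dev/null | count_active_osts)
        if [ "${active:-0}" -ge "$expected" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "only ${active:-0} of $expected OSTs active after ${waited}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

In a test this might be called as wait_all_osts_active "$OSTCOUNT" 60 lfs df "$MOUNT" before the setstripe step, assuming OSTCOUNT and MOUNT are set by the test environment. Passing the command as arguments keeps the polling logic testable without a mounted filesystem.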

Comment by Gerrit Updater [ 15/Feb/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23148/
Subject: LU-8526 tests: ensure all OSTs active for allocations
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 04794b33653fedca282d6f8dfd9c1c9e833ead06

Generated at Sat Feb 10 02:18:20 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.