[LU-4039] Failure on test suite replay-single test_90: wrong stripe: f0, uuid: lustre-OST0000_UUID Created: 01/Oct/13  Updated: 20/Jan/17  Resolved: 11/Aug/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0, Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Critical
Reporter: Maloo Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: zfs
Environment:

server and client: lustre-master build # 1687


Issue Links:
Related
is related to LU-8544 recovery-double-scale test_pairwise_f... Resolved
is related to LU-7759 umount hanging in modern distros when... Resolved
is related to LU-8526 replay-single test_90: @@@@@@ FAIL: w... Resolved
Severity: 3
Rank (Obsolete): 10848

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/f9c487da-26c9-11e3-83d1-52540035b04c.

The sub-test test_90 failed with the following error:

Create the files
/mnt/lustre/d0.replay-single/d90/f0
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 1
obdidx      objid      objid      group
     1      30476     0x770c          0

replay-single test_90: @@@@@@ FAIL: wrong stripe: f0, uuid: lustre-OST0000_UUID
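
The output above shows f0 placed on obdidx 1, while the failure message names lustre-OST0000_UUID. A minimal Python sketch of that comparison follows, assuming f0 was expected on OST index 0 and that the layout text has the shape shown above. The names `parse_obdidx` and `stripe_matches` are hypothetical, and this is not the code used by replay-single.

```python
from typing import List

# Sample text copied from the failure output in this ticket.
SAMPLE = """\
/mnt/lustre/d0.replay-single/d90/f0
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 1
obdidx objid objid group
1 30476 0x770c 0
"""


def parse_obdidx(getstripe_output: str) -> List[int]:
    """Return the OST indices found in the obdidx column of the layout text."""
    indices = []
    in_table = False
    for line in getstripe_output.splitlines():
        fields = line.split()
        if fields[:1] == ["obdidx"]:
            # Header row: object rows follow on the next lines.
            in_table = True
            continue
        if in_table:
            if not fields or not fields[0].isdigit():
                break  # end of the object table
            indices.append(int(fields[0]))
    return indices


def stripe_matches(getstripe_output: str, expected_index: int) -> bool:
    """True when the file has exactly one stripe on the expected OST index."""
    return parse_obdidx(getstripe_output) == [expected_index]


if __name__ == "__main__":
    # Assumption: f0 was requested on OST index 0.
    print(parse_obdidx(SAMPLE), stripe_matches(SAMPLE, 0))
```

With the sample from this ticket, the parsed index is 1, so a check against index 0 fails, which matches the reported "wrong stripe" error.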

Info required for matching: replay-single 90



 Comments   
Comment by Yang Sheng [ 22/Jun/16 ]

This failure occurs because test_89 fails OST0000, and its import is still not in the FULL state when test_90 starts, so the stripe is allocated on the wrong OST. I'll push a patch to fix it.

/proc/fs/lustre/osp/lustre-MDT0000-osp-MDT0001/state:current_state: FULL
/proc/fs/lustre/osp/lustre-MDT0001-osp-MDT0000/state:current_state: FULL
/proc/fs/lustre/osp/lustre-OST0000-osc-MDT0000/state:current_state: REPLAY_WAIT
/proc/fs/lustre/osp/lustre-OST0000-osc-MDT0001/state:current_state: REPLAY_WAIT
/proc/fs/lustre/osp/lustre-OST0001-osc-MDT0000/state:current_state: FULL
/proc/fs/lustre/osp/lustre-OST0001-osc-MDT0001/state:current_state: FULL
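
The state dump above shows both OST0000 imports still in REPLAY_WAIT. As a rough illustration of the condition described in this comment, the Python sketch below filters such a dump for imports that are not yet FULL. It assumes one `path:current_state: STATE` entry per line, as shown above. The function name `imports_not_full` is hypothetical, and this is not the patch itself.

```python
from typing import Dict

# State lines copied from the comment in this ticket.
STATE_DUMP = """\
/proc/fs/lustre/osp/lustre-MDT0000-osp-MDT0001/state:current_state: FULL
/proc/fs/lustre/osp/lustre-MDT0001-osp-MDT0000/state:current_state: FULL
/proc/fs/lustre/osp/lustre-OST0000-osc-MDT0000/state:current_state: REPLAY_WAIT
/proc/fs/lustre/osp/lustre-OST0000-osc-MDT0001/state:current_state: REPLAY_WAIT
/proc/fs/lustre/osp/lustre-OST0001-osc-MDT0000/state:current_state: FULL
/proc/fs/lustre/osp/lustre-OST0001-osc-MDT0001/state:current_state: FULL
"""


def imports_not_full(state_dump: str) -> Dict[str, str]:
    """Map each import name to its state when that state is not FULL."""
    pending = {}
    for line in state_dump.splitlines():
        path, sep, state = line.rpartition("current_state:")
        if not sep:
            continue  # not a state line
        # path looks like ".../osp/<import-name>/state:", so the import
        # name is the second-to-last path component.
        name = path.rstrip(":").split("/")[-2]
        state = state.strip()
        if state != "FULL":
            pending[name] = state
    return pending


if __name__ == "__main__":
    for name, state in imports_not_full(STATE_DUMP).items():
        print(name, state)
```

A test could poll a check like this and start only once the result is empty. The fix that actually landed is in the patch referenced in the following comments.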

Comment by Gerrit Updater [ 22/Jun/16 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/20931
Subject: LU-4039 tests: ensure osc import in FULL state
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d03dbc49b9ee5da93341a5d4ddc42a8915293612

Comment by Sebastien Buisson (Inactive) [ 04/Jul/16 ]

Several recent hits on master, like:
https://testing.hpdd.intel.com/test_sets/ae363b24-4205-11e6-a0ce-5254006e85c2

Comment by Andreas Dilger [ 04/Jul/16 ]

It appears that this test only started failing again on 2016-06-28, so it is very likely a regression caused by some patch that recently landed. It seems that the MDS is not enforcing setstripe requests that specify a starting OST index, possibly if that OST does not have any precreated objects. One candidate is patch http://review.whamcloud.com/19195 "LU-7759 llite: handle inactive OSTs better in statfs" though I can't see why it would be a problem...

Comment by Andreas Dilger [ 05/Jul/16 ]

Looking at the patch testing history for http://review.whamcloud.com/19195 it appears that it was failing replay-single test_90 on a regular basis, except for the very last version of the patch, which was landed.

Comment by Gerrit Updater [ 05/Jul/16 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/20931/
Subject: LU-4039 tests: ensure osc import in FULL state
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: eb2657a654a449f2c5504fa4c5ba4b7e1413ddea

Comment by Yang Sheng [ 06/Jul/16 ]

Patch landed. Close ticket.

Comment by Andreas Dilger [ 06/Jul/16 ]

It looks like this patch did not fix the problem. There were two recent patch tests that failed replay-single.sh even though they included the latest patch:

http://review.whamcloud.com/18758

http://review.whamcloud.com/16105

Comment by Yang Sheng [ 06/Jul/16 ]

It looks like almost all of the tests failed with:

replay-single test_90: @@@@@@ FAIL: wrong stripe: all, uuid: lustre-OST0000_UUID 

and test_89 is skipped. I'll push a debug patch for it.

Comment by Gerrit Updater [ 06/Jul/16 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/21175
Subject: LU-4039 tests: debug patch
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 927531c4e0981990861cb2a82319816e2f913fab

Comment by James Nunez (Inactive) [ 07/Jul/16 ]

Increased priority of ticket because replay-single test 90 is failing on master multiple times a day.

Comment by Jian Yu [ 08/Jul/16 ]

This is blocking patch review testing on master branch:
https://testing.hpdd.intel.com/test_sets/becbbe9c-3fed-11e6-80b9-5254006e85c2
https://testing.hpdd.intel.com/test_sets/828464ba-40e8-11e6-acf3-5254006e85c2
https://testing.hpdd.intel.com/test_sets/e00ada42-4020-11e6-acf3-5254006e85c2

Comment by Gerrit Updater [ 08/Jul/16 ]

James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/21224
Subject: LU-4039 tests: EXCEPT replay-single test 90
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 27688be30c7c58343ec4eec88798b0588771f196

Comment by Gerrit Updater [ 09/Jul/16 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/21224/
Subject: LU-4039 tests: EXCEPT replay-single test 90
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 95dbe2cbd2e8829356fc8287366b04fed7131ada

Comment by Gerrit Updater [ 05/Aug/16 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/21736
Subject: LU-4039 tests: enable test_90 for replay-single
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5762823612887e81bc9d60874028f97cc441419b

Comment by Gerrit Updater [ 11/Aug/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21736/
Subject: LU-4039 tests: enable test_90 for replay-single
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f49e518491e8cc96ca720e22e3f62443148cb102

Comment by Peter Jones [ 11/Aug/16 ]

Test is active again so reclosing. Let's open a new ticket if any further failures are found for this test.

Comment by nasf (Inactive) [ 12/Aug/16 ]

Hit it again on master:
https://testing.hpdd.intel.com/test_sets/a102421e-5ff5-11e6-b2e2-5254006e85c2

Comment by nasf (Inactive) [ 23/Aug/16 ]

Opened new ticket LU-8526 for the recent new failures on master.
