[LU-4039] Failure on test suite replay-single test_90: wrong stripe: f0, uuid: lustre-OST0000_UUID Created: 01/Oct/13 Updated: 20/Jan/17 Resolved: 11/Aug/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0, Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | zfs | ||
| Environment: |
server and client: lustre-master build # 1687 |
||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 |
| Rank (Obsolete): | 10848 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com> This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/f9c487da-26c9-11e3-83d1-52540035b04c. The sub-test test_90 failed with the following error:
Info required for matching: replay-single 90 |
| Comments |
| Comment by Yang Sheng [ 22/Jun/16 ] |
|
This issue was caused by test_89 failing OST0000: the import was still not in the FULL state when test_90 started, so the stripe was allocated incorrectly. I'll push a patch to fix it.
/proc/fs/lustre/osp/lustre-MDT0000-osp-MDT0001/state:current_state: FULL
/proc/fs/lustre/osp/lustre-MDT0001-osp-MDT0000/state:current_state: FULL
/proc/fs/lustre/osp/lustre-OST0000-osc-MDT0000/state:current_state: REPLAY_WAIT
/proc/fs/lustre/osp/lustre-OST0000-osc-MDT0001/state:current_state: REPLAY_WAIT
/proc/fs/lustre/osp/lustre-OST0001-osc-MDT0000/state:current_state: FULL
/proc/fs/lustre/osp/lustre-OST0001-osc-MDT0001/state:current_state: FULL |
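The condition described in the comment above (an OST import still in REPLAY_WAIT when the next test starts) can be illustrated with a small check over state output of the shape quoted there. This is only a sketch, not the patch that was pushed: the function names `parse_import_states` and `imports_not_full` are hypothetical, and it assumes each line has the form `<path>/state:current_state: <STATE>` exactly as shown in the comment.

```python
import re

# Matches lines shaped like the state output quoted in the comment:
#   <path>/state:current_state: <STATE>
STATE_LINE = re.compile(r"^(?P<path>\S+)/state:current_state:\s*(?P<state>\w+)$")


def parse_import_states(text):
    """Return {import_name: state} for every recognizable line in text."""
    states = {}
    for line in text.splitlines():
        match = STATE_LINE.match(line.strip())
        if match:
            # The last path component names the import,
            # e.g. lustre-OST0000-osc-MDT0000.
            name = match.group("path").rsplit("/", 1)[-1]
            states[name] = match.group("state")
    return states


def imports_not_full(states):
    """Return the sorted names of imports whose state is not FULL."""
    return sorted(name for name, state in states.items() if state != "FULL")


# Sample input copied from the comment above.
SAMPLE = """\
/proc/fs/lustre/osp/lustre-MDT0000-osp-MDT0001/state:current_state: FULL
/proc/fs/lustre/osp/lustre-MDT0001-osp-MDT0000/state:current_state: FULL
/proc/fs/lustre/osp/lustre-OST0000-osc-MDT0000/state:current_state: REPLAY_WAIT
/proc/fs/lustre/osp/lustre-OST0000-osc-MDT0001/state:current_state: REPLAY_WAIT
/proc/fs/lustre/osp/lustre-OST0001-osc-MDT0000/state:current_state: FULL
/proc/fs/lustre/osp/lustre-OST0001-osc-MDT0001/state:current_state: FULL
"""

if __name__ == "__main__":
    pending = imports_not_full(parse_import_states(SAMPLE))
    print("imports not yet FULL:", pending)
```

For the sample above, both OST0000 imports are reported as not yet FULL. A test would presumably need to wait until this list is empty before creating a file striped on OST0000.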
| Comment by Gerrit Updater [ 22/Jun/16 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/20931 |
| Comment by Sebastien Buisson (Inactive) [ 04/Jul/16 ] |
|
Several recent hits on master, like: |
| Comment by Andreas Dilger [ 04/Jul/16 ] |
|
It appears that this test only started failing again on 2016-06-28, so it is very likely a regression caused by some patch that recently landed. It seems that the MDS is not enforcing setstripe requests that specify a starting OST index, possibly if that OST does not have any precreated objects. One candidate is patch http://review.whamcloud.com/19195 |
| Comment by Andreas Dilger [ 05/Jul/16 ] |
|
Looking at the patch testing history for http://review.whamcloud.com/19195 it appears that it was failing replay-single test_90 on a regular basis, except for the very last version of the patch, which was landed. |
| Comment by Gerrit Updater [ 05/Jul/16 ] |
|
Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/20931/ |
| Comment by Yang Sheng [ 06/Jul/16 ] |
|
Patch landed. Close ticket. |
| Comment by Andreas Dilger [ 06/Jul/16 ] |
|
It looks like this patch did not fix the problem. There were two recent patch tests that failed replay-single.sh even though they included the latest patch: |
| Comment by Yang Sheng [ 06/Jul/16 ] |
|
It looks like almost all of the tests failed on: replay-single test_90: @@@@@@ FAIL: wrong stripe: all, uuid: lustre-OST0000_UUID and test_89 was skipped. I'll push a debug patch for it. |
| Comment by Gerrit Updater [ 06/Jul/16 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/21175 |
| Comment by James Nunez (Inactive) [ 07/Jul/16 ] |
|
Increased priority of ticket because replay-single test 90 is failing on master multiple times a day. |
| Comment by Jian Yu [ 08/Jul/16 ] |
|
This is blocking patch review testing on master branch: |
| Comment by Gerrit Updater [ 08/Jul/16 ] |
|
James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/21224 |
| Comment by Gerrit Updater [ 09/Jul/16 ] |
|
Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/21224/ |
| Comment by Gerrit Updater [ 05/Aug/16 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/21736 |
| Comment by Gerrit Updater [ 11/Aug/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21736/ |
| Comment by Peter Jones [ 11/Aug/16 ] |
|
Test is active again, so reclosing. Let's open a new ticket if any further failures are found for this test. |
| Comment by nasf (Inactive) [ 12/Aug/16 ] |
|
Hit it again on master: |
| Comment by nasf (Inactive) [ 23/Aug/16 ] |
|
Open new ticket |