[LU-5721] sanity-lfsck test_18f failed for unexpected layout LFSCK status Created: 09/Oct/14 Updated: 03/Nov/14 Resolved: 03/Nov/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | nasf (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 16053 |
| Description |
|
Recently, Maloo tests hit some sanity-lfsck test_18f failures because of unexpected layout LFSCK status:
sanity-lfsck test 18f: Skip the failed OST(s) when handle orphan OST-objects == 12:43:28 (1407761008)
https://testing.hpdd.intel.com/test_sets/29aaa042-4a50-11e4-880b-5254006e85c2
https://testing.hpdd.intel.com/test_sets/a4396266-3b47-11e4-a78f-5254006e85c2
https://testing.hpdd.intel.com/test_sets/2ed4fa44-3a52-11e4-b82a-5254006e85c2
https://testing.hpdd.intel.com/test_sets/aa055074-3543-11e4-9daf-5254006e85c2
https://testing.hpdd.intel.com/test_sets/2c7eb87c-2196-11e4-8700-5254006e85c2
https://testing.hpdd.intel.com/test_sets/876c3a80-2186-11e4-b153-5254006e85c2
https://testing.hpdd.intel.com/test_sets/f38fb5c8-216f-11e4-bd4e-5254006e85c2 |
| Comments |
| Comment by John Hammond [ 09/Oct/14 ] |
|
Here are the other 9/16 failures in maloo when I checked yesterday:
---- 11560 ----
https://testing.hpdd.intel.com/test_sets/da2714ee-49f8-11e4-92b1-5254006e85c2
10-02 11560 a168cdd6e093d1cb6fc551202d6a53aab5f87fc8/5 LU-5451 lod: improve weird FID handling
7e000f8fcad8ed9023f502ca63c47f3bdcac8a6b LU-5511 lfsck: repair unmatched parent-child pairs
Error: '(6.1) Expect 1 fixed on mds1, but got: 0'
https://testing.hpdd.intel.com/test_sets/b6842b5e-4a06-11e4-adcb-5254006e85c2
10-01 11560 a168cdd6e093d1cb6fc551202d6a53aab5f87fc8 LU-5451 lod: improve weird FID handling
Error: '(6.1) Expect 1 fixed on mds1, but got: 0'
https://testing.hpdd.intel.com/test_sets/47630d76-4560-11e4-8e96-5254006e85c2
09-26 11560 fa13c28d81ca917d1cfdfdefedb3a06845bb2386 LU-5451 lod: improve weird FID handling
Error: '(6.1) Expect 1 fixed on mds1, but got: 0'
---- 10996 ----
https://testing.hpdd.intel.com/test_sets/883bc6b6-06be-11e4-8941-5254006e85c2
07-08 10996/4 2c4ffba41367e2cb850b2f7af1285641112c87fc LU-5506 lfsck: skip orphan OST-object handling for failed OSTs (Merged)
Error: '(6) Expect 2 fixed on mds{2}, but got: 3'
https://testing.hpdd.intel.com/test_sets/aef7bb80-0645-11e4-8bf0-5254006e85c2
07-08 10996/3 e5d34ebcb64476bc9228551d68f08f0de4ae2944 LU-5506 lfsck: skip orphan OST-object handling for failed OSTs
Error: '(3) OST{1} Expect 'partial', but got 'scanning-phase2''
https://testing.hpdd.intel.com/test_sets/6c7a8fea-0617-11e4-be6f-5254006e85c2
07-07 10996/2 e5d34ebcb64476bc9228551d68f08f0de4ae2944 LU-5506 lfsck: skip orphan OST-object handling for failed OSTs
Error: '(6) Expect 2 fixed on mds{2}, but got: 3'
---- full ----
https://testing.hpdd.intel.com/test_sets/64eb3bde-4dfd-11e4-8fdd-5254006e85c2
10-06 full
Error: '(2) MDS1 is not the expected 'partial''
client 6039fc8fd47ffd73a31b073687f32cac0a35a8aa v2_6_53_0-12-g6039fc8
server 73ea776053d99f74a9f5679fe55ec5d9461b8a89 v2_6_0_0
https://testing.hpdd.intel.com/test_sets/44f280d4-4d45-11e4-857c-5254006e85c2
10-05 full
Error: '(2) MDS1 is not the expected 'partial''
client 0b4b33592c09d37c0132d39c7823db78a3efcb3c v2_6_53_0-8-g0b4b335
server 73ea776053d99f74a9f5679fe55ec5d9461b8a89 v2_6_0_0
---- 9383 ----
https://testing.hpdd.intel.com/test_sets/4b097264-48ce-11e4-b83b-5254006e85c2
09-30 9383 LU-4665 utils: lfs setstripe to specify OSTs
Error: '(3) Fail to repair unmatched pair: 0'
DUE TO REGRESSION IN PATCH.
|
| Comment by Andreas Dilger [ 10/Oct/14 ] |
|
What is the impact of this bug? It isn't at all clear from the description whether this is just causing test failures, or if it is actually a bug that would cause problems for users? |
| Comment by nasf (Inactive) [ 12/Oct/14 ] |
|
There are two sub-failures under this ticket:
1) During the first part of the test, we inject an error stub to simulate the case where some OST fails to respond to an LFSCK request. The LFSCK on the MDT should then skip orphan OST-object handling for this OST and mark the LFSCK status as "partial", but for some unknown reason the injection did not cause the "partial" status. The potential impact is that the LFSCK may handle some orphan OST-objects unexpectedly. Because an OST-object usually contains its parent MDT-object's FID information, such unexpected LFSCK behaviour is harmless in most cases.
2) During the second part of the test, we clear the previously injected error stub, so the LFSCK should run smoothly and handle all objects. The final LFSCK status should be "completed", but for some unknown reason the LFSCK failed to verify some OST-object(s), so the final LFSCK status was "partial". This failure may leave some orphan OST-objects unhandled. |
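The expected behaviour in the two parts of the test can be illustrated with a small toy model. This is only a sketch and not Lustre code: the `Ost` class, the `responds` flag and `run_layout_lfsck` are hypothetical names made up for illustration, and the real LFSCK tracks far more state than this.

```python
from dataclasses import dataclass, field


@dataclass
class Ost:
    """Hypothetical stand-in for one OST in the toy model."""
    name: str
    responds: bool = True  # False models the injected error stub
    orphans: list = field(default_factory=list)


def run_layout_lfsck(osts):
    """Toy layout LFSCK run: return (status, handled_orphans, skipped_osts)."""
    handled = []
    skipped = []
    for ost in osts:
        if not ost.responds:
            # The OST did not answer: skip its orphan OST-object handling.
            skipped.append(ost.name)
            continue
        handled.extend(ost.orphans)
    # Any skipped OST means the run cannot be reported as "completed".
    status = "partial" if skipped else "completed"
    return status, handled, skipped


# Part 1: one OST fails to respond, so the expected status is "partial".
osts = [
    Ost("OST0000", responds=False, orphans=["obj-a"]),
    Ost("OST0001", orphans=["obj-b"]),
]
part1 = run_layout_lfsck(osts)

# Part 2: the injected failure is cleared, so the expected status is "completed".
osts[0].responds = True
part2 = run_layout_lfsck(osts)
```

In this model the first sub-failure corresponds to part 1 reporting "completed" although an OST was meant to be skipped, and the second to part 2 reporting "partial" although no failure was injected any more.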
| Comment by nasf (Inactive) [ 22/Oct/14 ] |
|
1) The failure "Error: '(2) MDS1 is not the expected 'partial''" in the old-version tests: they failed because the injected failure stub was not triggered. This issue has already been resolved in subsequent versions, and the fix has landed on master.
2) The failure "Error: '(2) MDS4 is not the expected 'partial''" in the tests: it is another failure instance.
3) The failure "Error: '(4) MDS4 is not the expected 'completed''" in the tests: it is another failure instance.
4) The failure "Error: '(8) MDS1 is not the expected 'completed''" in the tests: the LFSCK on the OST got an abnormal status when it queried the LFSCK status from the MDT. The LFSCK on the OST then assumed that the LFSCK on the MDT had hit some unexpected trouble and marked it as having exited in advance, so the subsequent orphan MDT-object handling was skipped. This issue will be fixed via the patch: http://review.whamcloud.com/#/c/11516/ |
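For readers unfamiliar with how such messages arise, the check behind errors of the form "(N) MDSx is not the expected '...'" can be sketched as follows. This is a hedged illustration, not the actual test script: `parse_status` and `check_status` are hypothetical helpers, the sketch assumes the LFSCK report carries a line starting with `status:`, and `SAMPLE_REPORT` is invented sample text rather than a captured log.

```python
def parse_status(report):
    """Return the value of the first 'status:' line, or None if absent."""
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            return value.strip()
    return None


def check_status(step, node, report, expected):
    """Return an error string if the reported status differs, else None."""
    actual = parse_status(report)
    if actual != expected:
        return f"({step}) {node} is not the expected '{expected}', got '{actual}'"
    return None


# Invented sample report, for illustration only.
SAMPLE_REPORT = "name: lfsck_layout\nstatus: scanning-phase2\n"

error = check_status(2, "MDS1", SAMPLE_REPORT, "partial")
ok = check_status(2, "MDS1", SAMPLE_REPORT, "scanning-phase2")
```

A check like this only compares the status at the moment the report is read, so it cannot by itself tell an injection that never fired apart from an LFSCK that exited early.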
| Comment by Bob Glossman (Inactive) [ 30/Oct/14 ] |
|
another seen in master: |
| Comment by nasf (Inactive) [ 03/Nov/14 ] |
|
The patch http://review.whamcloud.com/#/c/11516/ landed on master on Oct. 30th. Other related fixes will be landed to master via |