[LU-1899] Test failure on test suite sanity, subtest test_17m Created: 11/Sep/12  Updated: 19/Sep/12  Resolved: 19/Sep/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0, Lustre 2.3.0, Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4146

 Description   

This issue was created by maloo for Minh Diep <mdiep@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/7ec0dbc2-ff7b-11e1-87b9-52540035b04c.

The sub-test test_17m failed with the following error:

e2fsck should not report error upon short/long symlink MDT: rc=4

Info required for matching: sanity 17m
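
Editor's note on the "rc=4" above: this is the exit status that e2fsck returned to the test. According to the e2fsck(8) man page, the status is a bit mask, and the value 4 means errors were found and left uncorrected. A minimal sketch of decoding it is shown below; the helper name decode_e2fsck_rc is hypothetical and is not part of the Lustre test framework.

```python
from typing import List

# Exit-status bits as documented in the e2fsck(8) man page.
E2FSCK_EXIT_BITS = {
    1: "File system errors corrected",
    2: "File system errors corrected, system should be rebooted",
    4: "File system errors left uncorrected",
    8: "Operational error",
    16: "Usage or syntax error",
    32: "E2fsck canceled by user request",
    128: "Shared library error",
}


def decode_e2fsck_rc(rc: int) -> List[str]:
    """Return the meaning of each bit set in an e2fsck exit status.

    Hypothetical helper for illustration only.
    """
    if rc == 0:
        return ["No errors"]
    return [msg for bit, msg in E2FSCK_EXIT_BITS.items() if rc & bit]


if __name__ == "__main__":
    # rc=4 is the value reported by test_17m in this ticket.
    print(decode_e2fsck_rc(4))
```

Because the test runs e2fsck without fixing anything (the log answers "no" to every prompt), any inconsistency on the MDT would be expected to surface as bit 4.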



 Comments   
Comment by Sarah Liu [ 13/Sep/12 ]

another failure: https://maloo.whamcloud.com/test_sets/31bdb61c-fc7c-11e1-bde2-52540035b04c

Comment by Peter Jones [ 14/Sep/12 ]

Yujian

Could you please take care of this one?

Thanks

Peter

Comment by Ian Colle (Inactive) [ 14/Sep/12 ]

https://maloo.whamcloud.com/test_sets/9c65c71a-fe80-11e1-a707-52540035b04c

Comment by Andreas Dilger [ 15/Sep/12 ]

This looks like it is NOT related to the original short/long symlink problem in LU-1540. What it is showing is that there is some other kind of inconsistency being introduced in the filesystem during testing that e2fsck is catching. We probably haven't been running e2fsck on a populated filesystem for some time, and it may be a long-standing problem:

Seen in https://maloo.whamcloud.com/test_sets/32952cfe-fec6-11e1-a707-52540035b04c, https://maloo.whamcloud.com/test_sets/0007aee8-fe9d-11e1-a707-52540035b04c:

e2fsck 1.42.3.wc3 (15-Aug-2012)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 77
Connect to /lost+found? no

Inode 524291 ref count is 3, should be 2.  Fix? no

Pass 5: Checking group summary information

lustre-MDT0000: ********** WARNING: Filesystem still has errors **********

The error Sarah reported is actually the original LU-1540 failure, hit during interop testing. I've pushed http://review.whamcloud.com/4004 to disable this test during interop testing. It does NOT fix the actual e2fsck errors reported above.
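
Editor's note: when triaging many Maloo logs, it can help to separate runs that show the "Unattached inode" and ref count messages quoted above from runs that hit the LU-1540 symlink failure. The following is a rough sketch that pulls those two message types out of an e2fsck log. The function name and the returned dictionary layout are assumptions made for illustration, and the patterns only cover the two message formats seen in this ticket.

```python
import re

# Patterns match only the two Pass 4 messages quoted in this ticket.
UNATTACHED_RE = re.compile(r"^Unattached inode (\d+)")
REFCOUNT_RE = re.compile(r"^Inode (\d+) ref count is (\d+), should be (\d+)\.")


def summarize_e2fsck_log(text):
    """Collect unattached inodes and ref count mismatches from e2fsck output.

    Hypothetical helper: returns a dict with a list of inode numbers and a
    list of (inode, actual_count, expected_count) tuples.
    """
    unattached = []
    mismatches = []
    for raw in text.splitlines():
        line = raw.strip()
        m = UNATTACHED_RE.match(line)
        if m:
            unattached.append(int(m.group(1)))
            continue
        m = REFCOUNT_RE.match(line)
        if m:
            mismatches.append(tuple(int(g) for g in m.groups()))
    return {"unattached": unattached, "refcount_mismatches": mismatches}
```

A log with a non-empty "unattached" list would fall into the group discussed in this comment, while a log with neither message type would need to be checked against LU-1540 instead.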

Comment by Andreas Dilger [ 15/Sep/12 ]

Increasing priority, since this is hitting fairly often.

Comment by Andreas Dilger [ 15/Sep/12 ]

Looking at the test failures here, it appears that all of the failing tests with the "Unattached inode" error are in LU-1301 or LU-1304. It appears that there is a defect in the orion patch series that is introducing this bug.

Comment by Andreas Dilger [ 15/Sep/12 ]

This is likely due to a directory reference leak from the local object library, perhaps creating some subdirectory without a name?

Comment by Li Wei (Inactive) [ 16/Sep/12 ]

https://maloo.whamcloud.com/test_sets/32952cfe-fec6-11e1-a707-52540035b04c

Comment by Jian Yu [ 17/Sep/12 ]

Lustre client build: http://build.whamcloud.com/job/lustre-b2_3/19
Lustre server build: http://build.whamcloud.com/job/lustre-b2_2/17
Distro/Arch: RHEL6.3/x86_64
https://maloo.whamcloud.com/test_sets/f6d45338-0065-11e2-9f3c-52540035b04c

The patch in http://review.whamcloud.com/4004 needs to be landed/cherry-picked on b2_3 to disable this test during interop testing.

Comment by Alex Zhuravlev [ 17/Sep/12 ]

No, I don't think we should disable the test: Andreas fixed one issue in e2fsprogs and I fixed another problem with nlink in the orion code.

Comment by Jian Yu [ 17/Sep/12 ]

Let's mark this ticket as a blocker since it's hit regularly in Lustre b2_2<->b2_3/master interop testing.

Comment by Andreas Dilger [ 17/Sep/12 ]

I think the important issue is that the fix for symlinks is not in the 2.2.0 release (nor in any release before 2.2.94), so running this test in interop mode against those older Lustre releases will only cause test failures.

There is no question of disabling this test for master/2.4. It has proven useful in finding at least one defect, and will hopefully catch more in the future. It would be even better to enable the additional e2fsck check in check_and_cleanup_lustre(), before all of the files are deleted. I'm working on a patch to do that.

Comment by Peter Jones [ 19/Sep/12 ]

The suggested fix has landed on b2_3, so this should now be dealt with.
