[LU-999] Test failure on test suite lfsck Created: 16/Jan/12  Updated: 06/Feb/12  Resolved: 06/Feb/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Zhenyu Xu
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-427 Test failure on test suite lfsck Resolved
Severity: 3
Rank (Obsolete): 6485

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/cbcbbd2e-3f50-11e1-990e-5254004bbbd3.

fat-intel-1vm4: error getting mds_hdr (3685469441:8) in /tmp/mdsdb: DB_NOTFOUND: No matching key/data pair found
fat-intel-1vm4: e2fsck: aborted
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 432k/0k (265k/168k), time: 0.07/ 0.01/ 0.02
Pass 1: I/O read: 7MB, write: 0MB, rate: 100.18MB/s
Pass 2: Checking directory structure
Pass 2: Memory used: 432k/0k (291k/142k), time: 0.00/ 0.00/ 0.00
Pass 2: I/O read: 3MB, write: 0MB, rate: 1337.49MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 432k/0k (308k/125k), time: 0.11/ 0.01/ 0.03
Pass 3: Memory used: 432k/0k (291k/142k), time: 0.00/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 27027.03MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 396k/0k (164k/233k), time: 0.00/ 0.00/ 0.00
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 396k/0k (148k/249k), time: 0.02/ 0.01/ 0.00
Pass 5: I/O read: 1MB, write: 0MB, rate: 64.42MB/s
Pass 6: Acquiring OST information for lfsck
lfsck : @@@@@@ FAIL: e2fsck d -v -t -t -f -n --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-0 /dev/mapper/lvm-OSS-P0 returned 8, should be <= 1
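
The final log line says e2fsck returned 8 while the test accepts at most 1. As a reading aid, here is a minimal sketch that decodes an e2fsck exit status into the conditions listed in the e2fsck(8) man page. The helper names `e2fsck_status_acceptable` and `print_e2fsck_status` are hypothetical and are not part of the Lustre test framework.

```c
#include <stdio.h>

/* Exit status bits as documented in the e2fsck(8) man page. */
struct e2fsck_bit {
        int bit;
        const char *meaning;
};

static const struct e2fsck_bit e2fsck_bits[] = {
        { 1,   "file system errors corrected" },
        { 2,   "file system errors corrected, system should be rebooted" },
        { 4,   "file system errors left uncorrected" },
        { 8,   "operational error" },
        { 16,  "usage or syntax error" },
        { 32,  "e2fsck canceled by user request" },
        { 128, "shared library error" },
};

/* Mirrors the acceptance rule quoted in the log: status must be <= 1. */
int e2fsck_status_acceptable(int status)
{
        return status >= 0 && status <= 1;
}

/* Print one line per condition bit that is set; return how many were set. */
int print_e2fsck_status(int status)
{
        int count = 0;

        for (size_t i = 0; i < sizeof(e2fsck_bits) / sizeof(e2fsck_bits[0]); i++) {
                if (status & e2fsck_bits[i].bit) {
                        printf("%d: %s\n", e2fsck_bits[i].bit, e2fsck_bits[i].meaning);
                        count++;
                }
        }
        return count;
}
```

Under this reading, a status of 8 points to an operational error in e2fsck itself, not to file system damage. That fits the "e2fsck: aborted" line earlier in the log.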



 Comments   
Comment by Peter Jones [ 16/Jan/12 ]

Bobi

Could you please look into this one?

Thanks

Peter

Comment by Andreas Dilger [ 16/Jan/12 ]

The first thing to check is whether the same version of e2fsprogs is installed on the MDS and OSS. Next, check whether the version of db4 used by e2fsck is the same on both. This bug has been hit in the past and was due to db4 version mismatches. Please search Bugzilla for this error message.

Comment by Zhenyu Xu [ 31/Jan/12 ]

The error message is:

fat-intel-1vm4: error getting mds_hdr (3685469441:8) in /tmp/mdsdb: DB_NOTFOUND: No matching key/data pair found

The corresponding e2fsck code is:

        memset(&mds_hdr, 0, sizeof(mds_hdr));
        mds_hdr.mds_magic = MDS_MAGIC;           // ====>  0xDBABCD01 == 3685469441
        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        key.data = &mds_hdr.mds_magic;
        key.size = sizeof(mds_hdr.mds_magic);
        data.data = &mds_hdr;
        data.size = sizeof(mds_hdr);
        data.ulen = sizeof(mds_hdr);
        data.flags = DB_DBT_USERMEM;
        rc = mds_hdrdb->get(mds_hdrdb, NULL, &key, &data, 0);
        if (rc) {
                fprintf(stderr,"error getting mds_hdr ("LPU64":%u) in %s: %s\n",
                        mds_hdr.mds_magic, (int)sizeof(mds_hdr.mds_magic),
                        ctx->lustre_mdsdb, db_strerror(rc));
                ctx->flags |= E2F_FLAG_ABORT;
                goto out;
        }

When generating the OST db, e2fsck cannot find the MDS header record, which is keyed by the magic value, in /tmp/mdsdb.
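
To make the failing lookup concrete, the sketch below rebuilds the key that the quoted code passes to the get call. It rests on two assumptions: that mds_magic is a 64-bit field (the ":8" in the error message is the printed sizeof), and that the magic is 0xDBABCD01 as given in the code comment. `MDS_MAGIC_ASSUMED`, `build_mds_key` and `print_mds_key` are illustrative names, not e2fsprogs identifiers.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Magic value as quoted in this ticket; the real definition lives in e2fsprogs. */
#define MDS_MAGIC_ASSUMED 0xDBABCD01ULL

/*
 * Copy the magic into a raw key buffer the way a DBT pointing at a
 * 64-bit field exposes it: sizeof(uint64_t) bytes in host byte order.
 * Returns the key length in bytes, or 0 if the buffer is too small.
 */
size_t build_mds_key(unsigned char *buf, size_t buflen)
{
        uint64_t magic = MDS_MAGIC_ASSUMED;

        if (buflen < sizeof(magic))
                return 0;
        memcpy(buf, &magic, sizeof(magic));
        return sizeof(magic);
}

/* Print the key as hex so it can be compared against a dump of the database. */
void print_mds_key(void)
{
        unsigned char key[sizeof(uint64_t)];
        size_t len = build_mds_key(key, sizeof(key));

        printf("magic=%llu key_len=%zu key=",
               (unsigned long long)MDS_MAGIC_ASSUMED, len);
        for (size_t i = 0; i < len; i++)
                printf("%02x", key[i]);
        printf("\n");
}
```

Because the key is a raw byte sequence, the lookup only succeeds if the database holds a record stored under exactly the same bytes. An empty, stale, or differently written /tmp/mdsdb would produce DB_NOTFOUND.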

This could be caused by a db4 version mismatch. However, I checked the test session info at https://maloo.whamcloud.com/test_sessions/99aea334-3f4f-11e1-990e-5254004bbbd3, and the MDS (fat-intel-1vm3) and the OSTs (fat-intel-1vm4) use the same build image (Kernel Version: 2.6.32-131.17.1.el6_lustre.ge126ace.x86_64 Lustre Version: jenkins-arch=x86_64,build_type=server,distro=el6,ib_stack=inkern).

Comment by Andreas Dilger [ 06/Feb/12 ]

Is the /tmp/mdsdb file available on the OSS node, and is it definitely the right one (i.e. not left over from some previous run)? I haven't checked whether this code fails with a "file not found" error if the mdsdb file is missing entirely. Since the OSS and MDS are running in different VM images, it may be that the file is not being copied to the OSS correctly.
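
A pre-flight check along these lines could answer both questions before e2fsck runs on the OSS. This is only a sketch: `check_mdsdb_file` is a hypothetical helper, not part of e2fsck or the test scripts, and the caller must supply the "not older than" timestamp (for example the start time of the current test run).

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical pre-flight check, not part of e2fsck.
 * Returns  0 if `path` is a non-empty regular file modified at or after
 *            `not_before`,
 *         -1 if it is missing or not a regular file,
 *         -2 if it is empty,
 *         -3 if it is older than `not_before` (possibly left over from
 *            a previous run).
 */
int check_mdsdb_file(const char *path, time_t not_before)
{
        struct stat st;

        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
                return -1;
        if (st.st_size == 0)
                return -2;
        if (st.st_mtime < not_before)
                return -3;
        return 0;
}
```

Calling `check_mdsdb_file("/tmp/mdsdb", run_start)` on the OSS node, where `run_start` is an assumed variable holding the test start time, would tell a missing file apart from a stale one.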

Comment by Zhenyu Xu [ 06/Feb/12 ]

Sarah,

If we hit this again, would you please check whether /tmp is a directory shared between the MDS and OSS, and whether /tmp/mdsdb on the OSS node is exactly the same file as the one on the MDS node?
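
One way to confirm the two copies are identical is to hash /tmp/mdsdb on each node and compare the digests. The sketch below uses FNV-1a (64-bit) only because it fits in a few lines. It is not what the test framework does, and a standard checksum tool run on both nodes would serve equally well. `fnv1a64_update` and `file_fnv1a64` are hypothetical helper names.

```c
#include <stdint.h>
#include <stdio.h>

/* FNV-1a 64-bit over a memory buffer, continuing from hash state `h`. */
uint64_t fnv1a64_update(uint64_t h, const unsigned char *p, size_t n)
{
        for (size_t i = 0; i < n; i++) {
                h ^= p[i];
                h *= 0x100000001b3ULL;  /* FNV 64-bit prime */
        }
        return h;
}

/*
 * Hash a whole file. Returns 0 on success and stores the digest in *out,
 * or -1 if the file cannot be opened or read.
 */
int file_fnv1a64(const char *path, uint64_t *out)
{
        unsigned char buf[4096];
        uint64_t h = 0xcbf29ce484222325ULL;  /* FNV 64-bit offset basis */
        size_t n;
        FILE *f = fopen(path, "rb");

        if (f == NULL)
                return -1;
        while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
                h = fnv1a64_update(h, buf, n);
        if (ferror(f)) {
                fclose(f);
                return -1;
        }
        fclose(f);
        *out = h;
        return 0;
}
```

If the digests from the MDS and the OSS differ, the OSS is reading a different mdsdb than the one the MDS produced. FNV-1a is not cryptographic, so it is suitable only for spotting accidental mismatches.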

Thanks.

Comment by Jian Yu [ 06/Feb/12 ]

It seems this is the same issue as LU-427.

Comment by Zhenyu Xu [ 06/Feb/12 ]

Yes, I think it's a duplicate of LU-427. Thanks, Yujian.

Generated at Sat Feb 10 01:12:32 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.