[LU-2985] e2fsck fails and aborted if it generates the database for lfsck Created: 19/Mar/13  Updated: 19/Mar/13  Resolved: 19/Mar/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Shuichi Ihara (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

Lustre-2.1.0 CentOS5.5


Attachments: File e2fsck-mdsdb-output    
Severity: 1
Rank (Obsolete): 7277

 Description   

The customer's MDT crashed. They ran e2fsck, but many problems could not be fixed. They then upgraded e2fsck to the latest version, fixed the corruption with it, and now plan to run lfsck before mounting the MDT.

However, when they ran e2fsck to generate the database, we saw the following error messages, and the database could not be generated from the MDT.

/sbin/e2fsck -n -v --mdsdb /tmp/mdsdb /dev/ExHwRaid10VolGroup/mdt

MDS: got 4168 bytes = 521 entries in lov_objids
MDS: max_files = 288784110
MDS: num_osts = 521
mds info db file written
db->put failed
: DB_KEYEXIST: Key/data pair already exists
e2fsck: aborted
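
The first log line above is consistent with 8-byte entries: 4168 bytes divided by 8 gives 521, which matches num_osts. A minimal sketch of that check follows. The 8-byte entry size is an inference from the numbers in this log, not something the ticket confirms, and the function name is hypothetical.

```python
# Assumption: each lov_objids entry is one 8-byte (64-bit) value.
# This is inferred from "4168 bytes = 521 entries" in the log above.
ENTRY_SIZE = 8


def lov_objids_entries(num_bytes, entry_size=ENTRY_SIZE):
    """Return how many whole entries fit in a file of num_bytes bytes."""
    if num_bytes % entry_size != 0:
        raise ValueError("size is not a multiple of the entry size")
    return num_bytes // entry_size


entries = lov_objids_entries(4168)
print(entries)
```

If the count and num_osts disagreed, that would point at a truncated or damaged lov_objids file rather than at the mdsdb write that fails later in the log.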



 Comments   
Comment by nasf (Inactive) [ 19/Mar/13 ]

Ihara, are there any more logs that we can use?

Comment by Niu Yawei (Inactive) [ 19/Mar/13 ]

Apparently there is old data in the db with a duplicate key? Does removing the old db and rerunning lfsck work?

Comment by Shuichi Ihara (Inactive) [ 19/Mar/13 ]

Niu, FanYong
They removed /tmp/mdsdb every time before running e2fsck to generate the database.
Which log files do you want to see?

Comment by Shuichi Ihara (Inactive) [ 19/Mar/13 ]

It's the log file from when they ran e2fsck to generate the database.

Comment by Shuichi Ihara (Inactive) [ 19/Mar/13 ]

Just attached the e2fsck log.

Comment by Niu Yawei (Inactive) [ 19/Mar/13 ]

It seems there are duplicate inode numbers on the local MDS filesystem. Did you run 'e2fsck -f' to fix the local filesystem before generating the mds db?

Comment by Andreas Dilger [ 19/Mar/13 ]

For 2.x lfsck, it is using the FID as the key for the mdsdb. I don't think there could be duplicate inode numbers in the filesystem, but it is possible that lfsck isn't handling hard links correctly?

If the customer has run e2fsck on the MDS filesystem, it isn't strictly necessary to run lfsck. Until this problem with lfsck is better understood, they should be able to use the filesystem without lfsck.
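
The hard-link hypothesis above can be sketched as follows. This is an illustration only, not the lfsck or e2fsck code: NoOverwriteStore is a hypothetical in-memory stand-in for a database whose put refuses existing keys (the behaviour that DB_KEYEXIST reports), and the FID strings and file names are invented placeholders. It assumes a scan that visits a hard-linked file once per name, so the same FID is offered as a key twice.

```python
class KeyExistsError(Exception):
    """Stand-in for a DB_KEYEXIST-style failure (hypothetical name)."""


class NoOverwriteStore:
    """In-memory stand-in for a key/value db that rejects duplicate keys."""

    def __init__(self):
        self._records = {}

    def put(self, key, value):
        # Refuse to overwrite: a second put with the same key is an error.
        if key in self._records:
            raise KeyExistsError(f"key already exists: {key!r}")
        self._records[key] = value

    def __contains__(self, key):
        return key in self._records

    def __len__(self):
        return len(self._records)


def build_db(entries, skip_seen):
    """Insert (fid, name) pairs; optionally skip FIDs that are already stored."""
    store = NoOverwriteStore()
    for fid, name in entries:
        if skip_seen and fid in store:
            continue  # hard link: same FID reached through another name
        store.put(fid, name)
    return store


# Placeholder data: "fid-A" appears twice, as a hard-linked file would.
ENTRIES = [("fid-A", "file1"), ("fid-B", "file2"), ("fid-A", "link-to-file1")]

try:
    build_db(ENTRIES, skip_seen=False)
except KeyExistsError as err:
    print("put failed:", err)

print(len(build_db(ENTRIES, skip_seen=True)), "records stored")
```

Whether the real tool hits this path is not established in this ticket. The sketch only shows why a duplicate key would abort the run even when the old db file is removed beforehand.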

Comment by Peter Jones [ 19/Mar/13 ]

Please cease all work on this. This is for an unsupported site running an unsupported version of Lustre.

Generated at Sat Feb 10 01:29:58 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.