[LU-367] lfsck 1.41.90.wc2: illegal flag specified to DB->open
Created: 27/May/11  Updated: 26/Oct/11  Resolved: 02/Jun/11

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.6
Fix Version/s: Lustre 2.1.0, Lustre 1.8.6

Type: Bug
Priority: Major
Reporter: Jian Yu
Assignee: Andreas Dilger
Resolution: Fixed
Votes: 0
Labels: None
Environment:

Lustre Branch: b1_8
Lustre Build: http://newbuild.whamcloud.com/job/lustre-b1_8/61/
Distro/Arch: RHEL6/x86_64(patchless client, in-kernel OFED), RHEL5/x86_64(server, OFED 1.5.3, ext4)

MGS/MDS Node: client-6-ib (1 combo MGS/MDT)
OSS Node: client-8-ib (6 OSTs)
Client Nodes: client-1-ib, client-2-ib

$ pdsh -l root -S -w client[1,2,6,8] "rpm -q e2fsprogs"
client-1: e2fsprogs-1.41.90.wc2-7.el6.x86_64
client-2: e2fsprogs-1.41.90.wc2-7.el6.x86_64
client-6: e2fsprogs-1.41.90.wc2-0redhat
client-8: e2fsprogs-1.41.90.wc2-0redhat


Attachments: File db.tgz     File lfsck-1306485882.tar.bz2    
Story Points: 1
Severity: 3
Rank (Obsolete): 10145

 Description   

lfsck test failed as follows:

lfsck -c -l --mdsdb /home/yujian/test_logs/mdsdb --ostdb /home/yujian/test_logs/ostdb-0 /home/yujian/test_logs/ostdb-1 /home/yujian/test_logs/ostdb-2 /home/yujian/test_logs/ostdb-3 /home/yujian/test_logs/ostdb-4 /home/yujian/test_logs/ostdb-5 /mnt/lustre
lfsck 1.41.90.wc2 (14-May-2011)
illegal flag specified to DB->open
/home/yujian/test_logs/mdsdb:mdshdr
: Invalid argument
/usr/lib64/lustre/tests/test-framework.sh: line 2269: 12801 Segmentation fault      (core dumped) lfsck -c -l --mdsdb /home/yujian/test_logs/mdsdb --ostdb /home/yujian/test_logs/ostdb-0 /home/yujian/test_logs/ostdb-1 /home/yujian/test_logs/ostdb-2 /home/yujian/test_logs/ostdb-3 /home/yujian/test_logs/ostdb-4 /home/yujian/test_logs/ostdb-5 /mnt/lustre
 lfsck : @@@@@@ FAIL: lfsck -c -l --mdsdb /home/yujian/test_logs/mdsdb --ostdb  /home/yujian/test_logs/ostdb-0 /home/yujian/test_logs/ostdb-1 /home/yujian/test_logs/ostdb-2 /home/yujian/test_logs/ostdb-3 /home/yujian/test_logs/ostdb-4 /home/yujian/test_logs/ostdb-5 /mnt/lustre returned 139, should be <= 1 
Dumping lctl log to /home/yujian/test_logs/2011-05-27/014304/lfsck..*.1306485882.log
tar: Removing leading `/' from member names
/home/yujian/test_logs/2011-05-27/014304/lfsck-1306485882.tar.bz2
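
A note on the `returned 139` in the failure line above: a shell reports a process killed by a signal as 128 plus the signal number, and 139 - 128 = 11, which is SIGSEGV on Linux. This matches the `Segmentation fault (core dumped)` message. The sketch below only illustrates that arithmetic; the helper name `describe_exit_status` is hypothetical and is not part of test-framework.sh, and a status above 128 could in principle also be an ordinary exit code chosen by the program.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper)."""
    if status > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


print(describe_exit_status(139))
```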

Dmesg on client-1-ib showed:

lfsck[12801]: segfault at 5 ip 00007fd0ed3b4ff7 sp 00007fff411b5e50 error 4 in libc-2.12.so[7fd0ed36d000+175000]
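
The dmesg line can be unpacked a little further. On x86, the `error` value is the page-fault error code (printed in hex): bit 0 is set for a protection fault on a present page, bit 1 for a write, bit 2 for a user-mode access. `error 4` is therefore a user-mode read of an unmapped page, and the faulting address `5` is close to NULL, which is consistent with (though not proof of) libc dereferencing a near-NULL pointer after the failed open. The sketch below parses the line and computes the offset of the faulting instruction inside libc; the function and field names are hypothetical.

```python
import re

# x86 page-fault error-code bits, as printed by the kernel after "error".
PF_PROT, PF_WRITE, PF_USER = 0x1, 0x2, 0x4

LINE = ("lfsck[12801]: segfault at 5 ip 00007fd0ed3b4ff7 "
        "sp 00007fff411b5e50 error 4 in libc-2.12.so[7fd0ed36d000+175000]")

PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>\S+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a kernel segfault line into its parts (hypothetical helper)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m.group("err"), 16)
    return {
        "fault_addr": int(m.group("addr"), 16),
        "object": m.group("obj"),
        # Offset of the faulting instruction from the mapping's base address.
        "offset": int(m.group("ip"), 16) - int(m.group("base"), 16),
        "access": "write" if err & PF_WRITE else "read",
        "mode": "user" if err & PF_USER else "kernel",
        "page_present": bool(err & PF_PROT),
    }


info = decode_segfault(LINE)
print(info["object"], hex(info["offset"]), info["mode"], info["access"])
```

The resulting offset can be looked up in the matching libc debuginfo to identify the function that faulted; the ticket does not record which function that was.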

Maloo report: https://maloo.whamcloud.com/test_sets/9648ba14-883d-11e0-b4df-52540025f9af

The logs and db files are attached.



 Comments   
Comment by Andreas Dilger [ 27/May/11 ]

The e2fsprogs versions suggest that two different distros are being tested: RHEL5 on the MDS/OSS and RHEL6 on the clients. This may mean that different versions of db4 are in use, which has caused compatibility issues in the past.

I need to confirm whether the lfsck run was done on the client or on the MDS. If lfsck was run on the MDS then this is a non-issue; if it was run on a client, the db4 mismatch may be the reason for this problem.

A short-term solution is probably to record the version of e2fsck/lfsck in the MDSDB and verify it on the OSTs and client, so that there are no surprises.

Comment by Peter Jones [ 27/May/11 ]

Andreas

Please reassign if someone else should work on this, but it sounds like you are.

Peter

Comment by Andreas Dilger [ 27/May/11 ]

I was able to reproduce this error on my local filesystem by building the MDSDB and OSTDBs on a system with db4-4.2 and then running lfsck on a client system with db4-4.7. Both systems were running e2fsprogs-1.41.90.wc2 RPMs on x86_64 built from the same source, but for their respective distros.

I don't think this issue is a blocker, though it makes sense to avoid this kind of confusion in the future, for example by having the MDSDB record the db4 version that was used to build it. That said, storing the version inside the database itself doesn't help, because opening the database is what fails in the first place.
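
One way around the chicken-and-egg problem described above is to keep the version outside the database, in a plain-text file that can be read without db4. The sketch below is only an illustration of that idea, not what the landed patch does (the patch title is "Handle DB->open errors without crashing"). The `.dbversion` sidecar name, the helper names, and the strict equality check are all assumptions; the ticket does not state which db4 versions are actually compatible with each other.

```python
import os
import tempfile


def stamp_path(db_path):
    # Hypothetical sidecar file stored next to the database.
    return db_path + ".dbversion"


def write_stamp(db_path, version):
    """Record the (major, minor) db4 version used to build db_path."""
    with open(stamp_path(db_path), "w") as f:
        f.write("%d.%d\n" % tuple(version))


def check_stamp(db_path, local_version):
    """Return None if the versions match, else an error message."""
    try:
        with open(stamp_path(db_path)) as f:
            major, minor = (int(p) for p in f.read().strip().split("."))
    except (OSError, ValueError):
        return "no readable version stamp for %s" % db_path
    if (major, minor) != tuple(local_version):
        return "%s was built with db4-%d.%d, but this host has db4-%d.%d" % (
            db_path, major, minor, *local_version)
    return None


# Example mirroring the reproduction: built with db4-4.2, read with db4-4.7.
with tempfile.TemporaryDirectory() as d:
    mdsdb = os.path.join(d, "mdsdb")
    write_stamp(mdsdb, (4, 2))
    print(check_stamp(mdsdb, (4, 7)))
```

A check like this would run before any attempt to open the database, so a mismatch produces a readable message instead of an `Invalid argument` from the open call.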

Comment by Andreas Dilger [ 28/May/11 ]

Patch has been submitted to http://review.whamcloud.com/867.

Comment by Andreas Dilger [ 02/Jun/11 ]

Patch was verified by Yu Jian in https://maloo.whamcloud.com/test_sets/747400f4-8c27-11e0-aab9-52540025f9af, and has been landed to the e2fsprogs master-lustre branch.

Comment by Build Master (Inactive) [ 02/Jun/11 ]

Integrated in e2fsprogs-master » i686,el5 #28
LU-367 Handle DB->open errors without crashing

Andreas Dilger : 4ef693a23fda00eec24840a3d072f8fe466b845f
Files :

  • patches/e2fsprogs-lfsck.patch

Comment by Build Master (Inactive) [ 02/Jun/11 ]

Integrated in e2fsprogs-master » x86_64,el5 #28
LU-367 Handle DB->open errors without crashing

Andreas Dilger : 4ef693a23fda00eec24840a3d072f8fe466b845f
Files :

  • patches/e2fsprogs-lfsck.patch

Comment by Build Master (Inactive) [ 02/Jun/11 ]

Integrated in e2fsprogs-master » x86_64,el6 #28
LU-367 Handle DB->open errors without crashing

Andreas Dilger : 4ef693a23fda00eec24840a3d072f8fe466b845f
Files :

  • patches/e2fsprogs-lfsck.patch

Comment by Build Master (Inactive) [ 02/Jun/11 ]

Integrated in e2fsprogs-master » i686,el6 #28
LU-367 Handle DB->open errors without crashing

Andreas Dilger : 4ef693a23fda00eec24840a3d072f8fe466b845f
Files :

  • patches/e2fsprogs-lfsck.patch

Comment by Andreas Dilger [ 02/Jun/11 ]

Patch appears to fix the problem (tested manually).

Comment by Build Master (Inactive) [ 14/Jun/11 ]

Integrated in e2fsprogs-master » i686,el6 #41
LU-367 Clean up Lustre configure option handling

Andreas Dilger : df8f009b6cd67d8a2b5750c1143480b9b644446d
Files :

  • patches/e2fsprogs-add-trusted-fid.patch
  • patches/e2fsprogs-rpm_RHEL-6.patch
  • patches/e2fsprogs-rpm_SLES-11.patch
  • patches/e2fsprogs-lfsck.patch
  • patches/e2fsprogs-version.patch

Comment by Build Master (Inactive) [ 14/Jun/11 ]

Integrated in e2fsprogs-master » i686,el5 #41
LU-367 Clean up Lustre configure option handling

Andreas Dilger : df8f009b6cd67d8a2b5750c1143480b9b644446d
Files :

  • patches/e2fsprogs-rpm_RHEL-6.patch
  • patches/e2fsprogs-version.patch
  • patches/e2fsprogs-add-trusted-fid.patch
  • patches/e2fsprogs-rpm_SLES-11.patch
  • patches/e2fsprogs-lfsck.patch

Comment by Build Master (Inactive) [ 14/Jun/11 ]

Integrated in e2fsprogs-master » x86_64,el6 #41
LU-367 Clean up Lustre configure option handling

Andreas Dilger : df8f009b6cd67d8a2b5750c1143480b9b644446d
Files :

  • patches/e2fsprogs-version.patch
  • patches/e2fsprogs-add-trusted-fid.patch
  • patches/e2fsprogs-lfsck.patch
  • patches/e2fsprogs-rpm_SLES-11.patch
  • patches/e2fsprogs-rpm_RHEL-6.patch

Comment by Build Master (Inactive) [ 14/Jun/11 ]

Integrated in e2fsprogs-master » x86_64,el5 #41
LU-367 Clean up Lustre configure option handling

Andreas Dilger : df8f009b6cd67d8a2b5750c1143480b9b644446d
Files :

  • patches/e2fsprogs-rpm_SLES-11.patch
  • patches/e2fsprogs-add-trusted-fid.patch
  • patches/e2fsprogs-lfsck.patch
  • patches/e2fsprogs-rpm_RHEL-6.patch
  • patches/e2fsprogs-version.patch