
lfsck 1.41.90.wc2: illegal flag specified to DB->open

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.1.0, Lustre 1.8.6
    • Lustre 1.8.6
    • None
    • 1
    • 3
    • 10145

    Description

      lfsck test failed as follows:

      lfsck -c -l --mdsdb /home/yujian/test_logs/mdsdb --ostdb /home/yujian/test_logs/ostdb-0 /home/yujian/test_logs/ostdb-1 /home/yujian/test_logs/ostdb-2 /home/yujian/test_logs/ostdb-3 /home/yujian/test_logs/ostdb-4 /home/yujian/test_logs/ostdb-5 /mnt/lustre
      lfsck 1.41.90.wc2 (14-May-2011)
      illegal flag specified to DB->open
      /home/yujian/test_logs/mdsdb:mdshdr
      : Invalid argument
      /usr/lib64/lustre/tests/test-framework.sh: line 2269: 12801 Segmentation fault      (core dumped) lfsck -c -l --mdsdb /home/yujian/test_logs/mdsdb --ostdb /home/yujian/test_logs/ostdb-0 /home/yujian/test_logs/ostdb-1 /home/yujian/test_logs/ostdb-2 /home/yujian/test_logs/ostdb-3 /home/yujian/test_logs/ostdb-4 /home/yujian/test_logs/ostdb-5 /mnt/lustre
       lfsck : @@@@@@ FAIL: lfsck -c -l --mdsdb /home/yujian/test_logs/mdsdb --ostdb  /home/yujian/test_logs/ostdb-0 /home/yujian/test_logs/ostdb-1 /home/yujian/test_logs/ostdb-2 /home/yujian/test_logs/ostdb-3 /home/yujian/test_logs/ostdb-4 /home/yujian/test_logs/ostdb-5 /mnt/lustre returned 139, should be <= 1 
      Dumping lctl log to /home/yujian/test_logs/2011-05-27/014304/lfsck..*.1306485882.log
      tar: Removing leading `/' from member names
      /home/yujian/test_logs/2011-05-27/014304/lfsck-1306485882.tar.bz2
      

      Dmesg on client-1-ib showed:

      lfsck[12801]: segfault at 5 ip 00007fd0ed3b4ff7 sp 00007fff411b5e50 error 4 in libc-2.12.so[7fd0ed36d000+175000]
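      As a reading aid for the two failure lines above, the exit status 139 and the dmesg fields can be decoded with a little arithmetic. The sketch below assumes the usual shell convention (exit status = 128 + signal number) and the x86 page-fault error-code bit layout. The helper names are illustrative only and are not part of lfsck or test-framework.sh.

```python
import signal


def signal_from_exit_status(status):
    """Return the signal name for a shell exit status above 128, else None."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # Status above 128 that does not map to a known signal number.
        return None


def decode_x86_fault_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # 0 means the page was not mapped
        "write_access": bool(code & 0x2),  # 0 means the access was a read
        "user_mode": bool(code & 0x4),     # 1 means the fault came from user space
    }


def offset_in_mapping(ip, base):
    """Offset of the faulting instruction inside the mapped library."""
    return ip - base


# Values copied from the test output and the dmesg line in this report.
print(signal_from_exit_status(139))
print(decode_x86_fault_code(4))
print(hex(offset_in_mapping(0x7FD0ED3B4FF7, 0x7FD0ED36D000)))
```

      Read together, the values describe a SIGSEGV caused by a user-mode read of an unmapped page at address 5, at offset 0x47ff7 into libc-2.12.so. That would be consistent with a near-NULL pointer being handed to a libc routine after the failed DB->open, though the report itself does not state where the bad pointer came from.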
      

      Maloo report: https://maloo.whamcloud.com/test_sets/9648ba14-883d-11e0-b4df-52540025f9af

      The logs and db files are attached.

      Attachments

        Activity

          [LU-367] lfsck 1.41.90.wc2: illegal flag specified to DB->open

          Integrated in e2fsprogs-master » i686,el6 #41
          LU-367 Clean up Lustre configure option handling

          Andreas Dilger : df8f009b6cd67d8a2b5750c1143480b9b644446d
          Files :

          • patches/e2fsprogs-add-trusted-fid.patch
          • patches/e2fsprogs-rpm_RHEL-6.patch
          • patches/e2fsprogs-rpm_SLES-11.patch
          • patches/e2fsprogs-lfsck.patch
          • patches/e2fsprogs-version.patch
          hudson Build Master (Inactive) added a comment.

          Patch appears to fix the problem (tested manually).

          adilger Andreas Dilger added a comment.

          Integrated in e2fsprogs-master » i686,el6 #28
          LU-367 Handle DB->open errors without crashing

          Andreas Dilger : 4ef693a23fda00eec24840a3d072f8fe466b845f
          Files :

          • patches/e2fsprogs-lfsck.patch
          hudson Build Master (Inactive) added a comment.

          Integrated in e2fsprogs-master » x86_64,el6 #28
          LU-367 Handle DB->open errors without crashing

          Andreas Dilger : 4ef693a23fda00eec24840a3d072f8fe466b845f
          Files :

          • patches/e2fsprogs-lfsck.patch
          hudson Build Master (Inactive) added a comment.

          Integrated in e2fsprogs-master » x86_64,el5 #28
          LU-367 Handle DB->open errors without crashing

          Andreas Dilger : 4ef693a23fda00eec24840a3d072f8fe466b845f
          Files :

          • patches/e2fsprogs-lfsck.patch
          hudson Build Master (Inactive) added a comment.

          Integrated in e2fsprogs-master » i686,el5 #28
          LU-367 Handle DB->open errors without crashing

          Andreas Dilger : 4ef693a23fda00eec24840a3d072f8fe466b845f
          Files :

          • patches/e2fsprogs-lfsck.patch
          hudson Build Master (Inactive) added a comment.

          Patch was verified by Yu Jian in https://maloo.whamcloud.com/test_sets/747400f4-8c27-11e0-aab9-52540025f9af, and has been landed to the e2fsprogs master-lustre branch.

          adilger Andreas Dilger added a comment.

          Patch has been submitted to http://review.whamcloud.com/867.

          adilger Andreas Dilger added a comment.

          I was able to reproduce this error on my local filesystem by building the MDSDB and OSTDBs on a system with db4-4.2 and then running lfsck on a client system with db4-4.7. Both systems were running e2fsprogs-1.41.90.wc2 RPMs on x86_64 built from the same source, but for their respective distros.

          I don't think this issue is a blocker, though it makes sense to avoid this kind of confusion in the future, for example by having the MDSDB store the db4 version that was used to build it. That said, storing the version inside the database doesn't help by itself, because opening the database is what fails in the first place.

          adilger Andreas Dilger added a comment.
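
          One way around the problem noted in this comment (a version stored inside the database cannot be read if the database will not open) would be to keep the version stamp in a small file next to the database. The sketch below is only an illustration of that idea, not what the landed patch does: the `.version` suffix, the JSON layout, and the function names are all assumptions made for the example.

```python
import json
import os
import tempfile


def stamp_path(db_path):
    """Hypothetical sidecar file that sits next to the database file."""
    return db_path + ".version"


def write_stamp(db_path, db4_version, tool_version):
    """Record which db4 and tool versions built the database."""
    with open(stamp_path(db_path), "w") as f:
        json.dump({"db4": db4_version, "tool": tool_version}, f)


def check_stamp(db_path, local_db4_version):
    """Return (ok, message) without ever opening the database itself."""
    try:
        with open(stamp_path(db_path)) as f:
            stamp = json.load(f)
    except FileNotFoundError:
        return False, "no version stamp found for %s" % db_path
    if stamp.get("db4") != local_db4_version:
        return False, "%s was built with db4 %s, but this host has db4 %s" % (
            db_path, stamp.get("db4"), local_db4_version)
    return True, "ok"


# Demonstration with the versions mentioned in this comment.
with tempfile.TemporaryDirectory() as workdir:
    mdsdb = os.path.join(workdir, "mdsdb")
    write_stamp(mdsdb, "4.2", "1.41.90.wc2")
    print(check_stamp(mdsdb, "4.7"))  # mismatch is reported before any open
    print(check_stamp(mdsdb, "4.2"))
```

          The point of the design is that the check reads only the sidecar file, so a db4 mismatch can be reported as a clear message before the database open is attempted.
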
          pjones Peter Jones added a comment -

          Andreas

          Please reassign if someone else should work on this, but it sounds like you are already working on it.

          Peter


          It looks from the e2fsprogs versions as though two different distros are being tested: RHEL5 on the MDS/OSS and RHEL6 on the clients. That could mean different versions of db4 are in use, which has caused compatibility issues in the past.

          I need to confirm whether the lfsck run was done on the client or on the MDS. If lfsck was run on the MDS then the version difference is a non-issue; if it was run on the client, the db4 mismatch seems like a plausible cause of this problem.

          A short-term solution is probably to record the version of e2fsck/lfsck in the MDSDB and verify it on the OSTs and the client, so that there are no surprises.

          adilger Andreas Dilger added a comment.

          People

            adilger Andreas Dilger
            yujian Jian Yu