LU-3041: Interop 1.8.9<->2.4 failure on test suite lfsck

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Versions: Lustre 1.8.9, Lustre 2.4.0

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/d2c16ae8-948b-11e2-93c6-52540035b04c.

      lustre-OST0000: ********** WARNING: Filesystem still has errors **********
      
      
               187 inodes used (0.14%, out of 131072)
                 1 non-contiguous file (0.5%)
                 0 non-contiguous directories (0.0%)
                   # of inodes with ind/dind/tind blocks: 0/0/0
                   Extent depth histogram: 181
             35296 blocks used (6.73%, out of 524288)
                 0 bad blocks
                 1 large file
      
               139 regular files
                39 directories
                 0 character device files
                 0 block device files
                 0 fifos
                 0 links
                 0 symbolic links (0 fast symbolic links)
                 0 sockets
      ------------
               178 files
      Memory used: 2428k/21180k (738k/1691k), time:  0.15/ 0.05/ 0.01
      I/O read: 6MB, write: 0MB, rate: 39.22MB/s
       lfsck : @@@@@@ FAIL: e2fsck -d -v -t -t -f -n --mdsdb /home/autotest/.autotest/shared_dir/2013-03-22/182230-70011888563720/mdsdb --ostdb /home/autotest/.autotest/shared_dir/2013-03-22/182230-70011888563720/ostdb-0 /dev/mapper/lvm--OSS-P1 returned 4, should be <= 1 
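For reference, e2fsck's exit status is a bitmask (0 = no errors, 1 = errors corrected, 2 = system should be rebooted, 4 = errors left uncorrected, 8 = operational error), so the test's "returned 4, should be <= 1" check fails exactly when e2fsck leaves errors uncorrected. A minimal sketch of that pass/fail check (the `check_rc` helper is illustrative, not from lfsck.sh):

```shell
#!/bin/sh
# e2fsck exit status is a bitmask:
#   0 - no errors, 1 - errors corrected, 2 - reboot needed,
#   4 - errors left uncorrected, 8 - operational error
# The lfsck test treats anything above 1 as a failure.
check_rc() {
    rc=$1
    if [ "$rc" -le 1 ]; then
        echo "PASS (rc=$rc)"
    else
        echo "FAIL (rc=$rc, should be <= 1)"
    fi
}

check_rc 0
check_rc 1
check_rc 4
```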
      

          Activity

            emoly.liu Emoly Liu added a comment -

            Patch landed for b1_8

            emoly.liu Emoly Liu added a comment -

            The original b1_8 lfsck test log shows:

            lfsck 1.42.6.wc2 (10-Dec-2012)
            lfsck: ost_idx 0: pass1: check for duplicate objects
            lfsck: ost_idx 0: pass1 OK (20 files total)
            lfsck: ost_idx 0: pass2: check for missing inode objects
            [0]: failed to recreate /mnt/lustre/d0.lfsck/testfile.7 missing obj 0:12
            lfsck: ost_idx 0: pass2 ERROR: 1 dangling inodes found (21 files total)
            

            I can reproduce this error easily. I will file it in another ticket and look into it.

            emoly.liu Emoly Liu added a comment -

            There are now two lfsck issues under discussion: one is the 1.8<->2.4 interop failure, the other is the b1_8 lfsck failure.

            This patch can't fix the latter.

            I just added serverjob and serverbuildno to Test-Parameters in the patch commit message. Let's see if this patch can fix the interop failure.
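For readers unfamiliar with the mechanism: a Test-Parameters: line in a Gerrit commit message tells autotest which tests to run and which server build to run them against. A sketch of the kind of line meant here (the serverjob and serverbuildno keys come from the comment; the other key and all values are placeholders, not the actual ones used):

```
Test-Parameters: testlist=lfsck serverjob=lustre-b1_8 serverbuildno=<build-number>
```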


            niu Niu Yawei (Inactive) added a comment -

            But now I find that even on the original b1_8 branch, lfsck can't pass on my local VM. I hit the same error as https://maloo.whamcloud.com/test_sessions/3acb15f6-9f83-11e2-9f27-52540035b04c

            Emoly, can the patch fix this failure?
            emoly.liu Emoly Liu added a comment - - edited

            I created a patch, http://review.whamcloud.com/#change,5968, per Niu's advice.

            But now I find that even on the original b1_8 branch, lfsck can't pass on my local VM. I hit the same error as https://maloo.whamcloud.com/test_sessions/3acb15f6-9f83-11e2-9f27-52540035b04c

            I will create another test-only patch to see whether the lfsck script has any problem on b1_8. The result is at https://maloo.whamcloud.com/test_sessions/61ecab6c-a051-11e2-898f-52540035b04c .

            emoly.liu Emoly Liu added a comment -

            OK, I will backport these patches and run a test.

            pjones Peter Jones added a comment -

            Emoly

            Could you please take care of this one?

            Thanks

            Peter


            niu Niu Yawei (Inactive) added a comment -

            I think it's a problem with the 1.8 test script:

            00:45:47:CMD: client-27vm4 debugfs -w -f /home/autotest/.autotest/shared_dir/2013-03-22/182230-70011888563720/debugfs.0VwKusgYux /dev/mapper/lvm--OSS-P1
            00:45:47:client-27vm4: debugfs 1.42.6.wc2 (10-Dec-2012)
            00:45:47:debugfs: rm O/0/d10/10
            00:45:47:
            00:45:47:debugfs: rm O/0/d11/11
            

            lfsck.sh of 1.8 still uses debugfs to remove the objects, which results in quota accounting inconsistencies on the OST:

            00:46:04:client-27vm4: [QUOTA WARNING] Usage inconsistent for ID 0:actual (1814528, 178) != expected (2830336, 180)
            00:46:04:client-27vm4: [QUOTA WARNING] Usage inconsistent for ID 0:actual (1814528, 178) != expected (2830336, 180)
            

            If we want to run the 1.8 lfsck.sh test script (likewise for 2.1) against 2.4 servers, we have to backport the script fix from LU-2663 to 1.8 (and 2.1); see http://review.whamcloud.com/#change,5186

            BTW: the fix for LU-2694 should be backported to 1.8 as well (it's already in 2.1); see http://review.whamcloud.com/#change,5762
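To make the failure mode concrete, here is a sketch of the object-removal step described above: lfsck.sh collects debugfs `rm` commands into a file and replays them against the raw OST device. The object paths are taken from the log above; everything else is illustrative. Because debugfs edits the ldiskfs image behind the server's back, the per-ID quota accounting that 2.4 OSTs maintain is never updated, which is exactly what the QUOTA WARNINGs report. The destructive debugfs invocation itself is shown only as a comment:

```shell
#!/bin/sh
# Sketch (illustrative) of the object-removal step in 1.8's lfsck.sh.
# Object paths are from the failed run's log above.
cmdfile=$(mktemp)
for obj in O/0/d10/10 O/0/d11/11; do
    echo "rm $obj" >> "$cmdfile"
done
cat "$cmdfile"
# lfsck.sh would then replay the commands against the raw device
# (not executed here; this is what bypasses the OST quota accounting):
#   debugfs -w -f "$cmdfile" /dev/mapper/lvm--OSS-P1
```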


            People

              Assignee: emoly.liu Emoly Liu
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 5