
sanity.sh test_17m, lfsck: e2fsck failed due to MDT quota accounting error

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.4.0
    • Affects Version/s: Lustre 2.4.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 4381

    Description

      Running acc-sm on a local ldiskfs test system causes e2fsck to fail when checking the MDT in sanity.sh test_17m. I'm running e2fsprogs-1.42.5.wc3-7.fc13.x86_64, which hasn't been released yet, though I don't think it is the cause.

      It isn't clear when the quota accounting error is first introduced (or whether it is already present when the filesystem is formatted), but the problem needs to be resolved for 2.4.0 and/or the e2fsprogs-1.42.5.wc3 release.

      == sanity test 17m: run e2fsck against MDT which contains short/long symlink 21:55:58 (1349409358)
      create 512 short and long symlink files under /mnt/testfs/d0.sanity/d17m
      erase them
      recreate the 512 symlink files with a shorter string
      stop and checking mds1: e2fsck -fnvd /dev/vg_sookie/lvmdt1
      Stopping /mnt/mds1 (opts:-f) on sookie-gig.adilger.int
      e2fsck 1.42.5.wc3 (15-Sep-2012)
      [QUOTA WARNING] Usage inconsistent for ID 0:actual (3649536, 1281) != expected (3551232, 1281)
      [QUOTA WARNING] Usage inconsistent for ID 0:actual (3645440, 1280) != expected (3547136, 1280)
      Pass 1: Checking inodes, blocks, and sizes
      Pass 2: Checking directory structure
      Pass 3: Checking directory connectivity
      Pass 4: Checking reference counts
      Pass 5: Checking group summary information
      Update quota info for quota type 0? no
      
      Update quota info for quota type 1? no
      
      testfs-MDT0000: ********** WARNING: Filesystem still has errors **********
      
              1293 inodes used (1.29%, out of 100000)
                 8 non-contiguous files (0.6%)
                 1 non-contiguous directory (0.1%)
                   # of inodes with ind/dind/tind blocks: 3/0/0
             17587 blocks used (35.17%, out of 50000)
                 0 bad blocks
                 1 large file
      
               109 regular files
               135 directories
                 0 character device files
                 0 block device files
                 0 fifos
                 8 links
              1040 symbolic links (521 fast symbolic links)
                 0 sockets
      ------------
              1292 files
      Starting mds1:   /dev/vg_sookie/lvmdt1 /mnt/mds1
      Started testfs-MDT0000
       sanity test_17m: @@@@@@ FAIL: e2fsck should not report error upon  short/long symlink MDT: rc=4 
      
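The test invokes e2fsck with -n, so the "Update quota info" prompts are answered "no" and the mismatch is left in place, which is what rc=4 reports. As a reading aid, here is a minimal sketch that decodes an e2fsck exit status using the bit values listed in the e2fsck(8) man page. The function name decode_e2fsck_rc is hypothetical and is not part of the Lustre test framework.

```python
from typing import List

# Exit-status bits as listed in the e2fsck(8) man page.
E2FSCK_EXIT_BITS = {
    1: "File system errors corrected",
    2: "File system errors corrected, system should be rebooted",
    4: "File system errors left uncorrected",
    8: "Operational error",
    16: "Usage or syntax error",
    32: "E2fsck canceled by user request",
    128: "Shared library error",
}


def decode_e2fsck_rc(rc: int) -> List[str]:
    """Return the meaning of every bit set in an e2fsck exit status."""
    if rc == 0:
        return ["No errors"]
    # The status is a bitmask, so several conditions can be reported at once.
    return [text for bit, text in sorted(E2FSCK_EXIT_BITS.items()) if rc & bit]


if __name__ == "__main__":
    # rc=4 is the value reported by sanity test_17m in this ticket.
    for meaning in decode_e2fsck_rc(4):
        print(meaning)
```

Because the status is a bitmask, a value such as 12 would decode to two entries (errors left uncorrected plus an operational error).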

          Activity

            [LU-2097] sanity.sh test_17m, lfsck: e2fsck failed due to MDT quota accounting error

            niu Niu Yawei (Inactive) added a comment:

            Patches landed for 2.4.

            niu Niu Yawei (Inactive) added a comment:

            More ll_vfs_dq_init() calls are added in http://review.whamcloud.com/427

            Without the above patch, I can reproduce the problem with "sh sanity.sh; ONLY=8 sh mmp.sh".

            niu Niu Yawei (Inactive) added a comment:

            I verified that the patch fixed the problem for:

            • NOFORMAT=1 sh llmount.sh; ONLY=17m sh sanity.sh (accounting for llog objects created during mount)
            • ONLY=17m sh sanity.sh; ONLY=17m sh sanity.sh (accounting for oi)

            Maybe there is still something missed; I'm checking into this. Thank you, Andreas.

            adilger Andreas Dilger added a comment:

            Was this patch actually tested to fix the problem? I still see the same issue here (running mmp.sh test 8 in this case):

            e2fsck 1.42.5.wc3 (15-Sep-2012)
            Starting mds1:   /dev/vg_sookie/lvmdt1 /mnt/mds1
            Start of /dev/vg_sookie/lvmdt1 on mds1 failed 1
            [QUOTA WARNING] Usage inconsistent for ID 0:actual (1294336, 283) != expected (1253376, 279)
            [QUOTA WARNING] Usage inconsistent for ID 0:actual (1294336, 283) != expected (1253376, 279)
            :
            :
            [problems from LU-1861]
            :
            :
            Update quota info for quota type 0? yes
            
            Update quota info for quota type 1? yes
            

            The filesystem had previously been running other acceptance-small tests, but I believe this hits early enough and consistently enough in sanity.sh that it is easily reproduced.
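To see how far apart the two sides of each [QUOTA WARNING] are, the lines can be parsed and subtracted. The sketch below is illustrative only: the regular expression is fitted to the warning text quoted in this ticket, the names QuotaMismatch and parse_quota_warning are hypothetical, and treating the first number of each pair as space in bytes and the second as an inode count is an assumption that the ticket itself does not state.

```python
import re
from typing import NamedTuple, Optional

# Pattern fitted to the warning format quoted in this ticket.
QUOTA_WARNING = re.compile(
    r"\[QUOTA WARNING\] Usage inconsistent for ID (\d+):\s*"
    r"actual \((\d+), (\d+)\) != expected \((\d+), (\d+)\)"
)


class QuotaMismatch(NamedTuple):
    quota_id: int
    space_delta: int  # "actual" minus "expected", first number (assumed bytes)
    inode_delta: int  # "actual" minus "expected", second number (assumed inodes)


def parse_quota_warning(line: str) -> Optional[QuotaMismatch]:
    """Return the actual-minus-expected deltas, or None for unrelated lines."""
    match = QUOTA_WARNING.search(line)
    if match is None:
        return None
    qid, a_space, a_inodes, e_space, e_inodes = map(int, match.groups())
    return QuotaMismatch(qid, a_space - e_space, a_inodes - e_inodes)


if __name__ == "__main__":
    samples = [
        # From the sanity.sh test_17m run in the description.
        "[QUOTA WARNING] Usage inconsistent for ID 0:actual (3649536, 1281) != expected (3551232, 1281)",
        # From the mmp.sh test 8 run in the comment above.
        "[QUOTA WARNING] Usage inconsistent for ID 0:actual (1294336, 283) != expected (1253376, 279)",
    ]
    for line in samples:
        mismatch = parse_quota_warning(line)
        print(mismatch.quota_id, mismatch.space_delta, mismatch.inode_delta)
```

Under those assumptions, the test_17m warning differs by 98304 (96 KiB) with identical inode counts, while the mmp.sh warning differs by 40960 (40 KiB) and 4 inodes.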

            adilger Andreas Dilger added a comment:

            Patch is merged, just after the 2.3.53 tag.

            niu Niu Yawei (Inactive) added a comment:

            http://review.whamcloud.com/4220

            niu Niu Yawei (Inactive) added a comment:

            I can reproduce it with "NOFORMAT=1 sh llmount.sh; ONLY=17m sh sanity.sh" or by running 17m twice with "ONLY=17m sh sanity.sh; ONLY=17m sh sanity.sh". The OSTs don't seem to have this problem, so it's unlikely to be a kernel or e2fsprogs bug.

            I think it is because we missed ll_vfs_dq_init() for some system objects (llog, oi, last_rcvd, etc.), so the block accounting for those existing objects is missed. If they are newly created, there isn't any problem. I'll post a patch to fix it soon.
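The hypothesis in this comment can be pictured with a toy model. This is not Lustre or ldiskfs code: ToyFilesystem and every method on it are hypothetical, and the model only captures the idea that blocks added to an existing object are charged to the quota ledger only if quota was initialized for that object after the mount, whereas a recount of the real allocation (as fsck would do) sees all of them.

```python
from typing import Dict, Set, Tuple


class ToyFilesystem:
    """Toy model of quota accounting drift; not Lustre or ldiskfs code."""

    def __init__(self) -> None:
        self.blocks: Dict[str, int] = {}  # blocks really allocated per object
        self.dq_ready: Set[str] = set()   # objects with quota initialized
        self.ledger_blocks = 0            # usage recorded in the quota file

    def create(self, name: str) -> None:
        # Newly created objects get quota initialized, so they account fine.
        self.blocks[name] = 0
        self.dq_ready.add(name)

    def remount(self) -> None:
        # In-memory quota state is lost; on-disk blocks and ledger persist.
        self.dq_ready.clear()

    def open_existing(self, name: str, call_dq_init: bool) -> None:
        # Stands in for the missing (or added) quota-init call.
        if call_dq_init:
            self.dq_ready.add(name)

    def append(self, name: str, nblocks: int) -> None:
        self.blocks[name] += nblocks
        if name in self.dq_ready:
            self.ledger_blocks += nblocks  # charged only when initialized

    def recount(self) -> int:
        # What a checker would compute by walking the real allocation.
        return sum(self.blocks.values())


def run_scenario(call_dq_init: bool) -> Tuple[int, int]:
    """Return (ledger, recount) after writing to an object across a remount."""
    fs = ToyFilesystem()
    fs.create("llog")
    fs.append("llog", 3)
    fs.remount()
    fs.open_existing("llog", call_dq_init)
    fs.append("llog", 2)
    return fs.ledger_blocks, fs.recount()


if __name__ == "__main__":
    print("without init:", run_scenario(False))
    print("with init:   ", run_scenario(True))
```

In this model the ledger and the recount agree only when the init call is made for the existing object, which mirrors why the problem shows up on a second run against an already formatted filesystem and not on newly created objects.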

            johann Johann Lombardi (Inactive) added a comment:

            "I think I hit it with both ZFS and ldiskfs using OFD, though I don't think it matters, since the error is on the MDT?"

            Ah, nm. I thought fsck was run against an OST.

            "this is rather strange, because we do not modify accounting?"

            Right, we don't, but we still bypass the VFS, so there might be a code path where we forget to call ll_vfs_dq_transfer() or something similar. It might also be a bug in the quota support in e2fsck, or even a kernel bug (until now, nothing was checking that the accounting information is correct).

            Niu, could you please have a look at this one?

            bzzz Alex Zhuravlev added a comment:

            This is rather strange, because we do not modify accounting?

            adilger Andreas Dilger added a comment:

            I think I hit it with both ZFS and ldiskfs using OFD, though I don't think it matters, since the error is on the MDT?

            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: adilger Andreas Dilger