[LU-2097] sanity.sh test_17m, lfsck: e2fsck failed due to MDT quota accounting error
Created: 05/Oct/12  Updated: 19/Apr/13  Resolved: 29/Oct/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug
Priority: Blocker
Reporter: Andreas Dilger
Assignee: Niu Yawei (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Environment:

Single node test system (client, MDT, 3x OSTs on same node), x86_64
Lustre master "LU-1842 quota: add quotactl support on qmt" (commit 294aa9cb666c48e02da1057c222fe5f206ce38fc)


Issue Links:
Related
is related to LU-2663 lfsck: e2fsck [QUOTA WARNING] Usage i... Resolved
Severity: 3
Rank (Obsolete): 4381

 Description   

Running acc-sm on a local ldiskfs test system causes e2fsck to fail when checking the MDT in sanity.sh test_17m. I'm running e2fsprogs-1.42.5.wc3-7.fc13.x86_64, which I don't think is the issue, but it hasn't been released yet.

It isn't clear when the quota accounting issue is first introduced (it may even be present when the filesystem is formatted), but the problem needs to be resolved for 2.4.0 and/or the e2fsprogs-1.42.5.wc3 release.

== sanity test 17m: run e2fsck against MDT which contains short/long symlink 21:55:58 (1349409358)
create 512 short and long symlink files under /mnt/testfs/d0.sanity/d17m
erase them
recreate the 512 symlink files with a shorter string
stop and checking mds1: e2fsck -fnvd /dev/vg_sookie/lvmdt1
Stopping /mnt/mds1 (opts:-f) on sookie-gig.adilger.int
e2fsck 1.42.5.wc3 (15-Sep-2012)
[QUOTA WARNING] Usage inconsistent for ID 0:actual (3649536, 1281) != expected (3551232, 1281)
[QUOTA WARNING] Usage inconsistent for ID 0:actual (3645440, 1280) != expected (3547136, 1280)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Update quota info for quota type 0? no

Update quota info for quota type 1? no

testfs-MDT0000: ********** WARNING: Filesystem still has errors **********

        1293 inodes used (1.29%, out of 100000)
           8 non-contiguous files (0.6%)
           1 non-contiguous directory (0.1%)
             # of inodes with ind/dind/tind blocks: 3/0/0
       17587 blocks used (35.17%, out of 50000)
           0 bad blocks
           1 large file

         109 regular files
         135 directories
           0 character device files
           0 block device files
           0 fifos
           8 links
        1040 symbolic links (521 fast symbolic links)
           0 sockets
------------
        1292 files
Starting mds1:   /dev/vg_sookie/lvmdt1 /mnt/mds1
Started testfs-MDT0000
 sanity test_17m: @@@@@@ FAIL: e2fsck should not report error upon  short/long symlink MDT: rc=4 
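
The rc=4 in the failure line above is the e2fsck exit status, which is a bitmask. As a minimal sketch, the helper below decodes it using the bit meanings listed in the e2fsck(8) man page; the function and table names are hypothetical and are not part of the Lustre test framework.

```python
# Bit meanings as listed in the e2fsck(8) man page.
# The names below are illustrative only, not test-framework code.
E2FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def decode_e2fsck_rc(rc):
    """Return the list of conditions encoded in an e2fsck exit status."""
    if rc == 0:
        return ["no errors"]
    return [text for bit, text in E2FSCK_EXIT_BITS.items() if rc & bit]


if __name__ == "__main__":
    # rc=4 is the value reported by sanity test_17m in this ticket.
    print(decode_e2fsck_rc(4))
```

Since the test runs e2fsck with -n (answer "no" to all questions), an inconsistency that e2fsck would otherwise repair is left in place, which is consistent with the "errors left uncorrected" bit being set here.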


 Comments   
Comment by Andreas Dilger [ 05/Oct/12 ]

I saw the same problem running mmp.sh test_8 on a newly-formatted MDT filesystem (this is the first test that actually runs e2fsck):

Running e2fsck on the device /dev/vg_sookie/lvmdt1 on mds1...
e2fsck 1.42.5.wc3 (15-Sep-2012)
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1294336, 206) != expected (1273856, 206)
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1294336, 206) != expected (1273856, 206)
MMP interval is 30 seconds and total wait time is 122 seconds. Please wait...
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Setting filetype for entry '16' in /O/1/d16 (50023) to 1.
Setting filetype for entry '17' in /O/1/d17 (50024) to 1.
Setting filetype for entry '18' in /O/1/d18 (50025) to 1.
Setting filetype for entry '19' in /O/1/d19 (50026) to 1.
Setting filetype for entry '20' in /O/1/d20 (50027) to 1.
Setting filetype for entry '21' in /O/1/d21 (50028) to 1.
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Update quota info for quota type 0? yes

Update quota info for quota type 1? yes


testfs-MDT0000: ***** FILE SYSTEM WAS MODIFIED *****
testfs-MDT0000: 215/100000 files (2.8% non-contiguous), 17012/50000 blocks

I can imagine this happening fairly easily if we have code paths that are not updating the root quota usage properly?

The filetype errors are in LU-1861 and not related to this problem.
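
To compare the size of the mismatch between runs, the [QUOTA WARNING] lines can be parsed mechanically. The sketch below is only an illustration: it assumes each tuple is (space in bytes, inode count) and a 4096-byte block size, neither of which is stated in this ticket, and the function name is hypothetical.

```python
import re

# Matches the warning format seen in the e2fsck 1.42.5.wc3 output in this ticket.
QUOTA_WARNING = re.compile(
    r"\[QUOTA WARNING\] Usage inconsistent for ID (\d+):\s*"
    r"actual \((\d+), (\d+)\) != expected \((\d+), (\d+)\)"
)


def quota_deltas(log_text, block_size=4096):
    """Return one dict per warning with the actual-minus-expected differences.

    Assumption: the first tuple member is space in bytes and the second is
    the inode count; block_size=4096 is also an assumption.
    """
    results = []
    for m in QUOTA_WARNING.finditer(log_text):
        qid, a_space, a_inodes, e_space, e_inodes = map(int, m.groups())
        space_delta = a_space - e_space
        results.append({
            "id": qid,
            "space_delta": space_delta,
            "blocks_delta": space_delta / block_size,
            "inode_delta": a_inodes - e_inodes,
        })
    return results


if __name__ == "__main__":
    sample = (
        "[QUOTA WARNING] Usage inconsistent for ID 0:actual (3649536, 1281) != expected (3551232, 1281)\n"
        "[QUOTA WARNING] Usage inconsistent for ID 0:actual (1294336, 206) != expected (1273856, 206)\n"
    )
    for row in quota_deltas(sample):
        print(row)
```

For the two sample lines, taken from the sanity.sh and mmp.sh runs above, the space differences are 98304 and 20480 bytes, with no difference in inode count.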

Comment by Johann Lombardi (Inactive) [ 05/Oct/12 ]

Andreas, is this with OFD or obdfilter?

Comment by Andreas Dilger [ 07/Oct/12 ]

I think I hit it with both ZFS and ldiskfs using OFD, though I don't think it matters, since the error is on the MDT?

Comment by Alex Zhuravlev [ 07/Oct/12 ]

This is rather strange, because we do not modify accounting?

Comment by Johann Lombardi (Inactive) [ 07/Oct/12 ]

"I think I hit it with both ZFS and ldiskfs using OFD, though I don't think it matters, since the error is on the MDT?"

Ah, nm. I thought fsck was run against an OST.

"This is rather strange, because we do not modify accounting?"

Right, we don't, but we still bypass the VFS, so there might be a code path where we forget to call ll_vfs_dq_transfer() or something similar. It might also be a bug in the quota support in e2fsck, or even a kernel bug (until now, nothing was checking that the accounting information is correct).

Niu, could you please have a look at this one?

Comment by Niu Yawei (Inactive) [ 08/Oct/12 ]

I can reproduce it with "NOFORMAT=1 sh llmount.sh; ONLY=17m sh sanity.sh", or by running 17m twice with "ONLY=17m sh sanity.sh; ONLY=17m sh sanity.sh". The OSTs don't seem to have this problem, so it is unlikely to be a kernel or e2fsprogs bug.

I think it is because we missed ll_vfs_dq_init() for some system objects, such as llog, oi, last_rcvd, etc., so the block accounting for those existing objects is missed (if they are newly created, there isn't any problem). I'll post a patch to fix it soon.
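
As a purely illustrative toy model of the hypothesis in this comment (not Lustre or ldiskfs code), the sketch below shows how recorded usage can fall behind a full scan when quota initialization is skipped for an object that already exists. The class, its methods, and the block counts are all hypothetical.

```python
class ToyFilesystem:
    """Toy model only: tracks real allocation vs. recorded quota usage."""

    def __init__(self):
        self.blocks = {}          # object name -> blocks actually allocated
        self.recorded = 0         # usage recorded in the quota file
        self.initialized = set()  # objects whose quota state is set up

    def create(self, name):
        # Assumption: the creation path always sets up quota for the object.
        self.blocks[name] = 0
        self.initialized.add(name)

    def remount(self):
        # In-memory quota state is lost; on-disk objects remain.
        self.initialized.clear()

    def open_existing(self, name, dq_init):
        # dq_init stands in for calling ll_vfs_dq_init() on the object.
        if dq_init:
            self.initialized.add(name)

    def write(self, name, nblocks):
        self.blocks[name] += nblocks
        # Growth is only charged if the object's quota state was set up.
        if name in self.initialized:
            self.recorded += nblocks

    def scan(self):
        # What a full scan of the objects would count.
        return sum(self.blocks.values())


def demo(dq_init):
    fs = ToyFilesystem()
    fs.create("llog")
    fs.write("llog", 2)
    fs.remount()  # e.g. remounting an existing filesystem without reformatting
    fs.open_existing("llog", dq_init=dq_init)
    fs.write("llog", 3)
    return fs.recorded, fs.scan()


if __name__ == "__main__":
    print("without init:", demo(dq_init=False))
    print("with init:   ", demo(dq_init=True))
```

In this model the recorded and scanned usage only diverge when an existing object is reopened without initialization, which mirrors the observation that newly created objects are not affected.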

Comment by Niu Yawei (Inactive) [ 08/Oct/12 ]

http://review.whamcloud.com/4220

Comment by Andreas Dilger [ 08/Oct/12 ]

Patch is merged, just after the 2.3.53 tag.

Comment by Andreas Dilger [ 12/Oct/12 ]

Was this patch actually tested to fix the problem? I still see the same issue here (running mmp.sh test 8 in this case):

e2fsck 1.42.5.wc3 (15-Sep-2012)
Starting mds1:   /dev/vg_sookie/lvmdt1 /mnt/mds1
Start of /dev/vg_sookie/lvmdt1 on mds1 failed 1
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1294336, 283) != expected (1253376, 279)
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1294336, 283) != expected (1253376, 279)
:
:
[problems from LU-1861]
:
:
Update quota info for quota type 0? yes

Update quota info for quota type 1? yes

The filesystem had previously been running other acceptance-small tests, but I believe this hits early enough and consistently enough in sanity.sh that it is easily reproduced.

Comment by Niu Yawei (Inactive) [ 15/Oct/12 ]

I verified that the patch fixed the problem for:

  • NOFORMAT=1 sh llmount.sh; ONLY=17m sh sanity.sh (accounting for llog objects created during mount)
  • ONLY=17m sh sanity.sh; ONLY=17m sh sanity.sh (accounting for oi)

Maybe there is still something missing; I'm looking into this. Thank you, Andreas.

Comment by Niu Yawei (Inactive) [ 15/Oct/12 ]

More ll_vfs_dq_init() calls are added: http://review.whamcloud.com/427

Without the above patch, I can reproduce the problem with "sh sanity.sh; ONLY=8 sh mmp.sh".

Comment by Niu Yawei (Inactive) [ 29/Oct/12 ]

Patches landed for 2.4.
