[LU-2097] sanity.sh test_17m, lfsck: e2fsck failed due to MDT quota accounting error Created: 05/Oct/12 Updated: 19/Apr/13 Resolved: 29/Oct/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Andreas Dilger | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | Single node test system (client, MDT, 3x OSTs on same node), x86_64 |
| Severity: | 3 |
| Rank (Obsolete): | 4381 |
| Description |
|
Running acc-sm on a local ldiskfs test system causes e2fsck to fail when checking the MDT in sanity.sh test_17m. I'm running e2fsprogs-1.42.5.wc3-7.fc13.x86_64, which hasn't been released yet, though I don't think it is the cause. It isn't clear when the quota accounting error is first introduced (or even whether it is already present when the filesystem is formatted), but the problem needs to be resolved for 2.4.0 and/or the e2fsprogs-1.42.5.wc3 release.
== sanity test 17m: run e2fsck against MDT which contains short/long symlink 21:55:58 (1349409358)
create 512 short and long symlink files under /mnt/testfs/d0.sanity/d17m
erase them
recreate the 512 symlink files with a shorter string
stop and checking mds1: e2fsck -fnvd /dev/vg_sookie/lvmdt1
Stopping /mnt/mds1 (opts:-f) on sookie-gig.adilger.int
e2fsck 1.42.5.wc3 (15-Sep-2012)
[QUOTA WARNING] Usage inconsistent for ID 0:actual (3649536, 1281) != expected (3551232, 1281)
[QUOTA WARNING] Usage inconsistent for ID 0:actual (3645440, 1280) != expected (3547136, 1280)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Update quota info for quota type 0? no
Update quota info for quota type 1? no
testfs-MDT0000: ********** WARNING: Filesystem still has errors **********
1293 inodes used (1.29%, out of 100000)
8 non-contiguous files (0.6%)
1 non-contiguous directory (0.1%)
# of inodes with ind/dind/tind blocks: 3/0/0
17587 blocks used (35.17%, out of 50000)
0 bad blocks
1 large file
109 regular files
135 directories
0 character device files
0 block device files
0 fifos
8 links
1040 symbolic links (521 fast symbolic links)
0 sockets
------------
1292 files
Starting mds1: /dev/vg_sookie/lvmdt1 /mnt/mds1
Started testfs-MDT0000
sanity test_17m: @@@@@@ FAIL: e2fsck should not report error upon short/long symlink MDT: rc=4
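The size of the discrepancy can be read directly off the `[QUOTA WARNING]` lines in the log above. A minimal Python sketch, assuming each pair is (space in bytes, inode count) and a 4096-byte block size; neither is stated in the log, and `parse_quota_warnings` is a hypothetical helper, not part of e2fsprogs:

```python
import re

# Matches the e2fsck message format as it appears in the log above.
WARNING_RE = re.compile(
    r"\[QUOTA WARNING\] Usage inconsistent for ID (\d+):"
    r"\s*actual \((\d+), (\d+)\) != expected \((\d+), (\d+)\)"
)

LOG = """\
[QUOTA WARNING] Usage inconsistent for ID 0:actual (3649536, 1281) != expected (3551232, 1281)
[QUOTA WARNING] Usage inconsistent for ID 0:actual (3645440, 1280) != expected (3547136, 1280)
"""

ASSUMED_BLOCK_SIZE = 4096  # assumption: not stated in the log


def parse_quota_warnings(text):
    """Return one dict per warning with the space and inode deltas."""
    results = []
    for m in WARNING_RE.finditer(text):
        qid, act_space, act_inodes, exp_space, exp_inodes = map(int, m.groups())
        results.append({
            "id": qid,
            "space_delta": act_space - exp_space,    # assumed to be bytes
            "inode_delta": act_inodes - exp_inodes,
        })
    return results


if __name__ == "__main__":
    for w in parse_quota_warnings(LOG):
        blocks = w["space_delta"] // ASSUMED_BLOCK_SIZE
        print(w["id"], w["space_delta"], blocks, w["inode_delta"])
```

In both warnings the space figures differ by the same amount while the inode counts match, which suggests the mismatch is in block accounting rather than inode accounting.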
|
| Comments |
| Comment by Andreas Dilger [ 05/Oct/12 ] |
|
I saw the same problem running mmp.sh test_8 on a newly-formatted MDT filesystem (this is the first test that actually runs e2fsck):
Running e2fsck on the device /dev/vg_sookie/lvmdt1 on mds1...
e2fsck 1.42.5.wc3 (15-Sep-2012)
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1294336, 206) != expected (1273856, 206)
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1294336, 206) != expected (1273856, 206)
MMP interval is 30 seconds and total wait time is 122 seconds. Please wait...
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Setting filetype for entry '16' in /O/1/d16 (50023) to 1.
Setting filetype for entry '17' in /O/1/d17 (50024) to 1.
Setting filetype for entry '18' in /O/1/d18 (50025) to 1.
Setting filetype for entry '19' in /O/1/d19 (50026) to 1.
Setting filetype for entry '20' in /O/1/d20 (50027) to 1.
Setting filetype for entry '21' in /O/1/d21 (50028) to 1.
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Update quota info for quota type 0? yes
Update quota info for quota type 1? yes
testfs-MDT0000: ***** FILE SYSTEM WAS MODIFIED *****
testfs-MDT0000: 215/100000 files (2.8% non-contiguous), 17012/50000 blocks
I can imagine this happening fairly easily if we have code paths that are not updating the root quota usage properly? The filetype errors are in |
| Comment by Johann Lombardi (Inactive) [ 05/Oct/12 ] |
|
Andreas, is this with OFD or obdfilter? |
| Comment by Andreas Dilger [ 07/Oct/12 ] |
|
I think I hit it with both ZFS and ldiskfs using OFD, though I don't think it matters, since the error is on the MDT? |
| Comment by Alex Zhuravlev [ 07/Oct/12 ] |
|
This is rather strange, because we do not modify accounting? |
| Comment by Johann Lombardi (Inactive) [ 07/Oct/12 ] |
Ah, nm. I thought fsck was run against an OST.
Right, we don't, but we still bypass the vfs so there might be a code path where we forget to call ll_vfs_dq_transfer() or something similar. It might also be a bug in quota support in e2fsck or even a kernel bug (till now, nothing was checking accounting information correctness). Niu, could you please have a look at this one? |
| Comment by Niu Yawei (Inactive) [ 08/Oct/12 ] |
|
I can reproduce it with "NOFORMAT=1 sh llmount.sh; ONLY=17m sh sanity.sh", or by running 17m twice: "ONLY=17m sh sanity.sh; ONLY=17m sh sanity.sh". The OSTs don't seem to have this problem, so it's unlikely to be a kernel or e2fsprogs bug. I think it is because we missed ll_vfs_dq_init() for some system objects (llog, oi, last_rcvd, etc.), so the block accounting for those existing objects is missed (if they are newly created, there isn't any problem). I'll post a patch to fix it soon. |
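The hypothesis in the comment above can be illustrated with a toy model. This is not Lustre or ldiskfs code: `ToyFs`, `reopen`, `write_blocks`, and `simulate` are hypothetical names, and the block size is assumed. The only behaviour modelled is that writes to a re-opened object whose quota state was never initialized are not charged, while an fsck-style scan counts every block.

```python
class ToyFs:
    """Toy model of block accounting for one ID; not real Lustre/ldiskfs code."""

    BLOCK = 4096  # assumed block size

    def __init__(self):
        self.objects = {}   # name -> {"blocks": int, "dq_init": bool}
        self.recorded = 0   # bytes charged in the modelled quota file

    def create(self, name):
        # Newly created objects get their quota state initialized.
        self.objects[name] = {"blocks": 0, "dq_init": True}

    def reopen(self, name, init_quota):
        # Models opening an existing object after a remount. init_quota
        # stands in for whether an ll_vfs_dq_init()-style call was made.
        self.objects[name]["dq_init"] = init_quota

    def write_blocks(self, name, n):
        obj = self.objects[name]
        obj["blocks"] += n
        if obj["dq_init"]:
            self.recorded += n * self.BLOCK  # charged only when initialized

    def scan(self):
        # What a full scan would compute: every allocated block counts.
        return sum(o["blocks"] for o in self.objects.values()) * self.BLOCK


def simulate(init_quota_on_reopen):
    """Return (scanned_bytes, recorded_bytes) after a create/remount/write cycle."""
    fs = ToyFs()
    fs.create("last_rcvd")
    fs.write_blocks("last_rcvd", 2)              # first run: accounted
    fs.reopen("last_rcvd", init_quota_on_reopen)  # second run, no reformat
    fs.write_blocks("last_rcvd", 3)
    return fs.scan(), fs.recorded


if __name__ == "__main__":
    print(simulate(False))  # scan exceeds the recorded usage
    print(simulate(True))   # scan and recorded usage agree
```

In this model the scanned usage exceeds the recorded usage only when the object is re-opened without quota initialization, which is consistent with the problem appearing on a second run without reformatting.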
| Comment by Niu Yawei (Inactive) [ 08/Oct/12 ] |
| Comment by Andreas Dilger [ 08/Oct/12 ] |
|
Patch is merged, just after the 2.3.53 tag. |
| Comment by Andreas Dilger [ 12/Oct/12 ] |
|
Was this patch actually tested to fix the problem? I still see the same issue here (running mmp.sh test 8 in this case):
e2fsck 1.42.5.wc3 (15-Sep-2012)
Starting mds1: /dev/vg_sookie/lvmdt1 /mnt/mds1
Start of /dev/vg_sookie/lvmdt1 on mds1 failed 1
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1294336, 283) != expected (1253376, 279)
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1294336, 283) != expected (1253376, 279)
:
:
[problems from LU-1861]
:
:
Update quota info for quota type 0? yes
Update quota info for quota type 1? yes
Filesystem had previously been running other acceptance small tests, but I believe this hits early enough and consistently in sanity.sh that it is easily reproduced. |
| Comment by Niu Yawei (Inactive) [ 15/Oct/12 ] |
|
I verified the test fixed the problem of:
Maybe there is still something missing; I'm looking into this. Thank you, Andreas. |
| Comment by Niu Yawei (Inactive) [ 15/Oct/12 ] |
|
More ll_vfs_dq_init() calls are added: http://review.whamcloud.com/427 Without the above patch, I can reproduce the problem with "sh sanity.sh; ONLY=8 sh mmp.sh". |
| Comment by Niu Yawei (Inactive) [ 29/Oct/12 ] |
|
patches landed for 2.4 |