[LU-2663] lfsck: e2fsck [QUOTA WARNING] Usage inconsistent Created: 22/Jan/13 Updated: 22/Apr/13 Resolved: 31/Jan/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LB | ||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 6213 | ||||||||||||
| Description |
|
This issue was created by maloo for liuying <emoly.liu@intel.com>.
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/ffe70358-644c-11e2-a1aa-52540035b04c.
The frame 3519:generate_db() corresponds to the following code in my tree:
3515 for node in $(osts_nodes); do
3516 for dev in ${OSTDEVS[i]}; do
3517 run_e2fsck $node $dev "-n --mdsdb $MDSDB --ostdb $OSTDB-$ostidx"
3518 OSTDB_LIST="$OSTDB_LIST $OSTDB-$ostidx"
3519 ostidx=$((ostidx + 1))
3520 done
3521 i=$((i + 1))
But I didn't find anything obviously wrong. |
| Comments |
| Comment by Emoly Liu [ 22/Jan/13 ] |
|
The same error also happened in https://maloo.whamcloud.com/test_sets/8e542f1a-5b58-11e2-b205-52540035b04c |
| Comment by Andreas Dilger [ 23/Jan/13 ] |
|
In the e2fsck output it seems there is still an error checking quota:
e2fsck -d -v -t -t -f -n --mdsdb /home/cgearing/.autotest/shared_dir/2013-01-21/201214-70220949137480/mdsdb --ostdb /home/cgearing/.autotest/shared_dir/2013-01-21/201214-70220949137480/ostdb-0 /dev/mapper/lvm--OSS-P1
20:33:45:wtm-6vm4: e2fsck 1.42.6.wc2 (10-Dec-2012)
20:33:45:wtm-6vm4: e2fsck_pass1:1501: increase inode 185 badness 0 to 2
20:33:45:wtm-6vm4: e2fsck_pass1:1501: increase inode 186 badness 0 to 2
20:33:45:wtm-6vm4: [QUOTA WARNING] Usage inconsistent for ID 0:actual (3936256, 177) != expected (6033408, 179)
The "inode badness" warnings appear because the precreated inodes have a ctime before the filesystem format time (ctime = 0). They could be fixed, but are not a source of problems. |
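To make the size of the mismatch in the warning above easier to read, here is a minimal sketch that extracts the two tuples and prints expected minus actual. The helper name `quota_delta` is hypothetical, and it assumes the tuples are (space in bytes, inode count), which the ticket itself does not state.

```shell
# Hypothetical helper, not part of e2fsprogs or the Lustre test framework.
# Assumption: each tuple in the warning is (space in bytes, inode count).
quota_delta() {
	# Pull the four numbers out of the warning line with a POSIX BRE.
	set -- $(printf '%s\n' "$1" | sed -n \
		's/.*actual (\([0-9]*\), \([0-9]*\)) != expected (\([0-9]*\), \([0-9]*\)).*/\1 \2 \3 \4/p')
	# No match means the line was not a quota warning.
	[ $# -eq 4 ] || return 1
	# Print "<space delta> <inode delta>" as expected minus actual.
	echo "$(($3 - $1)) $(($4 - $2))"
}

warning='[QUOTA WARNING] Usage inconsistent for ID 0:actual (3936256, 177) != expected (6033408, 179)'
quota_delta "$warning"
```

For the line quoted in this comment the difference is 2097152 in the first field and 2 in the second, i.e. the recorded usage is higher than what e2fsck counted on disk.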
| Comment by Niu Yawei (Inactive) [ 24/Jan/13 ] |
|
The lfsck test uses debugfs to remove objects on the OSTs, which is what causes e2fsck to report the quota usage inconsistency. In autotest the Lustre filesystem is usually not newly created, so the test does not remove objects with debugfs; that is why lfsck often passes in autotest.
lfsck.sh:
if is_empty_fs $MOUNT; then
	...
	# remove objects associated with files in group $OBJGRP
	# on the OST with index $OSTIDX
	remove_objects $OSTNODE $OSTDEV $OBJGRP $OST_REMOVE || \
		error "removing objects failed"
	...
else # is_empty_fs $MOUNT
	FSCK_MAX_ERR=4 # file system errors left uncorrected
fi |
| Comment by Niu Yawei (Inactive) [ 24/Jan/13 ] |
|
The lfsck.sh problem can be fixed easily by changing the "debugfs remove" to a "local mount remove". The new object removal method looks like this:
# Remove objects associated with files.
remove_objects() {
	local ostdev=$1
	shift
	local group=$1
	shift
	local objids="$@"
	local facet=ost$((OSTIDX + 1))
	local mntpt=$(facet_mntpt $facet)
	local opts=$OST_MOUNT_OPTS
	local i
	local rc

	echo "removing objects from $ostdev on $facet: $objids"
	if ! do_facet $facet test -b $ostdev; then
		opts=$(csa_add "$opts" -o loop)
	fi
	mount -t $(facet_fstype $facet) $opts $ostdev $mntpt ||
		return $?
	rc=0
	for i in $objids; do
		rm $mntpt/O/$group/d$((i % 32))/$i || { rc=$?; break; }
	done
	umount -f $mntpt || return $?
	return $rc
}
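The `rm` line in the function above builds each object path as `O/<group>/d<id % 32>/<id>` under the mount point. As a small illustration of that arithmetic only, the sketch below prints the path for a few object IDs. The helper name `ost_object_path` and the mount point `/mnt/ost1` are hypothetical and do not come from lfsck.sh.

```shell
# Hypothetical helper illustrating the path layout used by the rm line;
# it only prints paths and never touches a filesystem.
ost_object_path() {
	# $1: mount point, $2: object group, $3: object id
	echo "$1/O/$2/d$(($3 % 32))/$3"
}

# /mnt/ost1 is an assumed mount point chosen for the example.
for id in 43 44 47; do
	ost_object_path /mnt/ost1 0 "$id"
done
```

Object 43 in group 0 lands in `d11` because 43 % 32 is 11; IDs 44 and 47 land in `d12` and `d15` respectively.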
With the new test script, e2fsck no longer complains about the quota usage inconsistency. However, it seems lfsck.sh can't pass the "if is_empty_fs $MOUNT" case at all: during the lfsck fix phase it hits an LBUG in mdd_create_data() (because we don't support the MDS_OPEN_HAS_OBJS flag anymore), and after fixing the LBUG, lfsck still fails:
lfsck -c -l --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-0 /tmp/ostdb-1 /mnt/lustre
lfsck 1.42.5.wc2 (15-Sep-2012)
lfsck: ost_idx 0: pass1: check for duplicate objects
lfsck: ost_idx 0: pass1 OK (71 files total)
lfsck: ost_idx 0: pass2: check for missing inode objects
Failed to find fid [0x200000400:0x57:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x59:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x5b:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x5d:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x5f:0x0]: DB_NOTFOUND: No matching key/data pair found
lfsck: ost_idx 0: pass2 OK (76 objects)
lfsck: ost_idx 0: pass3: check for orphan objects
lfsck: [0]: pass3 saved orphan object 0:43, 1048576 bytes
lfsck: [0]: pass3 saved orphan object 0:44, 1048576 bytes
lfsck: [0]: pass3 saved orphan object 0:45, 1048576 bytes
lfsck: [0]: pass3 saved orphan object 0:46, 1048576 bytes
lfsck: [0]: pass3 saved orphan object 0:47, 1048576 bytes
lfsck: ost_idx 0: pass3 FIXED: 5MB of orphan data (5 of 91 files total)
lfsck: ost_idx 1: pass1: check for duplicate objects
lfsck: ost_idx 1: pass1 OK (71 files total)
lfsck: ost_idx 1: pass2: check for missing inode objects
lfsck: ost_idx 1: pass2 OK (76 objects)
lfsck: ost_idx 1: pass3: check for orphan objects
lfsck: [1]: pass3 saved orphan object 0:43, 1048576 bytes
lfsck: [1]: pass3 saved orphan object 0:44, 1048576 bytes
lfsck: [1]: pass3 saved orphan object 0:45, 1048576 bytes
lfsck: [1]: pass3 saved orphan object 0:46, 1048576 bytes
lfsck: [1]: pass3 saved orphan object 0:47, 1048576 bytes
lfsck: ost_idx 1: pass3 FIXED: 5MB of orphan data (5 of 96 files total)
lfsck: pass4: check for 20 duplicate object references
Failed to find fid [0x200000400:0x61:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x6a:0xedbddba1:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x63:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc1:0xedbddba3:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x67:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc5:0xedbddba7:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x65:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc3:0xedbddba5:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x69:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc7:0xedbddba9:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x62:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x6c:0xedbddba2:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x64:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc2:0xedbddba4:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x68:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc6:0xedbddba8:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x66:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc4:0xedbddba6:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x6a:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc8:0xedbddbaa:0x0]: DB_NOTFOUND: No matching key/data pair found
removed directory: `/mnt/lustre/lost+found/duplicates'
lfsck: pass4 finished
lfsck: fixed 10 errors
lfsck finished with rc=1
removed `/tmp/mdsdb'
removed `/tmp/mdsdb.mdshdr'
removed `/tmp/ostdb-0'
removed `/tmp/ostdb-1'
clean after the first check
== lfsck test complete, duration 44 sec == 03:35:27 (1359016527)
rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.18.bad': Input/output error
rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.17.bad': Input/output error
rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.19.bad': Input/output error
rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.15': Input/output error
rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.14.bad': Input/output error
rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.20.bad': Input/output error
rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.12': Input/output error
lfsck : @@@@@@ FAIL: remove sub-test dirs failed
Trace dump:
= /home/niu/lustre/lustre-master/lustre/tests/test-framework.sh:3957:error_noexit()
= /home/niu/lustre/lustre-master/lustre/tests/test-framework.sh:3980:error()
= /home/niu/lustre/lustre-master/lustre/tests/test-framework.sh:3565:check_and_cleanup_lustre()
= lfsck.sh:291:main()
Dumping lctl log to /tmp/test_logs/1359016483/lfsck..*.1359016527.log
Dumping logs only on local client.
I'm not sure whether lfsck is still supported or used by users. If not, I think we can just leave these errors as they are; otherwise, we should open other tickets and fix lfsck itself. Andreas, any comment? Thanks. |
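When triaging a long lfsck log like the one in this comment, it can help to tally the recurring messages first. The following is a rough sketch, assuming the log is available on standard input; the function name `lfsck_summary` is hypothetical, and the sample lines are copied from the output above.

```shell
# Hypothetical helper: count DB_NOTFOUND lookups and saved orphan objects
# in an lfsck log read from standard input.
lfsck_summary() {
	awk '
		/DB_NOTFOUND/               { notfound++ }
		/pass3 saved orphan object/ { orphans++ }
		END { printf "notfound=%d orphans=%d\n", notfound, orphans }
	'
}

# A short excerpt of the log quoted in this comment.
lfsck_summary <<'EOF'
lfsck: ost_idx 0: pass2: check for missing inode objects
Failed to find fid [0x200000400:0x57:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x59:0x0]: DB_NOTFOUND: No matching key/data pair found
lfsck: ost_idx 0: pass2 OK (76 objects)
lfsck: [0]: pass3 saved orphan object 0:43, 1048576 bytes
EOF
```

Run against a saved copy of the full log, the same function would give per-run totals that can be compared before and after a fix.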
| Comment by Niu Yawei (Inactive) [ 28/Jan/13 ] |
|
Changed 'debugfs remove' to 'mount remove' in lfsck.sh and fixed the improper LBUG in mdd_create_data(): http://review.whamcloud.com/5186 |
| Comment by Niu Yawei (Inactive) [ 31/Jan/13 ] |
|
Patch landed for 2.4. |