[LU-2663] lfsck: e2fsck [QUOTA WARNING] Usage inconsistent Created: 22/Jan/13  Updated: 22/Apr/13  Resolved: 31/Jan/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: LB

Issue Links:
Related
is related to LU-2097 sanity.sh test_17m, lfsck: e2fsck fai... Resolved
is related to LU-2694 lfsck in e2fsprogs is out of date Resolved
Severity: 3
Rank (Obsolete): 6213

 Description   

This issue was created by maloo for liuying <emoly.liu@intel.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/ffe70358-644c-11e2-a1aa-52540035b04c.

20:33:45:I/O read: 6MB, write: 0MB, rate: 17.00MB/s
20:33:45: lfsck : @@@@@@ FAIL: e2fsck -d -v -t -t -f -n --mdsdb /home/cgearing/.autotest/shared_dir/2013-01-21/201214-70220949137480/mdsdb --ostdb /home/cgearing/.autotest/shared_dir/2013-01-21/201214-70220949137480/ostdb-0 /dev/mapper/lvm--OSS-P1 returned 4, should be <= 1
20:33:45: Trace dump:
20:33:45: = /usr/lib64/lustre/tests/test-framework.sh:3975:error_noexit()
20:33:45: = /usr/lib64/lustre/tests/test-framework.sh:3998:error()
20:33:45: = /usr/lib64/lustre/tests/test-framework.sh:3481:run_e2fsck()
20:33:45: = /usr/lib64/lustre/tests/test-framework.sh:3519:generate_db()
20:33:45: = /usr/lib64/lustre/tests/lfsck.sh:259:main()
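The e2fsck exit status is a bit mask documented in e2fsck(8), so "returned 4" means "file system errors left uncorrected". That is what e2fsck reports whenever it finds any inconsistency while running with -n, because it answers "no" to every repair. A minimal sketch for decoding such a status (decode_e2fsck_rc is a hypothetical helper, not part of test-framework.sh):

```shell
#!/bin/bash
# Hypothetical helper: decode an e2fsck exit status into the flags listed
# in e2fsck(8). The status is a bitwise OR, so each bit is tested separately.
decode_e2fsck_rc() {
	local rc=$1
	local -a msgs=()

	if [ "$rc" -eq 0 ]; then
		echo "no errors"
		return 0
	fi
	(( rc & 1 ))   && msgs+=("errors corrected")
	(( rc & 2 ))   && msgs+=("errors corrected, system should be rebooted")
	(( rc & 4 ))   && msgs+=("errors left uncorrected")
	(( rc & 8 ))   && msgs+=("operational error")
	(( rc & 16 ))  && msgs+=("usage or syntax error")
	(( rc & 32 ))  && msgs+=("cancelled by user request")
	(( rc & 128 )) && msgs+=("shared library error")

	# Print one decoded flag per line.
	printf '%s\n' "${msgs[@]}"
}
```

For example, `decode_e2fsck_rc 4` prints "errors left uncorrected", and `decode_e2fsck_rc 5` prints both "errors corrected" and "errors left uncorrected".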

Line 3519 of test-framework.sh (in generate_db()) in my tree is in this loop:

3515     for node in $(osts_nodes); do
3516         for dev in ${OSTDEVS[i]}; do
3517             run_e2fsck $node $dev "-n --mdsdb $MDSDB --ostdb $OSTDB-$ostidx"
3518             OSTDB_LIST="$OSTDB_LIST $OSTDB-$ostidx"
3519             ostidx=$((ostidx + 1))
3520         done
3521         i=$((i + 1))

However, I didn't find anything obviously wrong with it.



 Comments   
Comment by Emoly Liu [ 22/Jan/13 ]

The same error also happened in https://maloo.whamcloud.com/test_sets/8e542f1a-5b58-11e2-b205-52540035b04c

Comment by Andreas Dilger [ 23/Jan/13 ]

In the e2fsck output it seems there is still an error checking quota:

e2fsck -d -v -t -t -f -n --mdsdb /home/cgearing/.autotest/shared_dir/2013-01-21/201214-70220949137480/mdsdb --ostdb /home/cgearing/.autotest/shared_dir/2013-01-21/201214-70220949137480/ostdb-0 /dev/mapper/lvm--OSS-P1
20:33:45:wtm-6vm4: e2fsck 1.42.6.wc2 (10-Dec-2012)
20:33:45:wtm-6vm4: e2fsck_pass1:1501: increase inode 185 badness 0 to 2
20:33:45:wtm-6vm4: e2fsck_pass1:1501: increase inode 186 badness 0 to 2
20:33:45:wtm-6vm4: [QUOTA WARNING] Usage inconsistent for ID 0:actual (3936256, 177) != expected (6033408, 179)

The "inode badness" warnings are caused by precreated inodes whose ctime (0) is earlier than the filesystem format time. They could be fixed, but they are not the source of the problem.

Comment by Niu Yawei (Inactive) [ 24/Jan/13 ]

lfsck.sh uses debugfs to remove objects on the OSTs, and that removal causes e2fsck to report the quota usage inconsistency. In autotest the Lustre filesystem is usually not newly created, so lfsck.sh does not remove objects with debugfs; that is why lfsck usually passes in autotest.
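The numbers in the [QUOTA WARNING] line can be compared directly. A sketch that parses the line and prints how far the expected (recorded) usage is ahead of the usage e2fsck actually counted; quota_delta is a hypothetical helper, and reading the gap as "objects unlinked without the quota records being updated" is an inference, not something the log states:

```shell
#!/bin/bash
# Hypothetical helper: parse an e2fsck "[QUOTA WARNING] Usage inconsistent"
# line and print expected minus actual for space and inode counts.
quota_delta() {
	local line=$1
	local re='ID ([0-9]+):actual \(([0-9]+), ([0-9]+)\) != expected \(([0-9]+), ([0-9]+)\)'

	if [[ ! $line =~ $re ]]; then
		echo "no quota warning found" >&2
		return 1
	fi
	local id=${BASH_REMATCH[1]}
	local act_space=${BASH_REMATCH[2]} act_inodes=${BASH_REMATCH[3]}
	local exp_space=${BASH_REMATCH[4]} exp_inodes=${BASH_REMATCH[5]}

	echo "id=$id space_delta=$((exp_space - act_space)) inode_delta=$((exp_inodes - act_inodes))"
}
```

Applied to the warning from this run, it prints `id=0 space_delta=2097152 inode_delta=2`: the quota records still account for 2 inodes and 2097152 (2 x 1048576) more space than e2fsck found on the OST.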

lfsck.sh:

if is_empty_fs $MOUNT; then
    ...
    # remove objects associated with files in group $OBJGRP
    # on the OST with index $OSTIDX
    remove_objects $OSTNODE $OSTDEV $OBJGRP $OST_REMOVE || \
        error "removing objects failed"
    ...
else # is_empty_fs $MOUNT
    FSCK_MAX_ERR=4   # file system errors left uncorrected
fi

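Judging by the "returned 4, should be <= 1" message, the run that failed kept a limit of 1, while the else branch above raises FSCK_MAX_ERR to 4 and so tolerates "errors left uncorrected". A sketch of that pass/fail rule, assuming a default limit of 1; check_fsck_rc is a hypothetical name, not the actual run_e2fsck() code:

```shell
#!/bin/bash
# Hypothetical sketch of the e2fsck pass/fail rule: a status passes if it
# does not exceed FSCK_MAX_ERR. The default of 1 is an assumption taken
# from the "should be <= 1" failure message.
FSCK_MAX_ERR=${FSCK_MAX_ERR:-1}

check_fsck_rc() {
	local rc=$1
	local max=${2:-$FSCK_MAX_ERR}

	if [ "$rc" -le "$max" ]; then
		echo "pass"
	else
		echo "FAIL: returned $rc, should be <= $max"
		return 1
	fi
}
```

With the quota warning present, `check_fsck_rc 4 1` fails while `check_fsck_rc 4 4` passes, which matches the two branches of lfsck.sh.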
Comment by Niu Yawei (Inactive) [ 24/Jan/13 ]

The lfsck.sh problem can easily be fixed by removing the objects through a local mount of the OST instead of with debugfs.

The new method of removing objects looks like this:

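# Remove objects associated with files by mounting the OST locally
# (instead of using debugfs) and unlinking the object files.
remove_objects() {
        local ostdev=$1
        shift
        local group=$1
        shift
        local objids="$*"
        local facet=ost$((OSTIDX + 1))
        local mntpt=$(facet_mntpt $facet)
        local opts=$OST_MOUNT_OPTS
        local i
        local rc

        echo "removing objects from $ostdev on $facet: $objids"
        # A device that is not a block device (file-backed OST) needs a loop mount.
        if ! do_facet $facet test -b $ostdev; then
                opts=$(csa_add "$opts" -o loop)
        fi
        mount -t $(facet_fstype $facet) $opts $ostdev $mntpt ||
                return $?
        rc=0
        # Objects are spread over 32 subdirectories: O/<group>/d<id % 32>/<id>.
        for i in $objids; do
                rm $mntpt/O/$group/d$((i % 32))/$i || { rc=$?; break; }
        done
        # Unmount even after an rm failure, then report the first failure, if any.
        umount -f $mntpt || return $?
        return $rc
}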
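The rm line in remove_objects() encodes the OST object layout it relies on: object <id> of group <group> is the file O/<group>/d<id % 32>/<id> under the mount point. A small hypothetical helper (object_paths, not part of lfsck.sh) that prints those paths can be used to check which files would be unlinked before running the real function; the mount point passed to it is up to the caller:

```shell
#!/bin/bash
# Hypothetical helper: print the object paths remove_objects() would unlink,
# using the same O/<group>/d<id % 32>/<id> layout as its rm line.
object_paths() {
	local mntpt=$1
	local group=$2
	shift 2
	local i

	for i in "$@"; do
		echo "$mntpt/O/$group/d$((i % 32))/$i"
	done
}
```

For example, `object_paths /mnt/ost1 0 43 63` (with /mnt/ost1 as an example mount point) prints /mnt/ost1/O/0/d11/43 and /mnt/ost1/O/0/d31/63.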

With the new test script, e2fsck no longer complains about the quota usage inconsistency.

However, it seems lfsck.sh cannot pass the "if is_empty_fs $MOUNT" case at all. During the lfsck fix phase it hits an LBUG in mdd_create_data() (because we no longer support the MDS_OPEN_HAS_OBJS flag), and even after fixing the LBUG, lfsck still fails:

lfsck -c -l --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-0 /tmp/ostdb-1 /mnt/lustre
lfsck 1.42.5.wc2 (15-Sep-2012)
lfsck: ost_idx 0: pass1: check for duplicate objects
lfsck: ost_idx 0: pass1 OK (71 files total)
lfsck: ost_idx 0: pass2: check for missing inode objects
Failed to find fid [0x200000400:0x57:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x59:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x5b:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x5d:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x5f:0x0]: DB_NOTFOUND: No matching key/data pair found
lfsck: ost_idx 0: pass2 OK (76 objects)
lfsck: ost_idx 0: pass3: check for orphan objects
lfsck: [0]: pass3 saved orphan object 0:43, 1048576 bytes
lfsck: [0]: pass3 saved orphan object 0:44, 1048576 bytes
lfsck: [0]: pass3 saved orphan object 0:45, 1048576 bytes
lfsck: [0]: pass3 saved orphan object 0:46, 1048576 bytes
lfsck: [0]: pass3 saved orphan object 0:47, 1048576 bytes
lfsck: ost_idx 0: pass3 FIXED:    5MB of orphan data (5 of 91 files total)
lfsck: ost_idx 1: pass1: check for duplicate objects
lfsck: ost_idx 1: pass1 OK (71 files total)
lfsck: ost_idx 1: pass2: check for missing inode objects
lfsck: ost_idx 1: pass2 OK (76 objects)
lfsck: ost_idx 1: pass3: check for orphan objects
lfsck: [1]: pass3 saved orphan object 0:43, 1048576 bytes
lfsck: [1]: pass3 saved orphan object 0:44, 1048576 bytes
lfsck: [1]: pass3 saved orphan object 0:45, 1048576 bytes
lfsck: [1]: pass3 saved orphan object 0:46, 1048576 bytes
lfsck: [1]: pass3 saved orphan object 0:47, 1048576 bytes
lfsck: ost_idx 1: pass3 FIXED:    5MB of orphan data (5 of 96 files total)
lfsck: pass4: check for 20 duplicate object references
Failed to find fid [0x200000400:0x61:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x6a:0xedbddba1:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x63:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc1:0xedbddba3:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x67:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc5:0xedbddba7:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x65:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc3:0xedbddba5:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x69:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc7:0xedbddba9:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x62:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x6c:0xedbddba2:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x64:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc2:0xedbddba4:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x68:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc6:0xedbddba8:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x66:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc4:0xedbddba6:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0x200000400:0x6a:0x0]: DB_NOTFOUND: No matching key/data pair found
Failed to find fid [0xc8:0xedbddbaa:0x0]: DB_NOTFOUND: No matching key/data pair found
removed directory: `/mnt/lustre/lost+found/duplicates'
lfsck: pass4 finished
lfsck: fixed 10 errors
lfsck finished with rc=1
removed `/tmp/mdsdb'
removed `/tmp/mdsdb.mdshdr'
removed `/tmp/ostdb-0'
removed `/tmp/ostdb-1'
clean after the first check
== lfsck test complete, duration 44 sec == 03:35:27 (1359016527)
rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.18.bad': Input/output error
rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.17.bad': Input/output error
rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.19.bad': Input/output error
rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.15': Input/output error
rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.14.bad': Input/output error
rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.20.bad': Input/output error
rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.12': Input/output error
 lfsck : @@@@@@ FAIL: remove sub-test dirs failed
  Trace dump:
  = /home/niu/lustre/lustre-master/lustre/tests/test-framework.sh:3957:error_noexit()
  = /home/niu/lustre/lustre-master/lustre/tests/test-framework.sh:3980:error()
  = /home/niu/lustre/lustre-master/lustre/tests/test-framework.sh:3565:check_and_cleanup_lustre()
  = lfsck.sh:291:main()
Dumping lctl log to /tmp/test_logs/1359016483/lfsck..*.1359016527.log
Dumping logs only on local client.
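In the lfsck output above, the "Failed to find fid" lines fall into pass2 (5 FIDs) and pass4 (20 FIDs, matching "check for 20 duplicate object references"). A sketch for tallying them from a saved lfsck log, grouped by the most recent "passN:" header; count_missing_fids is a hypothetical helper that relies only on the line formats visible in this log:

```shell
#!/bin/bash
# Hypothetical helper: read lfsck output on stdin and count "Failed to find
# fid" lines per pass. Passes from different OST indexes share one counter.
count_missing_fids() {
	awk '
		/: pass[0-9]+:/ {
			# Remember the most recent passN header, e.g. pass2 or pass4.
			match($0, /pass[0-9]+/)
			pass = substr($0, RSTART, RLENGTH)
		}
		/^Failed to find fid/ { count[pass]++ }
		END {
			for (p in count)
				printf "%s %d\n", p, count[p]
		}
	' | sort
}
```

Piping the full output above through it should give "pass2 5" and "pass4 20".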

I'm not sure whether lfsck is still supported or used by anyone. If not, I think we can just leave these errors as they are; otherwise, we should open other tickets and fix lfsck itself. Andreas, any comments? Thanks.

Comment by Niu Yawei (Inactive) [ 28/Jan/13 ]

Changed 'debugfs remove' to 'mount remove' in lfsck.sh and fixed the improper LBUG in mdd_create_data(): http://review.whamcloud.com/5186

Comment by Niu Yawei (Inactive) [ 31/Jan/13 ]

patch landed for 2.4

Generated at Sat Feb 10 01:27:08 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.