lfsck: e2fsck [QUOTA WARNING] Usage inconsistent

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0
    • 3
    • 6213

    Description

      This issue was created by maloo for liuying <emoly.liu@intel.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/ffe70358-644c-11e2-a1aa-52540035b04c.

      20:33:45:I/O read: 6MB, write: 0MB, rate: 17.00MB/s
      20:33:45: lfsck : @@@@@@ FAIL: e2fsck d -v -t -t -f -n --mdsdb /home/cgearing/.autotest/shared_dir/2013-01-21/201214-70220949137480/mdsdb --ostdb /home/cgearing/.autotest/shared_dir/2013-01-21/201214-70220949137480/ostdb-0 /dev/mapper/lvm-OSS-P1 returned 4, should be <= 1
      20:33:45: Trace dump:
      20:33:45: = /usr/lib64/lustre/tests/test-framework.sh:3975:error_noexit()
      20:33:45: = /usr/lib64/lustre/tests/test-framework.sh:3998:error()
      20:33:45: = /usr/lib64/lustre/tests/test-framework.sh:3481:run_e2fsck()
      20:33:45: = /usr/lib64/lustre/tests/test-framework.sh:3519:generate_db()
      20:33:45: = /usr/lib64/lustre/tests/lfsck.sh:259:main()

      The code around line 3519 of generate_db() in my tree is:

      3515     for node in $(osts_nodes); do
      3516         for dev in ${OSTDEVS[i]}; do
      3517             run_e2fsck $node $dev "-n --mdsdb $MDSDB --ostdb $OSTDB-$ostidx"
      3518             OSTDB_LIST="$OSTDB_LIST $OSTDB-$ostidx"
      3519             ostidx=$((ostidx + 1))
      3520         done
      3521         i=$((i + 1))
      

      But I didn't find anything obviously wrong.
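      For reading the "returned 4, should be <= 1" message: e2fsck(8) documents its exit status as a bitwise OR of flag values, so 4 means "file system errors left uncorrected". A minimal sketch of a decoder follows; describe_e2fsck_rc is a hypothetical helper and is not part of test-framework.sh.

```shell
# Hypothetical helper (not part of test-framework.sh): decode an e2fsck
# exit status, which e2fsck(8) documents as a bitwise OR of these values.
describe_e2fsck_rc() {
	local rc=$1
	local msg=""

	if [ "$rc" -eq 0 ]; then
		echo "no errors"
		return 0
	fi
	[ $((rc & 1)) -ne 0 ] && msg="$msg, errors corrected"
	[ $((rc & 2)) -ne 0 ] && msg="$msg, errors corrected (reboot needed)"
	[ $((rc & 4)) -ne 0 ] && msg="$msg, errors left uncorrected"
	[ $((rc & 8)) -ne 0 ] && msg="$msg, operational error"
	[ $((rc & 16)) -ne 0 ] && msg="$msg, usage or syntax error"
	[ $((rc & 32)) -ne 0 ] && msg="$msg, canceled by user"
	[ $((rc & 128)) -ne 0 ] && msg="$msg, shared library error"
	# Strip the leading ", " left by the first appended flag.
	echo "${msg#, }"
}

describe_e2fsck_rc 4	# the status reported in the failure above
```

      Because the run used -n (answer "no" to all questions), any inconsistency e2fsck finds is necessarily left uncorrected, which is consistent with the status of 4.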

          Activity

            [LU-2663] lfsck: e2fsck [QUOTA WARNING] Usage inconsistent

            niu Niu Yawei (Inactive) added a comment - patch landed for 2.4

            niu Niu Yawei (Inactive) added a comment - Changed 'debugfs remove' to 'mount remove' in lfsck.sh and fixed an improper LBUG in mdd_create_data(): http://review.whamcloud.com/5186

            The lfsck.sh script problem can be fixed easily by replacing the debugfs-based object removal with removal through a local mount of the OST device.

            The new method of removing objects looks like this:

            # Remove the OST objects associated with files by mounting the
            # OST device locally and unlinking them, instead of using debugfs.
            remove_objects() {
                    local ostdev=$1
                    shift
                    local group=$1
                    shift
                    local objids="$*"
                    local facet=ost$((OSTIDX + 1))
                    local mntpt=$(facet_mntpt $facet)
                    local opts=$OST_MOUNT_OPTS
                    local i
                    local rc

                    echo "removing objects from $ostdev on $facet: $objids"
                    # Use a loop mount when the OST is not a block device.
                    if ! do_facet $facet test -b $ostdev; then
                            opts=$(csa_add "$opts" -o loop)
                    fi
                    mount -t $(facet_fstype $facet) $opts $ostdev $mntpt ||
                            return $?
                    rc=0
                    # Stop at the first failed removal but still unmount.
                    for i in $objids; do
                            rm $mntpt/O/$group/d$((i % 32))/$i || { rc=$?; break; }
                    done
                    umount -f $mntpt || return $?
                    return $rc
            }
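
            The path used by the rm above can be sketched on its own. This assumes only what the snippet itself shows, namely that an object lives under O/<group>/d<objid % 32>/<objid> on the mounted OST; ost_object_path is a hypothetical name used for illustration.

```shell
# Hypothetical helper: build the on-disk path of an OST object, following
# the O/<group>/d<objid % 32>/<objid> layout used by remove_objects() above.
ost_object_path() {
	local mntpt=$1
	local group=$2
	local objid=$3

	echo "$mntpt/O/$group/d$((objid % 32))/$objid"
}

# Object 43 in group 0 lands in subdirectory d11, since 43 % 32 = 11.
ost_object_path /mnt/ost1 0 43
```

            The mount point /mnt/ost1 is only a placeholder; in the test it comes from facet_mntpt.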
            

            With the new test script, e2fsck no longer complains about inconsistent quota usage.

            However, lfsck.sh does not seem able to pass the "if is_empty_fs $MOUNT" case at all. During the lfsck fix phase it hits an LBUG in mdd_create_data() (because the MDS_OPEN_HAS_OBJS flag is no longer supported), and after fixing the LBUG, lfsck still fails:

            lfsck -c -l --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-0 /tmp/ostdb-1 /mnt/lustre
            lfsck 1.42.5.wc2 (15-Sep-2012)
            lfsck: ost_idx 0: pass1: check for duplicate objects
            lfsck: ost_idx 0: pass1 OK (71 files total)
            lfsck: ost_idx 0: pass2: check for missing inode objects
            Failed to find fid [0x200000400:0x57:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x200000400:0x59:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x200000400:0x5b:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x200000400:0x5d:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x200000400:0x5f:0x0]: DB_NOTFOUND: No matching key/data pair found
            lfsck: ost_idx 0: pass2 OK (76 objects)
            lfsck: ost_idx 0: pass3: check for orphan objects
            lfsck: [0]: pass3 saved orphan object 0:43, 1048576 bytes
            lfsck: [0]: pass3 saved orphan object 0:44, 1048576 bytes
            lfsck: [0]: pass3 saved orphan object 0:45, 1048576 bytes
            lfsck: [0]: pass3 saved orphan object 0:46, 1048576 bytes
            lfsck: [0]: pass3 saved orphan object 0:47, 1048576 bytes
            lfsck: ost_idx 0: pass3 FIXED:    5MB of orphan data (5 of 91 files total)
            lfsck: ost_idx 1: pass1: check for duplicate objects
            lfsck: ost_idx 1: pass1 OK (71 files total)
            lfsck: ost_idx 1: pass2: check for missing inode objects
            lfsck: ost_idx 1: pass2 OK (76 objects)
            lfsck: ost_idx 1: pass3: check for orphan objects
            lfsck: [1]: pass3 saved orphan object 0:43, 1048576 bytes
            lfsck: [1]: pass3 saved orphan object 0:44, 1048576 bytes
            lfsck: [1]: pass3 saved orphan object 0:45, 1048576 bytes
            lfsck: [1]: pass3 saved orphan object 0:46, 1048576 bytes
            lfsck: [1]: pass3 saved orphan object 0:47, 1048576 bytes
            lfsck: ost_idx 1: pass3 FIXED:    5MB of orphan data (5 of 96 files total)
            lfsck: pass4: check for 20 duplicate object references
            Failed to find fid [0x200000400:0x61:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x6a:0xedbddba1:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x200000400:0x63:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0xc1:0xedbddba3:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x200000400:0x67:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0xc5:0xedbddba7:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x200000400:0x65:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0xc3:0xedbddba5:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x200000400:0x69:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0xc7:0xedbddba9:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x200000400:0x62:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x6c:0xedbddba2:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x200000400:0x64:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0xc2:0xedbddba4:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x200000400:0x68:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0xc6:0xedbddba8:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x200000400:0x66:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0xc4:0xedbddba6:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0x200000400:0x6a:0x0]: DB_NOTFOUND: No matching key/data pair found
            Failed to find fid [0xc8:0xedbddbaa:0x0]: DB_NOTFOUND: No matching key/data pair found
            removed directory: `/mnt/lustre/lost+found/duplicates'
            lfsck: pass4 finished
            lfsck: fixed 10 errors
            lfsck finished with rc=1
            removed `/tmp/mdsdb'
            removed `/tmp/mdsdb.mdshdr'
            removed `/tmp/ostdb-0'
            removed `/tmp/ostdb-1'
            clean after the first check
            == lfsck test complete, duration 44 sec == 03:35:27 (1359016527)
            rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.18.bad': Input/output error
            rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.17.bad': Input/output error
            rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.19.bad': Input/output error
            rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.15': Input/output error
            rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.14.bad': Input/output error
            rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.20.bad': Input/output error
            rm: cannot remove `/mnt/lustre/d0.lfsck/testfile.12': Input/output error
             lfsck : @@@@@@ FAIL: remove sub-test dirs failed
              Trace dump:
              = /home/niu/lustre/lustre-master/lustre/tests/test-framework.sh:3957:error_noexit()
              = /home/niu/lustre/lustre-master/lustre/tests/test-framework.sh:3980:error()
              = /home/niu/lustre/lustre-master/lustre/tests/test-framework.sh:3565:check_and_cleanup_lustre()
              = lfsck.sh:291:main()
            Dumping lctl log to /tmp/test_logs/1359016483/lfsck..*.1359016527.log
            Dumping logs only on local client.
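
            As a cross-check of the "pass3 FIXED: 5MB of orphan data" lines, the per-object sizes in the log can be summed. This is a rough sketch that assumes the line format shown above, with the byte count as the next-to-last field; sum_orphan_bytes is a hypothetical helper.

```shell
# Hypothetical helper: sum the sizes of the objects lfsck saved in pass3.
# Reads lfsck output on stdin; the byte count is the next-to-last field.
sum_orphan_bytes() {
	awk '/pass3 saved orphan object/ { total += $(NF - 1) }
	     END { print total + 0 }'
}

sum_orphan_bytes <<'EOF'
lfsck: [0]: pass3 saved orphan object 0:43, 1048576 bytes
lfsck: [0]: pass3 saved orphan object 0:44, 1048576 bytes
lfsck: [0]: pass3 saved orphan object 0:45, 1048576 bytes
lfsck: [0]: pass3 saved orphan object 0:46, 1048576 bytes
lfsck: [0]: pass3 saved orphan object 0:47, 1048576 bytes
EOF
```

            For the five objects saved on ost_idx 0 this gives 5242880 bytes, which matches the 5MB in the summary line.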
            

            I'm not sure whether lfsck is still supported or used by users. If not, I think we can leave these errors as they are; otherwise, we should open separate tickets and fix lfsck itself. Andreas, any comments? Thanks.

            niu Niu Yawei (Inactive) added the comment above.

            The lfsck.sh test uses debugfs to remove objects on the OSTs, which causes e2fsck to report inconsistent quota usage. In autotest the Lustre filesystem is usually not newly created, so the test does not remove objects with debugfs; that is why lfsck often passes in autotest.

            lfsck.sh:

            if is_empty_fs $MOUNT; then
                ...
                # remove objects associated with files in group $OBJGRP
                # on the OST with index $OSTIDX
                remove_objects $OSTNODE $OSTDEV $OBJGRP $OST_REMOVE || \
                    error "removing objects failed"
                ...
            else # is_empty_fs $MOUNT
                FSCK_MAX_ERR=4   # file system errors left uncorrected
            fi
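
            How the two branches change the outcome can be sketched as a simple threshold check. The default limit of 1 is inferred from the "returned 4, should be <= 1" message in the description, not taken from the framework source, and fsck_rc_acceptable is a hypothetical name.

```shell
# Hypothetical sketch of the pass/fail decision. The default limit of 1 is
# inferred from the "should be <= 1" message, not from test-framework.sh.
FSCK_MAX_ERR=1

fsck_rc_acceptable() {
	local rc=$1

	[ "$rc" -le "$FSCK_MAX_ERR" ]
}

# Freshly formatted filesystem: the limit stays at 1, so rc=4 is a failure.
fsck_rc_acceptable 4 || echo "rc=4 rejected (limit $FSCK_MAX_ERR)"

# Reused filesystem: lfsck.sh raises the limit, so the same rc=4 passes.
FSCK_MAX_ERR=4
fsck_rc_acceptable 4 && echo "rc=4 accepted (limit $FSCK_MAX_ERR)"
```

            Under these assumptions the same e2fsck status is an error on an empty filesystem and tolerated on a reused one.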
            
            niu Niu Yawei (Inactive) added the comment above.

            In the e2fsck output it seems there is still an error checking quota:

            e2fsck -d -v -t -t -f -n --mdsdb /home/cgearing/.autotest/shared_dir/2013-01-21/201214-70220949137480/mdsdb --ostdb /home/cgearing/.autotest/shared_dir/2013-01-21/201214-70220949137480/ostdb-0 /dev/mapper/lvm--OSS-P1
            20:33:45:wtm-6vm4: e2fsck 1.42.6.wc2 (10-Dec-2012)
            20:33:45:wtm-6vm4: e2fsck_pass1:1501: increase inode 185 badness 0 to 2
            20:33:45:wtm-6vm4: e2fsck_pass1:1501: increase inode 186 badness 0 to 2
            20:33:45:wtm-6vm4: [QUOTA WARNING] Usage inconsistent for ID 0:actual (3936256, 177) != expected (6033408, 179)
            

            The "inode badness" warnings appear because the precreated inodes have a ctime earlier than the filesystem format time (ctime = 0). They could be fixed, but are not a source of problems.
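
            The size of the quota mismatch can be worked out from the warning line itself. This sketch assumes each pair in the message is (space in bytes, inode count); quota_delta is a hypothetical helper.

```shell
# Hypothetical helper: print "expected minus actual" for both values in an
# e2fsck "[QUOTA WARNING] Usage inconsistent" line.
# Assumption: each pair is (space in bytes, inode count).
quota_delta() {
	# Pull the four integers out of "actual (a, b) != expected (c, d)".
	set -- $(printf '%s\n' "$1" |
		sed -n 's/.*actual (\([0-9]*\), \([0-9]*\)) != expected (\([0-9]*\), \([0-9]*\)).*/\1 \2 \3 \4/p')
	[ $# -eq 4 ] || return 1
	echo "$(($3 - $1)) $(($4 - $2))"
}

quota_delta '[QUOTA WARNING] Usage inconsistent for ID 0:actual (3936256, 177) != expected (6033408, 179)'
```

            For the line above the difference is 2097152 bytes and 2 inodes, that is, two objects of 1048576 bytes each.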

            adilger Andreas Dilger added the comment above.
            emoly.liu Emoly Liu added a comment - The same error also happened in https://maloo.whamcloud.com/test_sets/8e542f1a-5b58-11e2-b205-52540035b04c

            People

              niu Niu Yawei (Inactive)
              maloo Maloo