
[LU-11024] Broken inode accounting of MDT on ZFS


    Description

      Roughly 6,200 10 MB files were created with a 'dd' loop before it was interrupted:

      [thcrowe@td-mngt01 thcrowe]$ for i in `seq 1 10000` ; do dd if=/dev/zero of=file-$i bs=1M count=10; done
      ^C
      [thcrowe@td-mngt01 thcrowe]$

      All files were written to MDT index 0.

      [thcrowe@td-mngt01 thcrowe]$ lfs getstripe -m file* | sort | uniq
      0
      [thcrowe@td-mngt01 thcrowe]$ ls -1 file* | wc -l
      6208

      So at this point, POSIX says I have 6,208 files named file-*.
      Lustre, however, reports the numbers differently.

      [thcrowe@td-mngt01 thcrowe]$ lfs quota -u 415432 /mnt/slate
      Disk quotas for usr 415432 (uid 415432):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
           /mnt/slate 61619203       0       0       -    1473       0       0       -
      [thcrowe@td-mngt01 thcrowe]$

      Here is the same data directly from the MDT:

      [root@slate-mds01 ~]# grep -A1 415432 /proc/fs/lustre/osd-zfs/slate-MDT0000/quota_slave/acct_user

      - id:      415432
        usage:   { inodes: 1473, kbytes: 53825 }
      [root@slate-mds01 ~]#

      The following commands were used to dump MDT index 0's ZFS contents:

      [root@slate-mds01 ~]# zpool list
      NAME            SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
      mgs             186G  6.35M   186G         -     0%     0%  1.00x  ONLINE  -
      slate_mdt0000  8.67T  16.1G  8.66T         -     0%     0%  1.00x  ONLINE  -
      [root@slate-mds01 ~]# zpool set cachefile=/tmp/slate_mdt0000.cache slate_mdt0000
      [root@slate-mds01 ~]# cp /tmp/slate_mdt0000.cache /tmp/slate_mdt0000.cache1
      [root@slate-mds01 ~]# zpool set cachefile="" slate_mdt0000
      [root@slate-mds01 ~]# zdb -ddddd -U /tmp/slate_mdt0000.cache1 slate_mdt0000 > /tmp/slate_mdt0000-zdb-ddddd

      Once zdb completed its output, a simple grep shows what uid 415432 has going on.

      [root@slate-mds01 ~]# grep uid /tmp/slate_mdt0000-zdb-ddddd | grep -c 415432
      6210

      The count is 6210 rather than 6208 because there are 2 directory objects owned by 415432 in the zdb output.
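
      A minimal sketch of one way to confirm this, assuming zdb prints each object's type (e.g. "ZFS directory") a few lines before its uid field (the -B offset is a guess and may need adjusting):

      # hypothetical check: count directory objects among this uid's dnodes
      grep -B8 'uid.*415432' /tmp/slate_mdt0000-zdb-ddddd | grep -c 'ZFS directory'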

      Further tests were run to check whether /proc/fs/lustre/osd-zfs/slate-MDT0000/quota_slave/acct_user reports correct information. After creating 100 files, the accounted inode count increased only from 120 to 205; after creating 1000 files, it increased from 120 to 929.

      Space (kbytes) accounting on the MDT works correctly; only the inode accounting is wrong.
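
      A minimal sketch of such a test, assuming the same uid and proc path as above (the file count and helper name are illustrative; the files are created on a client while acct_user is read on the MDS):

      ACCT=/proc/fs/lustre/osd-zfs/slate-MDT0000/quota_slave/acct_user
      # extract the accounted inode count for uid 415432
      inodes() { grep -A1 'id: *415432' "$ACCT" | sed -n 's/.*inodes: *\([0-9]*\).*/\1/p'; }
      before=$(inodes)
      for i in $(seq 1 100); do dd if=/dev/zero of=test-$i bs=1M count=1 2>/dev/null; done
      after=$(inodes)
      echo "accounted inodes: $before -> $after (POSIX expects +100)"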

          Activity

            [LU-11024] Broken inode accounting of MDT on ZFS
            pjones Peter Jones added a comment -

            Landed for 2.12


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32418/
            Subject: LU-11024 osd-zfs: properly detect ZFS dnode accounting
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4813eee9b06f6ceb4f39cce42d904ec1516a824a


            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/32422/
            Subject: LU-11024 osd-zfs: properly detect ZFS dnode accounting
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 71943a5d23498fc76d621e3855d580ff90f0757a


            adilger Andreas Dilger added a comment -

            Note that this bug caused the ZFS dnode/inode quota to be reported incorrectly by Lustre - it wasn't using the DMU interface for reporting the dnode quota, but had fallen back to estimating dnode usage based on the user's space usage, as if running ZFS 0.6.x, which didn't have that interface. ZFS was still accounting the dnode usage correctly internally. Once the autoconf patch is applied and the Lustre server is rebuilt/restarted, the ZFS dnode/inode quota will be reported correctly to Lustre.
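
            A quick sketch for verifying that the rebuilt server actually detected the interface (the config.h location is an assumption about the build tree layout):

            # after re-running configure, the dnode-accounting macro should be defined
            grep HAVE_DMU_USEROBJ_ACCOUNTING config.h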

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/32422
            Subject: LU-11024 osd-zfs: properly detect ZFS dnode accounting
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 3a4677a73c885035e48d4aabeefa2592596a64fd

            yong.fan nasf (Inactive) added a comment -

            Here is the patch: https://review.whamcloud.com/#/c/32418/

            adilger Andreas Dilger added a comment -

            This looks like a bug in the autoconf check for ZFS dnode accounting, introduced by the landing of patch https://review.whamcloud.com/30540 "LU-7991 quota: project quota against ZFS backend" to b2_10 (patch https://review.whamcloud.com/27093 on master).

            diff --git a/config/lustre-build-zfs.m4 b/config/lustre-build-zfs.m4
            index 9e39c80..297e790 100644
            --- a/config/lustre-build-zfs.m4
            +++ b/config/lustre-build-zfs.m4
            @@ -523,10 +523,10 @@ your distribution.
                            dnl # ZFS 0.7.0 feature: SPA_FEATURE_USEROBJ_ACCOUNTING
                            dnl #
                            LB_CHECK_COMPILE([if zfs has native dnode accounting supported],
            -               dmu_objset_userobjspace_upgrade, [
            +               dmu_objset_id_quota_upgrade, [
                                    #include <sys/dmu_objset.h>
                            ],[
            -                       dmu_objset_userobjspace_upgrade(NULL);
            +                       dmu_objset_id_quota_upgrade(NULL);
                            ],[
                                    AC_DEFINE(HAVE_DMU_USEROBJ_ACCOUNTING, 1,
                                            [Have native dnode accounting in ZFS])
            

            In particular, the dmu_objset_id_quota_upgrade() function only exists in ZFS master (for 0.8), while dmu_objset_userobjspace_upgrade() is what exists in ZFS 0.7, so we need to check for both. Lustre doesn't call this function directly (that would change the on-disk format and prevent downgrade to an older ZFS release), so we don't need any compatibility functions in Lustre; only the detection needs to be fixed.
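
            A minimal sketch for checking which symbol a given ZFS release provides, assuming DKMS-style headers under /usr/src (the path is a guess and varies by distribution):

            # hypothetical probe: which upgrade function does this ZFS declare?
            grep -h 'dmu_objset_userobjspace_upgrade\|dmu_objset_id_quota_upgrade' \
                /usr/src/zfs-*/include/sys/dmu_objset.h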


            lixi Li Xi (Inactive) added a comment -

            The ZFS version is 0.7.5, and the userobj_accounting feature is active on the pool. The Lustre version is 2.10.3.
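
            For reference, a sketch of how these can be checked on the MDS (the pool name is taken from the zpool list output above):

            cat /sys/module/zfs/version                          # ZFS module version
            lctl get_param version                               # Lustre version
            zpool get feature@userobj_accounting slate_mdt0000   # should be "active"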

            pjones Peter Jones added a comment -

            lixi which Lustre version is being used here?


            adilger Andreas Dilger added a comment -

            What version of ZFS and Lustre is installed on the MDS? ZFS 0.6 used a Lustre-specific inode accounting implementation, but that was replaced with an in-ZFS dnode accounting mechanism in ZFS 0.7.


            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: lixi Li Xi (Inactive)