Details

    • Bug
    • Resolution: Incomplete
    • Minor
    • None
    • Lustre 2.5.4
    • None
    • Toss 2.3-4/2.4-2
    • 3

    Description

      We have a process that periodically runs lfs quota throughout the day to monitor the usage on our file system. Recently we have started observing that, while the usage numbers are still being reported, the number of files reported for a given user has gone from its expected value (e.g., 100,000 files) to 0 or 1.

      This has persisted through system reboots and an OS/Lustre update. Once the given user's file count is reported as 0 or 1 it never changes. However checking the quota for the user's GID will generate a file count.

      Anecdotally, it appears that the only users affected are those whose files have mixed user and group IDs.

      All the normal checks have been performed (e.g., consistency of UID/GID in the password files between the file servers and clients, checking the MDT for ownership, checking the file count with ls -l, etc.). It almost appears that an error condition is being flagged that needs to be cleared.
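
      For illustration, a minimal sketch of the kind of check involved, using the UID and mount point that appear in the outputs later in this ticket (any wrapper script around these commands is hypothetical):

      # compare per-user and per-group accounting for the same numeric ID
      lfs quota -u 21371 /fscratch   # user view: kbytes plus file (inode) count
      lfs quota -g 21371 /fscratch   # group view for the matching GID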

    Attachments

    Issue Links

    Activity

            [LU-7459] Incorrect file count with lfs quota
            pjones Peter Jones added a comment -

            It does not seem as if this remains a concern


            niu Niu Yawei (Inactive) added a comment -

            Joe, is it fixed by enabling/disabling the quota feature?

            niu Niu Yawei (Inactive) added a comment -

            No, enabling/disabling quota with "lctl conf_param" doesn't require an umount/mount, but it does need a small amount of time for the MGS to spread the new configuration to all servers (the same as for other parameter settings done with "lctl conf_param"). You can verify that the new configuration has taken effect by checking the proc files on each server target (see quota_slave.info).
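
            As a rough sketch of that verification step (assuming an ldiskfs backend; the target name fscratch-MDT0000 follows this site's filesystem name and may differ):

            lctl get_param osd-ldiskfs.fscratch-MDT0000.quota_slave.info
            # or list every target on a server at once:
            lctl get_param osd-*.*.quota_slave.info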
            jamervi Joe Mervini added a comment -

            Oh - sorry, I misread your comment above regarding tune2fs. I get that now.

            My question is: Is a mount required between the time you set mdt=none and when you set it back to mdt=ug? And when it is switched back on, can I simply bring Lustre back up and let whatever process takes place run in the background?


            niu Niu Yawei (Inactive) added a comment -

            The instructions for enabling/disabling quota are described in the manual, section "21.2.1. Enabling Disk Quotas (Lustre Software Release 2.4 and later)":

            lctl conf_param fsname.quota.ost|mdt=u|g|ug|none


            Using tunefs.lustre (or tune2fs) disables/enables the quota feature on the device, which is what enables quota accounting on the backend fs.
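
            For illustration, a sketch of that conf_param step using this site's filesystem name "fscratch", run on the MGS node (the enforcement types shown are only examples):

            lctl conf_param fscratch.quota.mdt=ug    # enable user/group quota enforcement on the MDTs
            lctl conf_param fscratch.quota.ost=ug    # enable user/group quota enforcement on the OSTs
            lctl conf_param fscratch.quota.mdt=none  # disable quota enforcement on the MDTs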
            jamervi Joe Mervini added a comment -

            Just so I'm clear, what is the process for disabling and re-enabling quotas on the newer (>2.4.0) Lustre releases? The only reference I can find in the manual to using tunefs.lustre is for enabling quotas after upgrading from a version of Lustre older than 2.4. I'm guessing there's a --noquota parameter or something similar, but there is no procedure defined. I don't know whether it's just a matter of running tunefs.lustre back to back or if a mount is required, so if you can provide that detail I'd appreciate it.

            jamervi Joe Mervini added a comment -

            That is correct. When I run

            lfs quota -g <GID> -v <file system>

            I do get results for whatever GID exists in the directory that is having problems with the UID. So in the example of the top level directory above I get the following outputs: (Note that in our environment by default we assign the same value to both UID and GID for our users.)

            1. lfs quota -g 21371 /fscratch
              Disk quotas for group 21371 (gid 21371):
              Filesystem kbytes quota limit grace files quota limit grace
              /fscratch 275840572 0 0 - 2238 0 0 -
            2. lfs quota -g 33073 /fscratch
              Disk quotas for group 33073 (gid 33073):
              Filesystem kbytes quota limit grace files quota limit grace
              /fscratch 20500875399 0 0 - 192927 0 0 -

            And once again for illustration:

            1. lfs quota -u 21371 /fscratch
              Disk quotas for user 21371 (uid 21371):
              Filesystem kbytes quota limit grace files quota limit grace
              /fscratch 20773618792 0 0 - 1 0 0 -

            We have a scheduled outage coming up the second week of December that might permit us to do a quota check.


            niu Niu Yawei (Inactive) added a comment -

            It looks like the inode accounting on the backend fs is broken somehow. So far I have no idea why, but I think a quotacheck may fix it (disable and then re-enable the quota feature on the MDT device with tune2fs; this requires the MDT to be offline).

            However checking the quota for the user's GID will generate a file count.

            Could you elaborate on this? What do you mean by "checking the quota for the user's GID"?
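
            A sketch of the tune2fs step suggested above; the device path /dev/mapper/mdt_dev is hypothetical, and the MDT target must already be stopped (unmounted):

            tune2fs -O ^quota /dev/mapper/mdt_dev   # clear the quota feature on the offline MDT
            tune2fs -O quota /dev/mapper/mdt_dev    # re-enable it so the accounting is rebuilt from a fresh scan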
            jamervi Joe Mervini added a comment -

            Here's a sample of the user's top level directory (I've X'ed out most of the file/dir names):

            total 266
            drwxr-xr-x 7 21371 33073 16384 Sep 14 09:36 3XXXXXXXXX
            drwxr-xr-x 3 21371 33073 4096 May 12 2015 AFXXXXXXXX
            drwxr-xr-x+ 5 21371 33073 4096 Jan 7 2013 ArXXXXXXXXX
            drwxr-xr-x 5 21371 33073 36864 Sep 14 09:30 BXXXXXXXXXX
            drwxr-xr-x 14 21371 33073 4096 Jan 6 2015 CXXXXXXXXXXX
            drwxr-xr-x 8 21371 33073 4096 Sep 14 11:17 From_gscratch2
            drwxr-xr-x 3 21371 33073 4096 Aug 19 2014 H2013
            drwxr-xr-x+ 2 21371 33073 4096 Mar 12 2013 HXXXXX
            drwxr-xr-x 11 21371 33073 4096 Aug 24 2012 hXXXXXX
            drwxr-xr-x 9 21371 33073 4096 Oct 8 2014 MXXXXXX
            -rw------- 1 21371 21371 63681 Sep 14 14:30 MyDiskUsage_09142014
            drwxr-xr-x 3 21371 33073 16384 Sep 14 10:33 NXXXXXXX
            drwxr-xr-x 2 21371 33073 4096 Jun 17 10:53 P_XXXXXXX
            drwxr-xr-x 10 21371 33073 4096 Sep 14 11:45 RXXXXXXX
            drwxr-xr-x 4 21371 33073 4096 Jan 30 2012 Test_codes
            drwxr-xr-x 3 21371 33073 4096 Mar 10 2015 ValTests
            drwxr-xr-x 5 21371 33073 20480 Sep 14 14:27 ZXXXXXX

            Here is the output from the MDS:

            [root@fmds1 quota_slave]# grep -A1 21371 acct_user
            - id: 21371
              usage: { inodes: 1, kbytes: 0 }

            [root@fmds1 quota_slave]# grep -A1 21371 acct_group
            - id: 21371
              usage: { inodes: 2238, kbytes: 180 }

            [root@fmds1 quota_slave]# grep -A1 33073 acct_group
            - id: 33073
              usage: { inodes: 192927, kbytes: 20400 }

            I was going to get a count of the files in the directory by running find on the UID, but there were enough files that after more than 1.5 hours it was still running and I terminated the process.

            I am also attaching the output from the lfs quota command as you requested.

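            A sketch of the abandoned find-based count mentioned above (the user's top-level directory path is hypothetical):

            find /fscratch/userdir -user 21371 | wc -l   # count files owned by the UID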

            niu Niu Yawei (Inactive) added a comment -

            it appears that the only users affected are those whose files have mixed user and group IDs.

            Could you give an example to illustrate such a situation? I don't quite understand "mixed user and group IDs".

            Could you pick a problematic user and post the output of "lfs quota -u $UID -v $mount" here? And please check the numbers for this user in the "/proc/fs/lustre/osd-ldiskfs/$fsname-MDT0000/quota_slave/acct_user". Thanks.

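            Concretely, the two checks requested above look like the following (with $UID, $mount, and $fsname as placeholders, as in the comment):

            lfs quota -u $UID -v $mount
            grep -A1 $UID /proc/fs/lustre/osd-ldiskfs/$fsname-MDT0000/quota_slave/acct_user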

            People

              Assignee:
              niu Niu Yawei (Inactive)
              Reporter:
              jamervi Joe Mervini
              Votes:
              0
              Watchers:
              4
