[LU-7459] Incorrect file count with lfs quota Created: 21/Nov/15  Updated: 21/Jul/17  Resolved: 21/Jul/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.4
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Joe Mervini Assignee: Niu Yawei (Inactive)
Resolution: Incomplete Votes: 0
Labels: None
Environment:

Toss 2.3-4/2.4-2


Attachments: Text File lfs_quota.txt    
Issue Links:
Related
is related to LU-7467 Quota Account wrong Resolved
Epic/Theme: Quota
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We have a process that periodically runs lfs quota throughout the day to monitor usage on our file system. Recently we have started observing that, while the usage numbers are still being reported, the number of files reported for a given user drops from its expected value (e.g., 100,000 files) to 0 or 1.

This has persisted through system reboots and an OS/Lustre update. Once a given user's file count is reported as 0 or 1, it never changes. However, checking the quota for the user's GID will still generate a file count.

Anecdotally, it appears that the only users affected are those whose files have mixed user and group IDs.

All the normal checks have been performed (i.e., consistency of UID/GID in the password files between the file servers and clients, checking the MDT for ownership, checking file count with ls -l, etc.). It almost appears that an error condition is being flagged that needs to be cleared.



 Comments   
Comment by Peter Jones [ 21/Nov/15 ]

Niu

Could you please advise

Thanks

Peter

PS/ It was great to see you in person this week Joe!

Comment by Niu Yawei (Inactive) [ 23/Nov/15 ]

it appears that only users that are affected are those that have files that have mixed user and group IDs.

Could you give an example to illustrate such a situation? I don't quite understand what "mixed user and group IDs" means.

Could you pick a problematic user and post the output of "lfs quota -u $UID -v $mount" here? And please check the numbers for this user in the "/proc/fs/lustre/osd-ldiskfs/$fsname-MDT0000/quota_slave/acct_user". Thanks.
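For concreteness, the two checks would look roughly like this (a sketch; UID 21371 and mount point /fscratch are taken from the later comments and are placeholders here):

lfs quota -u 21371 -v /fscratch                                              # per-target breakdown of what lfs reports for this user
grep -A1 21371 /proc/fs/lustre/osd-ldiskfs/*-MDT0000/quota_slave/acct_user   # raw inode accounting held by the MDT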

Comment by Joe Mervini [ 23/Nov/15 ]

Here's a sample of the user's top level directory (I've X'ed out most of the file/dir names):

total 266
drwxr-xr-x 7 21371 33073 16384 Sep 14 09:36 3XXXXXXXXX
drwxr-xr-x 3 21371 33073 4096 May 12 2015 AFXXXXXXXX
drwxr-xr-x+ 5 21371 33073 4096 Jan 7 2013 ArXXXXXXXXX
drwxr-xr-x 5 21371 33073 36864 Sep 14 09:30 BXXXXXXXXXX
drwxr-xr-x 14 21371 33073 4096 Jan 6 2015 CXXXXXXXXXXX
drwxr-xr-x 8 21371 33073 4096 Sep 14 11:17 From_gscratch2
drwxr-xr-x 3 21371 33073 4096 Aug 19 2014 H2013
drwxr-xr-x+ 2 21371 33073 4096 Mar 12 2013 HXXXXX
drwxr-xr-x 11 21371 33073 4096 Aug 24 2012 hXXXXXX
drwxr-xr-x 9 21371 33073 4096 Oct 8 2014 MXXXXXX
-rw------- 1 21371 21371 63681 Sep 14 14:30 MyDiskUsage_09142014
drwxr-xr-x 3 21371 33073 16384 Sep 14 10:33 NXXXXXXX
drwxr-xr-x 2 21371 33073 4096 Jun 17 10:53 P_XXXXXXX
drwxr-xr-x 10 21371 33073 4096 Sep 14 11:45 RXXXXXXX
drwxr-xr-x 4 21371 33073 4096 Jan 30 2012 Test_codes
drwxr-xr-x 3 21371 33073 4096 Mar 10 2015 ValTests
drwxr-xr-x 5 21371 33073 20480 Sep 14 14:27 ZXXXXXX

Here is the output from the MDS:

[root@fmds1 quota_slave]# grep -A1 21371 acct_user
- id:      21371
  usage:   { inodes: 1, kbytes: 0 }

[root@fmds1 quota_slave]# grep -A1 21371 acct_group
- id:      21371
  usage:   { inodes: 2238, kbytes: 180 }

[root@fmds1 quota_slave]# grep -A1 33073 acct_group
- id:      33073
  usage:   { inodes: 192927, kbytes: 20400 }

I was going to get a file count for the directory by running find on the UID, but there were enough files that after more than 1.5 hours it was still working and I terminated the process.
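For reference, the command involved was along these lines (a sketch; the actual invocation was not recorded, and the UID and mount point are the ones from the listing above):

find /fscratch -uid 21371 | wc -l    # walks the entire tree, counting every entry owned by UID 21371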

I am also attaching the output from the lfs quota command as you requested.

Comment by Niu Yawei (Inactive) [ 24/Nov/15 ]

Looks like the inode accounting on the backend fs is broken somehow. I have no idea why it broke, but I think a quotacheck may fix it (disable and then re-enable the quota feature on the MDT device with tune2fs; this requires the MDT to be offline).
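A rough sketch of that tune2fs procedure, assuming the MDT backing device is /dev/mdtdev (placeholder) and a local mount point of /mnt/mdt:

umount /mnt/mdt                        # take the MDT offline
tune2fs -O ^quota /dev/mdtdev          # clear the quota feature on the backing ldiskfs device
tune2fs -O quota /dev/mdtdev           # re-enable it so the accounting files are rebuilt
mount -t lustre /dev/mdtdev /mnt/mdt   # bring the MDT back online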

However checking the quota for the user's GID will generate a file count.

Could you elaborate on this? What do you mean by "checking the quota for the user's GID"?

Comment by Joe Mervini [ 24/Nov/15 ]

That is correct. When I run

lfs quota -g <GID> -v <file system>

I do get results for whatever GIDs exist in the directory whose UID is having problems. So for the top-level directory in the example above I get the following outputs. (Note that in our environment we assign the same value to both UID and GID for our users by default.)

# lfs quota -g 21371 /fscratch
Disk quotas for group 21371 (gid 21371):
     Filesystem       kbytes  quota  limit  grace   files  quota  limit  grace
      /fscratch    275840572      0      0      -    2238      0      0      -

# lfs quota -g 33073 /fscratch
Disk quotas for group 33073 (gid 33073):
     Filesystem       kbytes  quota  limit  grace   files  quota  limit  grace
      /fscratch  20500875399      0      0      -  192927      0      0      -

And once again for illustration:

# lfs quota -u 21371 /fscratch
Disk quotas for user 21371 (uid 21371):
     Filesystem       kbytes  quota  limit  grace   files  quota  limit  grace
      /fscratch  20773618792      0      0      -       1      0      0      -

We have a scheduled outage coming up the second week of December that might permit us to do a quotacheck.

Comment by Joe Mervini [ 30/Nov/15 ]

Just so I'm clear, what is the process for disabling and re-enabling quotas on the newer (>2.4.0) Lustre releases? The only references I can find in the manual to using tunefs.lustre are for enabling quotas after upgrading from a version of Lustre older than 2.4. I'm guessing there's a --noquota parameter or something similar, but there is no procedure defined. I don't know whether it's just a matter of running tunefs.lustre back to back or whether a mount is required in between, so if you can provide that detail I'd appreciate it.

Comment by Niu Yawei (Inactive) [ 01/Dec/15 ]

The instructions for enabling/disabling quota are described in the manual, section 21.2.1, "Enabling Disk Quotas (Lustre Software Release 2.4 and later)":

lctl conf_param fsname.quota.ost|mdt=u|g|ug|none

Using tunefs.lustre (or tune2fs) disables/enables the quota feature on the device itself, which is what turns quota accounting on in the backend fs.
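For example, to turn enforcement off and back on for the MDTs (run on the MGS; "fscratch" is assumed here as the filesystem name, to match the mount point used above):

lctl conf_param fscratch.quota.mdt=none   # disable quota enforcement on all MDTs
lctl conf_param fscratch.quota.mdt=ug     # re-enable user and group quota enforcement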

Comment by Joe Mervini [ 01/Dec/15 ]

Oh - sorry, I misread your comment above regarding tune2fs. I get that now.

My question is: Is a mount required between the time you set mdt=none and when you set it back to mdt=ug? And when it is switched back on can I simply bring lustre back up and whatever process takes place is done in the background?

Comment by Niu Yawei (Inactive) [ 02/Dec/15 ]

No, enabling/disabling quota with "lctl conf_param" doesn't require an umount/mount, but it does need a small amount of time for the MGS to spread the new configuration to all servers (the same as for other parameter settings done with "lctl conf_param"). You can verify that the new configuration has taken effect by checking the proc files on each server target (see quota_slave.info).
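For instance, something like the following on the MDS should show whether the new setting has reached that target (a sketch; the exact target name will differ):

lctl get_param osd-ldiskfs.*.quota_slave.info   # reports which quota types are currently enabled on this target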

Comment by Niu Yawei (Inactive) [ 17/Jul/17 ]

Joe, is it fixed by disabling/re-enabling the quota feature?

Comment by Peter Jones [ 21/Jul/17 ]

It does not seem as if this remains a concern
