[LU-7459] Incorrect file count with lfs quota Created: 21/Nov/15 Updated: 21/Jul/17 Resolved: 21/Jul/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Joe Mervini | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None |
| Environment: | Toss 2.3-4/2.4-2 |
| Attachments: |
| Issue Links: |
| Epic/Theme: | Quota |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We have a process that periodically runs lfs quota throughout the day to monitor usage on our file system. Recently we have started observing that, while the usage numbers are still being reported, the number of files reported for a given user went from a real count (e.g., 100,000 files) to 0 or 1. This has persisted through system reboots and an OS/Lustre update. Once a given user's file count is reported as 0 or 1 it never changes. However, checking the quota for the user's GID will still generate a file count. Anecdotally, the only users affected appear to be those whose files have mixed user and group IDs. All the normal checks have been performed (i.e., consistency of UID/GID in the password files between the file servers and clients, checking the MDT for ownership, checking the file count with ls -l, etc.). It almost appears that an error condition is being flagged that needs to be cleared. |
| Comments |
| Comment by Peter Jones [ 21/Nov/15 ] |
|
Niu, could you please advise? Thanks, Peter. PS: It was great to see you in person this week, Joe! |
| Comment by Niu Yawei (Inactive) [ 23/Nov/15 ] |
Could you give an example to illustrate such a situation? I don't quite understand "mixed user and group IDs". Could you pick a problematic user and post the output of "lfs quota -u $UID -v $mount" here? And please check the numbers for this user in "/proc/fs/lustre/osd-ldiskfs/$fsname-MDT0000/quota_slave/acct_user". Thanks. |
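For reference, the data being requested here can be gathered with commands along the following lines; UID 21371 is taken from the later comments, while the mount point and file system name are placeholders:

    # Per-user quota as seen from a client (the mount point is a placeholder)
    lfs quota -u 21371 -v /lustre/fs1

    # Inode/block accounting recorded by the MDT's backend file system
    # (run on the MDS; $fsname is the Lustre file system name)
    cat /proc/fs/lustre/osd-ldiskfs/$fsname-MDT0000/quota_slave/acct_user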
| Comment by Joe Mervini [ 23/Nov/15 ] |
|
Here's a sample of the user's top level directory (I've X'ed out most of the file/dir names): total 266 Here is the output from the MDS: [root@fmds1 quota_slave]# grep -A1 21371 acct_user
I was going to get a count of the files in the directory using find on the UID, but there were enough files that after more than 1.5 hours it was still running and I terminated the process. I am also attaching the output from the lfs quota command as you requested. |
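As an aside, this sort of per-UID file count is typically done with something like the following; the directory path and UID are placeholders, and the availability of lfs find --uid in this release is an assumption:

    # Count regular files owned by a given numeric UID (placeholder path/UID)
    find /lustre/fs1/somedir -uid 21371 -type f | wc -l

    # Possible alternative using Lustre-aware matching (assumes lfs find --uid
    # is supported); counts all matching entries, not just regular files
    lfs find /lustre/fs1/somedir --uid 21371 | wc -l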
| Comment by Niu Yawei (Inactive) [ 24/Nov/15 ] |
|
It looks like the inode accounting on the backend fs is broken somehow. I have no idea why it's broken so far, but I think a quotacheck may fix it (disable and then re-enable the quota feature on the MDT device with tune2fs; this requires the MDT to be offline).
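A rough sketch of the tune2fs step described above, with placeholder device and mount paths; this is only an illustration of the procedure, not a verified runbook:

    # On the MDS, with the MDT target stopped (placeholder paths)
    umount /mnt/lustre-mdt0
    tune2fs -O ^quota /dev/mapper/lustre-mdt0   # disable the quota feature
    tune2fs -O quota /dev/mapper/lustre-mdt0    # re-enable it; this recomputes
                                                # the per-UID/GID accounting
    mount -t lustre /dev/mapper/lustre-mdt0 /mnt/lustre-mdt0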
Could you elaborate on this? What do you mean "checking the quota for the user's GID"? |
| Comment by Joe Mervini [ 24/Nov/15 ] |
|
That is correct. When I run lfs quota -g <GID> -v <file system> I do get results for whatever GID exists in the directory that is having problems with the UID. So in the example of the top level directory above I get the following outputs: (Note that in our environment we assign the same value to both UID and GID for our users by default.)
And once again for illustration:
We have a scheduled outage coming up the second week of December that might permit us to do a quotacheck. |
| Comment by Joe Mervini [ 30/Nov/15 ] |
|
Just so I'm clear, what is the process for disabling and re-enabling quotas on the newer (>2.4.0) Lustre releases? The only references I can find in the manual for using tunefs.lustre are for enabling quotas after upgrading from a version of Lustre older than 2.4. I'm guessing there's a --noquota parameter or something similar, but there is no procedure defined. I don't know whether it's just a matter of running tunefs.lustre back to back or if a mount is required, so if you can provide that detail I'd appreciate it. |
| Comment by Niu Yawei (Inactive) [ 01/Dec/15 ] |
|
The instructions for enabling/disabling quota enforcement are described in the manual under "21.2.1. Enabling Disk Quotas (Lustre Software Release 2.4 and later)": lctl conf_param fsname.quota.ost|mdt=u|g|ug|none. Using tunefs.lustre (or tune2fs) instead disables/enables the quota feature on the device itself, which controls quota accounting on the backend fs. |
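To make the distinction concrete, a minimal sketch of the lctl procedure, assuming the commands are run on the MGS and with "fs1" as a placeholder file system name:

    # Run on the MGS node; "fs1" is a placeholder file system name
    lctl conf_param fs1.quota.mdt=none   # disable quota enforcement on MDTs
    lctl conf_param fs1.quota.mdt=ug     # re-enable user and group enforcement
    lctl conf_param fs1.quota.ost=ug     # corresponding setting for the OSTs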
| Comment by Joe Mervini [ 01/Dec/15 ] |
|
Oh, sorry - I misread your comment above regarding tune2fs. I get that now. My questions are: Is a mount required between the time you set mdt=none and when you set it back to mdt=ug? And when it is switched back on, can I simply bring Lustre back up and let whatever process takes place run in the background? |
| Comment by Niu Yawei (Inactive) [ 02/Dec/15 ] |
|
No, enabling/disabling quota by "lctl conf_param" doesn't require an umount/mount, but it does need a small amount of time for the MGS to spread the new configuration to all servers (the same as other parameter settings done by "lctl conf_param"). You can verify that the new configuration has taken effect by checking the proc files on each server target (see quota_slave.info). |
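A sketch of that verification on an ldiskfs-backed server, mirroring the /proc path quoted earlier; the exact wildcard form of the parameter is an assumption to confirm against this release:

    # On each MDS/OSS, check that the quota slave reports the expected
    # enforcement state ("ug", "u", "g", or "none")
    lctl get_param osd-ldiskfs.*.quota_slave.info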
| Comment by Niu Yawei (Inactive) [ 17/Jul/17 ] |
|
Joe, was it fixed by disabling/re-enabling the quota feature? |
| Comment by Peter Jones [ 21/Jul/17 ] |
|
It does not seem as if this remains a concern. |