[LU-5028] quota has grace time when the soft limits are not exceeded Created: 08/May/14  Updated: 18/Nov/16  Resolved: 18/Nov/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.2
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Li Xi (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None

Attachments: Text File lfs_quota-g_guta.log    
Severity: 3
Rank (Obsolete): 13909

 Description   

We found that 'lfs quota' reports strange results for some groups: their soft limits are not exceeded, but a non-zero grace time is still shown.

[root@fm08 ~]# lfs quota -g gism /home2
Disk quotas for group gism (gid 1513):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
         /home2 45897736* 200000000 2000000000 6d23h59m45s       1       0       0       -

[root@fm08 ~]# lfs quota -g gitj /home2
Disk quotas for group gitj (gid 3271):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
         /home2 16644836* 200000000 2000000000 6d23h59m48s       1       0       0       -



 Comments   
Comment by Niu Yawei (Inactive) [ 08/May/14 ]

I suspect that the allocated limits have exceeded the softlimit. Could you run "lfs quota -v" to see whether the total allocated block limit has exceeded the softlimit?

Comment by Li Xi (Inactive) [ 08/May/14 ]

Hi Yawei,

Thanks for the quick response!

The 'lfs quota -v' results follow. We changed the softlimit from 200000000 to 1000000000, so there is no grace time now.

[root@ff04 ~]# lfs quota -v -g gism /home2
Disk quotas for group gism (gid 1513):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
         /home2 45897736  1000000000 2000000000       -       1       0       0       -
home2-MDT0000_UUID
                      4       -       0       -       1       -       0       -
home2-OST0000_UUID
                 256004       - 16777216       -       -       -       -       -
home2-OST0001_UUID
                      0       - 4194384       -       -       -       -       -
home2-OST0002_UUID
                      0       - 4718844       -       -       -       -       -
home2-OST0003_UUID
                 256000       - 16777216       -       -       -       -       -
home2-OST0004_UUID
                      0       - 5242980       -       -       -       -       -
home2-OST0005_UUID
                2676740       - 16777216       -       -       -       -       -
home2-OST0006_UUID
                2676740       - 16777216       -       -       -       -       -
home2-OST0007_UUID
                2676740       - 16777216       -       -       -       -       -
home2-OST0008_UUID
                2932744       - 16777216       -       -       -       -       -
home2-OST0009_UUID
                      0       - 4788916       -       -       -       -       -
home2-OST000a_UUID
                 256004       - 16777216       -       -       -       -       -
home2-OST000b_UUID
                      0       - 4194304       -       -       -       -       -
home2-OST000c_UUID
                      0       - 4194304       -       -       -       -       -
home2-OST000d_UUID
                 256000       - 16777216       -       -       -       -       -
home2-OST000e_UUID
                      0       - 4194304       -       -       -       -       -
home2-OST000f_UUID
                2676740       - 16777216       -       -       -       -       -
home2-OST0010_UUID
                2676740       - 16777216       -       -       -       -       -
home2-OST0011_UUID
                2676740       - 16777216       -       -       -       -       -
home2-OST0012_UUID
                2932744       - 16777216       -       -       -       -       -
home2-OST0013_UUID
                      0       - 4823972       -       -       -       -       -
home2-OST0014_UUID
                      0       - 4194428       -       -       -       -       -
home2-OST0015_UUID
                      0       - 4195184       -       -       -       -       -
home2-OST0016_UUID
                      0       - 4719436       -       -       -       -       -
home2-OST0017_UUID
                 256004       - 16777216       -       -       -       -       -
home2-OST0018_UUID
                      0       -       0       -       -       -       -       -
home2-OST0019_UUID
                2676204       - 16777216       -       -       -       -       -
home2-OST001a_UUID
                2676204       - 16777216       -       -       -       -       -
home2-OST001b_UUID
                2676204       - 16777216       -       -       -       -       -
home2-OST001c_UUID
                2932208       - 16777216       -       -       -       -       -
home2-OST001d_UUID
                 256004       - 16777216       -       -       -       -       -
home2-OST001e_UUID
                      0       - 4194808       -       -       -       -       -
home2-OST001f_UUID
                      0       - 4718896       -       -       -       -       -
home2-OST0020_UUID
                 256000       - 16777216       -       -       -       -       -
home2-OST0021_UUID
                      0       - 5242892       -       -       -       -       -
home2-OST0022_UUID
                2676740       - 16777216       -       -       -       -       -
home2-OST0023_UUID
                2676740       - 16777216       -       -       -       -       -
home2-OST0024_UUID
                2676740       - 16777216       -       -       -       -       -
home2-OST0025_UUID
                2932744       - 16777216       -       -       -       -       -
home2-OST0026_UUID
                      0       - 4543924       -       -       -       -       -
home2-OST0027_UUID
                 256004       - 16777216       -       -       -       -       -
[root@ff04 ~]# lfs quota -v -g gitj /home2
Disk quotas for group gitj (gid 3271):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
         /home2 16644836  1000000000 2000000000       -       1       0       0       -
home2-MDT0000_UUID
                      4       -       0       -       1       -       0       -
home2-OST0000_UUID
                 252836       - 16777216       -       -       -       -       -
home2-OST0001_UUID
                 232308       - 16777216       -       -       -       -       -
home2-OST0002_UUID
                 304368       - 16777216       -       -       -       -       -
home2-OST0003_UUID
                 479444       - 16777216       -       -       -       -       -
home2-OST0004_UUID
                 304508       - 16777216       -       -       -       -       -
home2-OST0005_UUID
                 237444       - 16777216       -       -       -       -       -
home2-OST0006_UUID
                1175356       - 16777216       -       -       -       -       -
home2-OST0007_UUID
                 239768       - 16777216       -       -       -       -       -
home2-OST0008_UUID
                 685248       - 16777216       -       -       -       -       -
home2-OST0009_UUID
                 243156       - 16777216       -       -       -       -       -
home2-OST000a_UUID
                 275252       - 16777216       -       -       -       -       -
home2-OST000b_UUID
                 227748       - 16777216       -       -       -       -       -
home2-OST000c_UUID
                 311196       - 16777216       -       -       -       -       -
home2-OST000d_UUID
                 511500       - 16777216       -       -       -       -       -
home2-OST000e_UUID
                 293564       - 16777216       -       -       -       -       -
home2-OST000f_UUID
                 218748       - 16777216       -       -       -       -       -
home2-OST0010_UUID
                1186936       - 16777216       -       -       -       -       -
home2-OST0011_UUID
                 444848       - 16777216       -       -       -       -       -
home2-OST0012_UUID
                 504152       - 16777216       -       -       -       -       -
home2-OST0013_UUID
                 229696       - 16777216       -       -       -       -       -
home2-OST0014_UUID
                 245656       - 16777216       -       -       -       -       -
home2-OST0015_UUID
                 272548       - 16777216       -       -       -       -       -
home2-OST0016_UUID
                 457932       - 16777216       -       -       -       -       -
home2-OST0017_UUID
                 359204       - 16777216       -       -       -       -       -
home2-OST0018_UUID
                 234288       - 16777216       -       -       -       -       -
home2-OST0019_UUID
                 223420       - 16777216       -       -       -       -       -
home2-OST001a_UUID
                1166492       - 16777216       -       -       -       -       -
home2-OST001b_UUID
                 440788       - 16777216       -       -       -       -       -
home2-OST001c_UUID
                 491780       - 16777216       -       -       -       -       -
home2-OST001d_UUID
                 250044       - 16777216       -       -       -       -       -
home2-OST001e_UUID
                 245992       - 16777216       -       -       -       -       -
home2-OST001f_UUID
                 273804       - 16777216       -       -       -       -       -
home2-OST0020_UUID
                 466008       - 16777216       -       -       -       -       -
home2-OST0021_UUID
                 348564       - 16777216       -       -       -       -       -
home2-OST0022_UUID
                 239476       - 16777216       -       -       -       -       -
home2-OST0023_UUID
                 229744       - 16777216       -       -       -       -       -
home2-OST0024_UUID
                1172008       - 16777216       -       -       -       -       -
home2-OST0025_UUID
                 448984       - 16777216       -       -       -       -       -
home2-OST0026_UUID
                 473704       - 16777216       -       -       -       -       -
home2-OST0027_UUID
                 246320       - 16777216       -       -       -       -       -
Comment by Niu Yawei (Inactive) [ 08/May/14 ]

If you change the softlimit, the allocated limits could change as well (because the qunit is increased when you enlarge the softlimit).
From the output we can see that the total allocated limits now exceed the original softlimit (and I suspect they already did before the change); that's why the grace timer was triggered.

Did you unlink some large files before? (That could free a lot of space, but the allocated limit can't be released immediately.)
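
The comparison described above (total allocated limit versus the softlimit) can be checked directly from pasted 'lfs quota -v' output. The following is a minimal sketch, not a Lustre tool: the function name `sum_slave_usage_and_limits` is hypothetical, and it assumes the layout seen in this ticket, where each `..._UUID` target line is followed by one row whose first column is kbytes used and whose third column is the block limit granted to that target.

```python
import re

# Assumed target-name pattern, based on the names shown in this ticket
# (e.g. "home2-MDT0000_UUID", "home2-OST000a_UUID").
TARGET_RE = re.compile(r"\S+-(?:OST|MDT)[0-9a-fA-F]{4}_UUID")


def sum_slave_usage_and_limits(quota_output: str) -> tuple[int, int]:
    """Return (used_kb, granted_kb) summed over every MDT/OST row."""
    used = granted = 0
    lines = quota_output.splitlines()
    # Pair each line with the one after it: target name, then its value row.
    for name, values in zip(lines, lines[1:]):
        if not TARGET_RE.fullmatch(name.strip()):
            continue
        fields = values.split()
        if len(fields) < 3:
            continue
        # Columns: kbytes quota limit grace files quota limit grace
        kbytes = fields[0].rstrip("*")  # '*' marks an exceeded quota
        if kbytes.isdigit():
            used += int(kbytes)
        if fields[2].isdigit():
            granted += int(fields[2])
    return used, granted


# Three rows copied from the gism output in this ticket.
sample = """\
home2-MDT0000_UUID
                      4       -       0       -       1       -       0       -
home2-OST0000_UUID
                 256004       - 16777216       -       -       -       -       -
home2-OST0001_UUID
                      0       - 4194384       -       -       -       -       -
"""

original_softlimit_kb = 200000000  # softlimit before it was raised
used_kb, granted_kb = sum_slave_usage_and_limits(sample)
print("used:", used_kb, "granted:", granted_kb)
print("granted exceeds softlimit:", granted_kb > original_softlimit_kb)
```

Feeding the full per-target listing for a group into the function, instead of the three sample rows, gives the totals to compare against the softlimit that was in effect when the grace time appeared.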

Comment by Shuichi Ihara (Inactive) [ 08/May/14 ]

Hi Niu,
We checked the PENDING directory for open-unlinked files, but there were no files in it.
There is also another situation we can't explain: the quota accounting is wrong. This is the same Lustre filesystem at the same customer.

$ ls -l
total 1631392
-rw-r--r-- 1 uits0006 gits         82 Nov 19 09:14 cdbullet.gif
drwxr-xr-x 2 uits0006 gits       4096 Nov 19 09:14 css
drwxr-xr-x 2 uits0006 gits       4096 Nov 19 09:14 images
-rw-r--r-- 1 uits0006 gits     578119 Nov 19 01:44 InstallationGuide_8.06.007_01.html
-rw-r--r-- 1 uits0006 gits    1862473 Nov 19 09:16 LicenseAdministration.pdf
-rw-r--r-- 1 uits0006 gits     515515 Nov 19 09:14 ReleaseNotes_8.06.007_01.html
drwxr-xr-x 2 uits0006 gits       4096 Nov 19 09:13 scripts
-rwxr-xr-x 1 uits0006 gits 1667566384 Nov 19 09:09 STAR-CCM+8.06.007_01_linux-x86_64-2.3.4_gnu4.6-r8.bin

There are at least 8 files here (we checked, and he has 121 files in total), but "lfs quota" says there is only one file for this user.

$ lfs quota /home2
Disk quotas for user uits0006 (uid 4879):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
         /home2 6447520  10485760 10485760       -       1       0       0       -
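
A count like the "121 files" above can be cross-checked against the 'files' column with a small script. This is a rough sketch under stated assumptions: the helper `count_owned_inodes` is hypothetical, it assumes the 'files' column counts inodes (so directories are included and hard links are counted once), and it only sees the part of the tree it is pointed at and allowed to read.

```python
import os


def count_owned_inodes(root: str, uid: int) -> int:
    """Count distinct inodes under root (root included) owned by uid."""
    seen = set()

    def visit(path: str) -> None:
        try:
            st = os.lstat(path)  # lstat: do not follow symlinks
        except OSError:
            return  # entry vanished or is unreadable; skip it
        if st.st_uid == uid:
            # (device, inode) identifies an inode, so hard links count once.
            seen.add((st.st_dev, st.st_ino))

    visit(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            visit(os.path.join(dirpath, name))
    return len(seen)
```

For example, `count_owned_inodes(path, 4879)` would count inodes owned by uid 4879 (the uid shown in the output above) under `path`, where `path` is whichever directory on the filesystem is being checked.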
Comment by Li Xi (Inactive) [ 08/May/14 ]

Hi Yawei,

Thanks for your explanation. I have a question here. I thought the grace time was triggered if and only if the actually used space exceeded the softlimit, but it seems that I was wrong? The grace time is triggered iff the allocated (or can I say 'granted') limits exceed the softlimit, right? So that means there is some difference between the allocated limits and the actually used space? If so, what is the difference?

Thanks,
Li Xi

Comment by Niu Yawei (Inactive) [ 08/May/14 ]

Hi, Ihara
Is this an upgraded system? If it is, the old files probably weren't accounted due to an e2fsprogs bug.

Hi, LiXi
The grace time is triggered when the 'granted' space exceeds the softlimit. We try to minimize the difference (used space vs. granted space) by shrinking the qunit size as the granted space approaches the soft/hard limit, so the difference shouldn't be that huge.

Did you delete some large files just before running 'lfs quota'? Can this be reproduced?

Comment by Shuichi Ihara (Inactive) [ 08/May/14 ]

Niu,
Originally, lustre-2.4.2 was installed as the initial install, but we upgraded to lustre-2.4.3 with debug patches for LU-4249.
The old e2fsprogs (1.42.7.wc1) is still installed, but we never ran fsck or the other utilities in e2fsprogs.
Even in this case, can the LU-4504 issue occur?

Comment by Niu Yawei (Inactive) [ 09/May/14 ]

Originally, lustre-2.4.2 was installed as the initial install, but we upgraded to lustre-2.4.3 with debug patches for LU-4249.
The old e2fsprogs (1.42.7.wc1) is still installed, but we never ran fsck or the other utilities in e2fsprogs.
Even in this case, can the LU-4504 issue occur?

Did you ever disable/enable the quota feature for the MDT device? I suggest you upgrade e2fsprogs and disable/enable the quota feature for the MDT device to see if the problem can be resolved.

Comment by Shuichi Ihara (Inactive) [ 13/May/14 ]

Hi Niu, we haven't disabled/enabled quota yet, since that requires unmounting and remounting the MDT. We are going to try it, but before that we really want to make sure: can quota corruption happen even though we never ran e2fsck or wrote data with any tool from the old e2fsprogs?

Comment by Niu Yawei (Inactive) [ 14/May/14 ]

If tune2fs and e2fsck weren't used on the Lustre device, the quota accounting file should be complete. Could you try creating new files to see if they are accounted?

Comment by Shuichi Ihara (Inactive) [ 14/May/14 ]

New files are accounted, but the total accounted number of files is incorrect.

Comment by Shuichi Ihara (Inactive) [ 14/May/14 ]

And there is another problem: as far as we checked with "lfs quota -v", the total accounted file size is close to the sum of the sizes accounted by each quota slave, but the sum of the quota limits on the quota slaves does not match the cluster-wide quota limit at all.

Comment by Niu Yawei (Inactive) [ 14/May/14 ]

And there is another problem: as far as we checked with "lfs quota -v", the total accounted file size is close to the sum of the sizes accounted by each quota slave, but the sum of the quota limits on the quota slaves does not match the cluster-wide quota limit at all.

The sum of the allocated limits should be less than the hardlimit of the system, because some of the limit may not have been allocated to the slaves yet.

Comment by John Fuchs-Chesney (Inactive) [ 10/Jun/14 ]

Shuichi-san,
Do we expect to see any further activity on this ticket?
Is there anything more that you require from HPDD?
Thanks,
~ jfc.
