[LU-12500] group block quota limits not enforced Created: 01/Jul/19  Updated: 01/Jul/19  Resolved: 01/Jul/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.5
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: SC Admin (Inactive) Assignee: Hongchao Zhang
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

2.10.5 + lots of patches on servers, x86_64, zfs 0.7.9, OPA
2.10.7 on clients, x86_64, OPA
group block and inode quotas on.


Severity: 3

 Description   

Hi,

it doesn't look like Lustre is enforcing the group block quota. e.g. the group below is already over its 10 TiB block hard limit (the asterisk next to the kbytes value), yet writes into its directory still succeed:

 > lfs quota -g oz011 /fred
Disk quotas for grp oz011 (gid 10206):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
          /fred 10893023318*      0 10737418240       -   57475       0 1000000       -
 > dd if=/tmp/urand100 of=/fred/oz011/blah bs=1M
1000+1 records in
1000+1 records out
1048738400 bytes (1.0 GB) copied, 0.995716 s, 1.1 GB/s
 > ls -lsh /fred/oz011/blah
688M -rw-r--r-- 1 user oz011 1001M Jul  1 21:42 /fred/oz011/blah
 > lfs quota -g oz011 /fred
Disk quotas for grp oz011 (gid 10206):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
          /fred 10894895294*      0 10737418240       -   57477       0 1000000       -
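
(for reference, the limits above correspond to something like the following setquota. this is a sketch reconstructed from the numbers shown, not necessarily the exact command we used: -B is the block hard limit in kbytes (10737418240 kB = 10 TiB) and -I is the inode hard limit)

 # lfs setquota -g oz011 -B 10737418240 -I 1000000 /fred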

I can see old quota bugs that look similar, but none currently open.

all our directories are setgid, i.e.:

 > ls -ld /fred/oz011
drwxrws--- 17 root oz011 33280 Jul  1 21:42 /fred/oz011

servers are 2.10.5 plus these patches:

lu11082-lu11103-stuckMdtThreads-gerrit32853-3dc08caa.diff
lu11418-refreshStale-gerrit33401-v4-71f409c9.diff
lu11111-lfsck-gerrit32796-693fe452.ported.patch
lu11418-stopOrphCleanupDaThreadSpinning-gerrit33662-45434fd0.diff
lu11201-lfsckDoesntFinish-gerrit33078-4829fb05.patch
lu11419-lfsckDoesntFinish-gerrit33252-22503a1d.diff
lu11301-stuckMdtThreads2-c43baa1c.patch
lu11663-partialPageCorruption-gerrit33748-18d6b8fb.diff
lu11418-hungMdtZfs-gerrit33248-eaa3c60d.diff

not all of which are in 2.10.x AFAIK (all but one are in 2.12?), so it'd unfortunately be quite a bit of work to update servers to 2.10.8.

clients are all stock 2.10.7

thanks

cheers,
robin



 Comments   
Comment by SC Admin (Inactive) [ 01/Jul/19 ]

a bit more info: if I wait a while, the "size on disk" of the file increases, i.e. it's now

 > ls -lsh /fred/oz011/blah
915M -rw-r--r-- 1 user oz011 1001M Jul  1 21:42 /fred/oz011/blah

perhaps that's normal. I forget... :-/

a third bit of info: I suspect this quota issue also causes unusually high load on the OSSes. we recently had a user in (what we now realise was) an over-quota group running 300+ I/O-intensive jobs, which pushed OSS load to 200+. one OSS was STONITH'd because it hit a timeout, probably just from the load.

cheers,
robin

Comment by Peter Jones [ 01/Jul/19 ]

Hongchao

Can you please advise?

Thanks

Peter

Comment by Patrick Farrell (Inactive) [ 01/Jul/19 ]

It would be good to check that you've got quota enforcement enabled, and not just accounting.

This is described in the quota section of the Lustre operations manual, but you can check with this command on the MDS:

lctl get_param osd-*.*.quota_slave.info 

If you do not have both 'u' and 'g' under 'enabled', then quota enforcement is not enabled.
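
For example, on a target with both user and group enforcement active, the relevant line should look something like this (illustrative output; a site enforcing only group quota would show just 'g'):

 # lctl get_param osd-*.*.quota_slave.info | grep enabled
quota enabled:  ug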

Comment by SC Admin (Inactive) [ 01/Jul/19 ]

ah. a beer for Patrick

 # cexec -p warble:2 oss:1-10 'lctl get_param osd-*.*.quota_slave.info' | grep enable
warble warble2: quota enabled:  g
warble warble2: quota enabled:  g
warble warble2: quota enabled:  g
oss arkle1: quota enabled:  g
oss arkle1: quota enabled:  g
oss arkle2: quota enabled:  g
oss arkle2: quota enabled:  g
oss arkle3: quota enabled:  g
oss arkle3: quota enabled:  g
oss arkle4: quota enabled:  g
oss arkle4: quota enabled:  g
oss arkle5: quota enabled:  g
oss arkle5: quota enabled:  g
oss arkle6: quota enabled:  g
oss arkle6: quota enabled:  g
oss arkle7: quota enabled:  g
oss arkle7: quota enabled:  g
oss arkle8: quota enabled:  g
oss arkle8: quota enabled:  g
oss arkle9: quota enabled:  none
oss arkle9: quota enabled:  none
oss arkle10: quota enabled:  none
oss arkle10: quota enabled:  none

we added 2 more OSSes a while back, but it looks like the conf_param isn't inherited by new OSSes.

is that expected behaviour?

I re-did the conf_param on the MGS

[warble1]root: lctl conf_param dagg.quota.ost=g

and now dd still writes a small amount of data before stopping, and at least there's an error coming back too ->

 > dd if=/tmp/urand100 of=/fred/oz011/blah bs=1M
dd: error writing '/fred/oz011/blah': Disk quota exceeded
10+0 records in
9+0 records out
9437184 bytes (9.4 MB) copied, 0.404197 s, 23.3 MB/s
 > dd if=/tmp/urand100 of=/fred/oz011/blah2 bs=1M
dd: error writing '/fred/oz011/blah2': Disk quota exceeded
9+0 records in
8+0 records out
8388608 bytes (8.4 MB) copied, 0.199364 s, 42.1 MB/s

so that's probably fine.
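
(for completeness, re-running the same check across the servers should now show the new OSSes enforcing group quota too; expected output along these lines:)

 # cexec -p warble:2 oss:1-10 'lctl get_param osd-*.*.quota_slave.info' | grep enable
...
oss arkle9: quota enabled:  g
oss arkle9: quota enabled:  g
oss arkle10: quota enabled:  g
oss arkle10: quota enabled:  g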

if this is all expected behaviour then please close this ticket. thanks!

cheers,
robin

Comment by Patrick Farrell (Inactive) [ 01/Jul/19 ]

Yeah, this is not ideal behavior (re: inheritance with the new OSTs), but it is expected.

Glad to be of assistance!

Comment by Patrick Farrell (Inactive) [ 01/Jul/19 ]

Config issue at customer site.
