[LU-7984] Group quota does not reflect actual usage Created: 05/Apr/16  Updated: 21/Jul/17  Resolved: 21/Jul/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.2
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Peter Bortas Assignee: Niu Yawei (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

CentOS 6.5, Linux 2.6.32-431.20.3.el6.x86_64

Everything is running on ZFS, with LLNL's patches to make ZFS usable (13chaos)

lz4 compression enabled

Site is Linkoping University


Severity: 3

 Description   
  # du -hs --apparent-size .
    15T .
  # du -hs .
    9,5T .
  # lfs find . ! -G rcaguess
  # lfs quota -g rcaguess /rfs18
    Disk quotas for group rcaguess (gid 771):
    Filesystem kbytes quota limit grace files quota limit grace
    /rfs18
    375111456* 0 1048576 - 340878 0 0 -

(Filesystem name edited to protect the guilty, but other than that
it's a cut'n'paste.)

There is such a huge difference between the usage measured by the quota
system and reality that it's a real operational issue.
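To put the gap in numbers, here is a small worked calculation using only the figures pasted above. It assumes that the `kbytes` column of `lfs quota` is in 1 KiB units and that `du -h` scales by powers of 1024. Because lz4 compression is enabled, `du` reports two different figures (15T apparent, 9,5T on disk), and the choice of which one the quota should be compared against is itself an assumption. The helper names are illustrative only.

```python
# Figures copied from the description above; units are assumptions noted inline.
QUOTA_KIB = 375111456   # "kbytes" column of `lfs quota -g rcaguess` (assumed 1 KiB units)
DU_TIB = 9.5            # `du -hs .` printed "9,5T" (comma is the locale's decimal mark)
APPARENT_TIB = 15.0     # `du -hs --apparent-size .` printed "15T"


def kib_to_tib(kib: int) -> float:
    """Convert KiB to TiB (1 TiB = 1024**3 KiB)."""
    return kib / 1024 ** 3


def within_tolerance(reported: float, actual: float, tol: float = 0.10) -> bool:
    """True if `reported` is within `tol` (a fraction) of `actual`."""
    return abs(reported - actual) <= tol * actual


quota_tib = kib_to_tib(QUOTA_KIB)
print(f"quota: {quota_tib:.3f} TiB")
print(f"quota / du:          {quota_tib / DU_TIB:.1%}")
print(f"quota / apparent du: {quota_tib / APPARENT_TIB:.1%}")
print("within 10% of du:", within_tolerance(quota_tib, DU_TIB))
```

With these inputs the quota figure comes out at about 0.349 TiB, roughly 3.7% of the on-disk usage reported by `du`, which is far outside the 10% tolerance asked for.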

Is this resolvable without downtime? Getting to within 10% of real usage is workable; it doesn't need to be perfect.

If not, what is the procedure to do an off-line fsck of the quota in 2.4?



 Comments   
Comment by Peter Bortas [ 05/Apr/16 ]

Jira decided to auto-convert the root prompts to a numbered list. Imagine hashes instead of the list numbers...

Comment by Peter Jones [ 06/Apr/16 ]

Niu

Could you please advise?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 06/Apr/16 ]

I don't think there is any way to do an offline fsck for ZFS.

Given the huge difference between the quota output and the du output, I suspect that some OST objects have the wrong ownership. There are three known bugs that can lead to inconsistent ownership on OST objects:

  • LU-4345. If you are running 2.4.2-llnl, I believe the fix is already included. (Could you verify that you are running the latest 2.4.2-llnl?)
  • LU-5006. If an application creates a file with mknod (or with 'lfs setstripe') and then changes the file's ownership, the ownership won't be set properly on the OST objects.
  • LU-7205. This is a rare race and very unlikely to happen.

The way to fix the inconsistent ownership is to find all the files that belong to group 'xxx' and then explicitly change their group to 'xxx' again (with the chown or chgrp command from a client). This can be done online, provided the LU-4345 fix is included.
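The procedure in the comment above could be scripted roughly as follows. This is a minimal sketch in Python, run from a client mount, that walks a directory tree and re-applies the group to every entry whose group already matches. It assumes, per the comment, that the LU-4345 fix is in place and that re-setting an unchanged group from a client is enough to push the group down to the OST objects again. The function names and the example path are hypothetical; piping `lfs find <dir> -G rcaguess` into `chgrp rcaguess` would do the same job.

```python
import os
from typing import Iterator


def iter_paths(root: str) -> Iterator[str]:
    """Yield root and every entry beneath it, without following symlinks."""
    yield root
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def reapply_group(root: str, gid: int) -> int:
    """Re-set the group on every entry under root that already has group `gid`.

    Only entries whose client-visible group is already `gid` are touched, so
    the namespace does not change; the assumed effect is that the client
    re-sends the group to the OST objects. Returns the number of entries touched.
    """
    touched = 0
    for path in iter_paths(root):
        try:
            if os.lstat(path).st_gid != gid:
                continue
            # uid=-1 leaves the owner unchanged; follow_symlinks=False uses lchown.
            os.chown(path, -1, gid, follow_symlinks=False)
        except FileNotFoundError:
            continue  # entry removed while the walk was running
        touched += 1
    return touched
```

For the group in this ticket the call would be something like `reapply_group("/rfs18/path/to/dir", grp.getgrnam("rcaguess").gr_gid)` after `import grp`; the path is a placeholder.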

Comment by Peter Bortas [ 06/Apr/16 ]

Thanks Niu, that gives me some threads to start pulling on.

Re: "I don't think there is any way to do offline fsck for zfs"

What I meant was a way to verify and update the combined quota counter against the objects in the OST. As you say, there is no fsck for ZFS apart from an online scrub.

Is there a way to map Lustre file/objid -> zfs object/attribute to verify with zdb that wrong ownership of the object really is the problem?

Comment by Niu Yawei (Inactive) [ 07/Apr/16 ]

What I meant was a way to verify and update the combined quota counter against the objects in the OST.

I don't think ZFS has that kind of tool either.

Is there a way to map Lustre file/objid -> zfs object/attribute to verify with zdb that wrong ownership of the object really is the problem?

With ZFS, I think you would have to mount the OST as zfs and then verify the objects' owners, which requires taking the OST offline.
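As a hypothetical aid for that offline check, the sketch below tallies, per group ID, the number of regular files and the space allocated to them under a directory tree. It assumes the OST has been taken offline and mounted as zfs, and that the object files there expose their owning group through an ordinary `stat()`, which is what the suggestion above implies. No particular internal object layout is assumed; the walk simply covers everything under the given root. Summed over all OSTs, the figure for gid 771 could then be set against the `kbytes` value from `lfs quota`. The function names are illustrative.

```python
import os
import stat
from collections import defaultdict
from typing import Dict, Tuple


def usage_by_gid(root: str) -> Dict[int, Tuple[int, int]]:
    """Return {gid: (file_count, allocated_bytes)} for regular files under root.

    allocated_bytes is st_blocks * 512 (Linux reports st_blocks in 512-byte
    units), the same source plain `du` uses, so on a compressed dataset it is
    the on-disk size rather than the apparent size.
    """
    counts: Dict[int, int] = defaultdict(int)
    sizes: Dict[int, int] = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except FileNotFoundError:
                continue  # removed while the walk was running
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            counts[st.st_gid] += 1
            sizes[st.st_gid] += st.st_blocks * 512
    return {gid: (counts[gid], sizes[gid]) for gid in counts}


def report(root: str, gid: int) -> str:
    """One-line summary in KiB, the unit of the `lfs quota` kbytes column."""
    files, used = usage_by_gid(root).get(gid, (0, 0))
    return f"gid {gid}: {files} files, {used // 1024} KiB allocated"
```

The per-group split is the point of the exercise: objects charged to an unexpected gid would show up as usage under that gid instead of 771.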

Comment by Niu Yawei (Inactive) [ 18/Jul/17 ]

Any updates on this? Can we close this one? Thanks.

Comment by Peter Jones [ 21/Jul/17 ]

Seems like there are no further questions on this for the time being

Generated at Sat Feb 10 02:13:37 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.