
Can't enforce block quota when an unprivileged user changes group

Details


    Description

      A quota bug that affects all versions of Lustre was recently revealed:

      If an unprivileged user belongs to multiple groups and changes one of her files from one of those groups to another, block quota is not enforced for the new group.

      This situation was never considered in the quota design (from the first quota implementation through the current new quota). Such a use case is probably rare in the real world; otherwise it would have been reported earlier.

      I think we should fix it in the current new quota architecture to make Lustre quota complete.
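To make the failure mode concrete, here is a small self-contained model of the gap (this is NOT Lustre code; all names and numbers are invented for illustration): writes are checked against the group's block limit, but a chgrp transfers usage to the new group without any check, so the new group can end up far over quota.

```python
# Toy model of per-group block quota accounting. Not the Lustre
# implementation -- it only illustrates the reported gap: the write
# path checks the group limit, the chgrp path does not.

class GroupQuota:
    def __init__(self):
        self.limit = {}   # gid -> block limit
        self.usage = {}   # gid -> blocks used

    def set_limit(self, gid, blocks):
        self.limit[gid] = blocks

    def write(self, gid, blocks):
        # Block quota IS enforced on the write path.
        used = self.usage.get(gid, 0)
        if gid in self.limit and used + blocks > self.limit[gid]:
            raise PermissionError("EDQUOT")
        self.usage[gid] = used + blocks

    def chgrp(self, old_gid, new_gid, blocks):
        # Usage is moved to the new group WITHOUT checking its limit --
        # the bug: the new group can end up far over quota.
        self.usage[old_gid] = self.usage.get(old_gid, 0) - blocks
        self.usage[new_gid] = self.usage.get(new_gid, 0) + blocks

q = GroupQuota()
q.set_limit(2000, 100)    # group 2000: 100-block limit
q.write(1000, 500)        # user writes 500 blocks under group 1000
q.chgrp(1000, 2000, 500)  # chgrp succeeds; group 2000 is now 5x over limit
print(q.usage[2000])      # -> 500
```

A direct write of 500 blocks under group 2000 would have failed with EDQUOT; routing the same blocks through chgrp bypasses the check entirely.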

    Attachments

    Issue Links

    Activity

            [LU-5152] Can't enforce block quota when an unprivileged user changes group

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33682
            Subject: Revert "LU-5152 quota: enforce block quota for chgrp"
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 3f3e9312be341981060ec1b9912e1b93645c94a8

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33678
            Subject: Revert "LU-5152 quota: enforce block quota for chgrp"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 52b06125e4012fd5b347c5237583b41d0254c2b5

            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31210/
            Subject: LU-5152 quota: enforce block quota for chgrp
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 07412234ec60de20cb8d8e45d755297fe6da2d61

            gerrit Gerrit Updater added a comment -

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31210
            Subject: LU-5152 quota: enforce block quota for chgrp
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: ccf8091ea0b36c1ff540eb910c9bce268a47d874

            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30146/
            Subject: LU-5152 quota: enforce block quota for chgrp
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 8a71fd5061bd073e055e6cbba1d238305e6827bb

            gerrit Gerrit Updater added a comment -

            Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/30146
            Subject: LU-5152 quota: enforce block quota for chgrp
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 6457dbdd5f76a5bfd90f6d0383c26eaa67afb2f8

            gerrit Gerrit Updater added a comment -

            Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/29029
            Subject: LU-5152 quota: enforce block quota for chgrp
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 65ad2a61d710f1bd410fdf59a96a34497e233de0

            niu Niu Yawei (Inactive) added a comment -

            Considering that we already have problems with quotas being spread across OSTs, I think that spreading quotas across all of the clients can become even worse. If each client in a 32000-node system needs 32MB to generate good RPCs, that means 1TB of quota would be needed. Even with only 1MB of quota per client this would be 32GB of quota consumed just to generate a single RPC per client.

            Right, but to implement accurate quota for chgrp & cached write, I think that's probably the only way we have. It's worth noting that these reserved quotas can be reclaimed when the server is short of quota (usage approaching the limit), and an inactive client (no user/group writes from that client) should end up with zero reservation.

            I was thinking that the quota/grant acquire could be done by enqueueing the DLM lock on the quota resource FID, and the quota/grant is returned to the client with the LVB data, and the client keeps this LVB updated as quota/grant is consumed. When the lock is cancelled, any remaining quota/grant is returned with the lock.

            My plan was to use a single lock for all IDs (not a per-ID lock), and that lock will never be revoked; I just want to use its existing scalable glimpse mechanism to reclaim 'grant' or to notify when a limit is set/cleared.

            The MDS would need to track the total reserved quota for the setattr operations, not just checking each one. It would "consume" quota locally (from the quota master) for the new user/group for each operation, and that quota would need to be logged in the setattr llog and transferred to the OSTs along with the setattr operations. I don't think the MDS would need to query the OSTs for their quota limits at all, but rather get its own quota. If there is a separate mechanism to reclaim space from OSTs, then that would happen in the background.

            I think the major drawback of this method is that it increases quota imbalance unnecessarily: on setattr, the MDT acquires a large amount of quota limit from the quota master, and then, shortly afterwards when the setattr is synced to the OSTs, the OSTs have to acquire the limit back. (If the OSTs used the limit packed in the setattr log directly, that would introduce more complexity in limit syncing between master & slaves.) If the OSTs acquire the limit in the first place, this kind of thrashing can be avoided.

            It also requires changing the quota slave to be aware of the setattr log: it would need to scan the setattr log on quota reintegration or on rebalancing.

            Another thing worth mentioning is that limit reclaim on the OSTs does happen in the background, but setattr has to wait for the rebalancing to finish (to acquire the limit for the MDT), so the MDT needs to handle this properly to avoid blocking an MDT service thread. Also, the MDT needs to glimpse the OST objects to know the current used blocks before setattr. Having all of this work handled by the client looks better to me.

            It is true that there could be a race condition, if the file size is growing quickly while the ownership is being changed, but that is not any different than quota races today for regular writes.

            Yes, as I mentioned in the proposal, I think we can use this opportunity to solve the current problem. It looks to me that both approaches require a lot of development effort, so my opinion is that we should choose the one that solves the problem better while the same framework can be reused for other purposes.
            BTW, this race window doesn't look that short to me.
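The DLM-based scheme discussed in this exchange — acquire quota grant by enqueueing a lock on the quota resource, carry the grant back in the LVB, draw it down locally, and return the remainder when the lock is cancelled — can be modeled roughly as follows. This is a conceptual sketch only; none of these class or method names correspond to the actual Lustre DLM or quota API.

```python
# Conceptual model of quota grant carried by a DLM-style lock LVB:
# enqueue returns a grant, the client consumes it locally, and
# cancelling the lock returns whatever is left to the quota master.
# All names are invented for illustration.

class QuotaMaster:
    def __init__(self, limit):
        self.limit = limit
        self.granted = 0          # total grant currently out on clients

    def enqueue(self, want):
        # Grant up to 'want', bounded by the remaining limit; the grant
        # would travel back to the client in the lock's LVB.
        grant = min(want, self.limit - self.granted)
        self.granted += grant
        return QuotaLock(self, grant)

class QuotaLock:
    def __init__(self, master, grant):
        self.master = master
        self.lvb_grant = grant    # client-side view of remaining grant

    def consume(self, blocks):
        # The client keeps the LVB updated as quota is consumed locally.
        if blocks > self.lvb_grant:
            raise PermissionError("EDQUOT: grant exhausted")
        self.lvb_grant -= blocks

    def cancel(self):
        # Unused grant is returned to the master along with the lock.
        self.master.granted -= self.lvb_grant
        self.lvb_grant = 0

master = QuotaMaster(limit=1000)
lock = master.enqueue(want=300)   # client reserves 300 blocks
lock.consume(120)                 # cached writes draw down the grant
lock.cancel()                     # 180 unused blocks return to the master
print(master.granted)             # -> 120
```

A reclaim path (the glimpse-based mechanism mentioned above) would additionally let the master shrink a client's `lvb_grant` when the aggregate usage approaches the limit; that callback is omitted here for brevity.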

            adilger Andreas Dilger added a comment -

            could become even worse if the total space is spread among many users/group IDs

            Considering that we already have problems with quotas being spread across OSTs, I think that spreading quotas across all of the clients can become even worse. If each client in a 32000-node system needs 32MB to generate good RPCs, that means 1TB of quota would be needed. Even with only 1MB of quota per client this would be 32GB of quota consumed just to generate a single RPC per client.

            I'm not sure if I understand you correctly. You mean not to acquire/consume grants for user/group in the write RPC, but to use a dedicated new RPC to do acquire/release?

            I was thinking that the quota/grant acquire could be done by enqueueing the DLM lock on the quota resource FID, and the quota/grant is returned to the client with the LVB data, and the client keeps this LVB updated as quota/grant is consumed. When the lock is cancelled, any remaining quota/grant is returned with the lock.

            A user can still exceed quota easily if we don't reserve quota for the whole setattr operation. Imagine a user doing a batch of chgrp operations (no single file is large enough to exceed the limit, but the total size does); I think the window (before the setattr log is synced to the OSTs) is big enough for a user to exceed the quota by far in this manner. (For example: with a 1G group limit, 0.9G per file, and a batch chgrp of 10 files, the group will have consumed 9x the limit at the end.)

            The MDS would need to track the total reserved quota for the setattr operations, not just checking each one. It would "consume" quota locally (from the quota master) for the new user/group for each operation, and that quota would need to be logged in the setattr llog and transferred to the OSTs along with the setattr operations. I don't think the MDS would need to query the OSTs for their quota limits at all, but rather get its own quota. If there is a separate mechanism to reclaim space from OSTs, then that would happen in the background.

            It would make sense to ensure there is space in the new "llog_setattr64_rec_v2" being added to 2.10 for project quotas to log quota updates for each of user/group/project in case we need it.

            It is true that there could be a race condition, if the file size is growing quickly while the ownership is being changed, but that is not any different than quota races today for regular writes.
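The grant-pinning arithmetic quoted in this thread is easy to verify; a quick sketch (using binary units, since the thread's "1TB"/"32GB" figures are approximations):

```python
# Checking the back-of-the-envelope numbers from the discussion:
# 32000 clients each holding 32MB of quota grant pins ~1TB in total,
# and even 1MB per client pins ~32GB.

clients = 32000
MB = 1 << 20                    # 1 MiB

total_32mb = clients * 32 * MB  # 32 MB of grant per client
total_1mb = clients * 1 * MB    # 1 MB of grant per client

print(total_32mb / (1 << 40))   # -> 0.9765625 (just under 1 TiB)
print(total_1mb / (1 << 30))    # -> 31.25     (about 32 GiB)
```

And this is pinned per user/group ID that clients hold grant for, which is why spreading quota grant across all clients scales poorly.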

            niu Niu Yawei (Inactive) added a comment -

            Thank you for the review, Andreas.

            basing the client-side quota on grants could be problematic, since this could cause a large amount of quota to be pinned on clients. This is already a problem with grant, and could become even worse if the total space is spread among many users/group IDs. That said, it makes sense that both grant and quota share a single mechanism.

            Right, I agree. That's why I think a reclaim mechanism is necessary, and the reclaim should happen only when the slave is short of quota limit.
            BTW: the 'grant' for quota is per UID/GID, so it won't be worse than the current grant ("and could become even worse if the total space is spread among many users/group IDs").

            the DLM callback mechanism to release grants is also my preferred solution, but in that case it probably makes sense to also have the grant request mechanism use the DLM in the same manner to acquire quota grants for a particular UID/GID

            I'm not sure if I understand you correctly. You mean not to acquire/consume grants for user/group in the write RPC, but to use a dedicated new RPC to do acquire/release? I think that new RPC should be used for chgrp, but for cached writes we still need to acquire/consume grants for each user/group via the write RPC (a few more sets of 'grant' numbers would need to be packed into the write RPC).

            The main question is whether this can be fixed more easily by just having the MDS check the target group's quota before changing the inode's group locally? The window for a user to exceed grant would typically be small enough that this shouldn't be a big problem, and it would be much simpler to implement. Definitely I'm in support of implementing grant revocation and other cleanups proposed, but I think it will take a long time to implement all of these proposed changes?

            That was my initial idea, but I gave it up for two reasons:

            • A user can still exceed quota easily if we don't reserve quota for the whole setattr operation. Imagine a user doing a batch of chgrp operations (no single file is large enough to exceed the limit, but the total size does); I think the window (before the setattr log is synced to the OSTs) is big enough for a user to exceed the quota by far in this manner. (For example: with a 1G group limit, 0.9G per file, and a batch chgrp of 10 files, the group will have consumed 9x the limit at the end.)
            • Even if the above situation is tolerable, it's not that easy for the MDT to check the block quota limit:
              1. It requires the MDT to glimpse each object (or to pack block counts into the setattr RPC?).
              2. It requires the MDT to be aware of the block limit.
              3. It requires a new RPC from the MDT to each OST to check the limit.
              4. The check RPC may trigger the slave to acquire limit from the master, which in turn may trigger reclaim on the master. As described in the proposal, we would have to handle this situation on the MDT as well (return -EINPROGRESS to the client for the setattr and let the client retry later, which would increase MDT load).

            So my proposal simply offloads all of these jobs to the client. To ensure quota correctness, the additional work is to reserve quota until the whole setattr procedure is done, but that can be achieved by leveraging the existing grant code, and a side benefit is that the problem of cached writes exceeding quota is solved at the same time.
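The batch-chgrp window described in that first bullet (1G group limit, 0.9G per file, 10 files) works out as follows, assuming a naive per-file check that does not account for transfers still in flight to the OSTs:

```python
# Worked example of the batch-chgrp over-quota window: each file
# individually passes a naive per-file check against the 1G group
# limit, but the new group ends up at 9x its limit before the
# setattr log reaches the OSTs. Integer MB used to keep it exact.

limit_mb = 1000    # 1G group limit, in MB
file_mb = 900      # 0.9G per file
files = 10

group_usage_mb = 0
for _ in range(files):
    # Naive MDS-side check: only this file's size vs. the limit,
    # ignoring the chgrp transfers not yet synced to the OSTs.
    assert file_mb <= limit_mb          # each single chgrp is allowed
    group_usage_mb += file_mb

print(group_usage_mb)                   # -> 9000
print(group_usage_mb // limit_mb)       # -> 9  (9x the 1G limit)
```

This is why the comment argues for reserving quota across the whole setattr procedure rather than checking each file in isolation.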

            People

              hongchao.zhang Hongchao Zhang
              niu Niu Yawei (Inactive)
              Votes: 0
              Watchers: 26

              Dates

                Created:
                Updated:
                Resolved: