[LU-11023] OST Pool Quotas Created: 16/May/18  Updated: 30/May/23  Resolved: 14/May/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: New Feature Priority: Major
Reporter: Andreas Dilger Assignee: Sergey Cheremencev
Resolution: Fixed Votes: 0
Labels: DoM2, FLR2

Attachments: PDF File CLUG2013_Pool-Support-of-Quota_final.pdf     PDF File HLD Lustre Pool Quotas-Cray201902.pdf     PDF File OST_POOL_Based_Quota_Design.pdf    
Issue Links:
Cloners
Clones LU-4017 Add project quota support feature Resolved
Related
is related to LU-9809 RTDS(Real-Time Dynamic Striping): A p... Open
is related to LU-7816 Add default quota setting support for... Resolved
is related to LU-13359 change lfs quota --pool to print used... Resolved
is related to LU-9 Optimize weighted QOS Round-Robin all... Open
is related to LU-13058 Intermediate component removal (PFL/SEL) Open
is related to LU-16857 Allocate OST objects from spill pool ... Open
is related to LU-11022 FLR1.5: "lfs mirror" usability for Bu... Resolved
is related to LU-13587 sanity-quota test_68: Oops: RIP: qpi_... Resolved
is related to LU-13840 Remove use of "-o" for lfs setquota Resolved
is related to LU-13954 add OST pool quota options for lfs-qu... Resolved
is related to LU-13756 qmt_pool_lock leak in qmt_pool_lookup() Resolved
is related to LU-13810 Check OST pool quota hard limit at fi... Resolved
is related to LUDOC-467 Add feature documentation for Pool Qu... Resolved
is related to LU-11571 MDT pool Open
is related to LU-10995 DoM2: allow MDT-only filesystems Open
is related to LU-13066 RR vs. QOS allocator should be tracke... Resolved
is related to LU-13445 enhance ha.sh to support different users Resolved
is related to LU-13952 add default quota to OST pool quotas Resolved
is related to LU-13971 Report Pool Quotas for a user Resolved
is related to LU-14071 add OST pool quota options for lfs-qu... Resolved

 Description   

The OST (or MDT) pool feature lets users group OSTs together to make object placement more flexible, which is a very useful mechanism for system management. However, quota support for pools is not complete, which limits its usefulness. Fortunately, the current quota framework is powerful and flexible enough to make adding such an extension possible.



 Comments   
Comment by Andreas Dilger [ 16/May/18 ]

I've created a new issue for tracking pool quota, or possibly OST/MDT quota, since LU-4017 was used for Project Quota and closed. There are lots of good discussions in that ticket on this feature.

Having the ability to put separate quotas on OSTs/MDTs (either directly or via pools) is important for production deployment of both Data-on-MDT to limit space usage on MDTs, as well as FLR for burst-buffer implementation to limit usage on flash OSTs. I'm not fixed on linking this quota to OST pools since there are some complexities there, and we'd also want to have MDT pools for that to be useful for DoM, but I think some kind of limits are needed for these use cases.

Comment by Nathan Rutman [ 29/Aug/18 ]

The design doc seems to have a problematic concept: a new EA with the "pool the object belongs to". The OST belongs to a pool (or pools), but the object does not belong to a pool itself. Or, put another way, all objects on an OST belong to the same set of pools that the OST belongs to. I guess the original idea in the doc was to try to make OST pools "look like" directory quotas by setting a new pool per directory, and so it can't handle a single object being in more than one pool. If we drop this idea, then we can drastically simplify the pool quotas design.

  • Don't need EA for each object.
  • OSTs don't need to know which pool they are part of.
  • Don't need to send pool configuration or changes to OSTs. 
  • Don't need to hash pool names.
  • All pool knowledge remains local to MDS.

Why all that? Because all the handling of pool quotas can be confined to the quota master on the MDS. The OSTs just continue to request a single user or group quota from quota master. The master knows which pool(s) the OST is a member of, and just checks the quota for each pool, returning the minimum remaining amount for that OST to the slave. 

E.g. for this case
lfs setquota --block-hardlimit 2097152 -u user1 -p flash /mnt/lustre
lfs setquota --block-hardlimit 2097152 -u user2 -p flash /mnt/lustre
lfs setquota --block-hardlimit 2097152 -u user1 /mnt/lustre

there would be 4 quota files created (2 per pool): admin_quotafile.usr, admin_quotafile.grp, admin_quotafile.usr.flash, admin_quotafile.grp.flash. For a quota acq request from an OST in the flash pool, the MDS would check all four files and return the minimal amount remaining. For an OST not in the flash pool, it would not check the .flash quotafiles.

The more pools we have, the more quotafiles, and so quota checks will go incrementally slower, but I think this is acceptable.

So after two hours of looking into this, I think this should be relatively easy to do. Am I missing something?
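
To illustrate the minimum-remaining rule with concrete numbers - this is not Lustre code, just the arithmetic the master would do per request, with made-up values:

# remaining quota for user1 in each applicable file, in KB (hypothetical)
remaining_usr=1048576        # admin_quotafile.usr (global user limit)
remaining_usr_flash=204800   # admin_quotafile.usr.flash (flash pool limit)

# OST in the "flash" pool: both limits apply, grant the smaller remainder
grant=$(( remaining_usr < remaining_usr_flash ? remaining_usr : remaining_usr_flash ))
echo "grant for flash OST: ${grant} KB"

# OST outside the pool: only the global file is consulted
echo "grant for non-flash OST: ${remaining_usr} KB"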
 

Comment by Andreas Dilger [ 30/Aug/18 ]

It would be great if you end up implementing this feature. This is one of the major gaps for making tiered storage really usable within Lustre. If we have e.g. flash OSTs in the filesystem, there is currently no way to exclude them from regular usage (e.g. if someone creates files without specifying a pool).

Typically, the flash OSTs will also be smaller than the disk OSTs, so they will also fill up more quickly. Having the object allocator tied into pool quotas will ensure that the flash OSTs are skipped when a user doesn't have any remaining quota there, or was never given any in the first place. While the MDS allocation-time decision is not going to prevent all abuses (e.g. user creates a million small files in the flash pool, then tries to write lots of data into each one) it will at least avoid the majority of such issues.

Conveniently, the "default quota" functionality (LU-7816) was just landed to 2.12, so proper integration with this should allow admins to configure (if they want) "by default users have no quota/X quota on the flash pool" instead of having to explicitly configure this for all users on all pools.

Nathan, would you be able to write up a revised design doc that explains your proposed solution? It should include some reasonable use cases (in particular the tiered storage case with a flash OST pool and a disk OST pool, where users are allowed some limited amount of space in the flash pool that can be time-limited to a short period like 24h). There also needs to be some consideration of how the quota tools will specify the quota limits and how this will integrate into the allocator on the MDS.

Comment by Nathan Rutman [ 30/Aug/18 ]

It's on Cray's short list for implementation (Cray ticket LUS-5801). We considered including pool quotas in allocator decisions, but came to the conclusion that we should not: it was the user's decision to use this pool; it's not really the MDS's role to second-guess and use a different pool than what it was told.

In any case, I'd prefer to get the pool quotas restrictions first (in this ticket), then consider the allocator changes as a follow-on. (Frankly, I think the allocator is in bad need of a complete rewrite in any case.)

Comment by Cory Spitz [ 09/Nov/18 ]

I've added this LU to http://wiki.lustre.org/Projects.  @sergey from Cray will be picking this up within the next month.

Comment by Nathan Rutman [ 03/Dec/18 ]

A design question:

When destroying a pool, does it make more sense to destroy all associated pool quota settings, or retain them in case the pool is recreated?

 

Comment by Cory Spitz [ 03/Dec/18 ]

What's the case for keeping them?  If it is to make the user's life easier if/when a pool is recreated just give the user a tool to save the config.  Remember, the system will still have to do some re-accounting when the quotas are respecified.

 

Comment by Andreas Dilger [ 04/Dec/18 ]

Cory, I think there are two separate issues here. There is the pool usage accounting, which is just based on the per-OST quota accounting, and is not accounted on a per-pool basis. There is a separate pool limit file, which is what the administrator specifies for each user (e.g. adilger gets 1TB in the flash pool), which contains potentially a lot of custom information and does not necessarily become stale when the pool name is removed.

Given that the pool quota files are probably going to be relatively small, I'm not against keeping them if the pool is removed, so long as there is some way to actually delete the quota limits. Otherwise, I foresee that some admin will have a problem with their pool quota file, try to remove the pool and recreate it, and not be able to resolve their problem.

Comment by Sergey Cheremencev [ 21/Dec/18 ]

Hello!
I would like to discuss the interaction between OST pools and quota pools.

There are 2 ways:

  1. Quota pools get all their info from OST pools.
    Benefits: OST pools already exist, so we don't need to add the same code again.
    Drawbacks: the quota pool master and OST pools are located on different obd devices.
    I am not sure I see a nice way for them to communicate. Thoughts or ideas on how it should/could be done?
    Configs?
  2. Quota pools don't depend on OST pools. We need to add just 2 commands, like "quota_pool_new" and "quota_pool_add".
    Benefits: we can use these commands for DoM: "lfs quota_pool_new -type DOM dom_pool1 /mnt/lustre". And even combine OST and MDT pools in one: "lfs quota_pool_new -type DOM,OSS fast_pool /mnt/lustre".
    Drawbacks: new code that partially clones OST pool functionality.
Comment by Andreas Dilger [ 21/Dec/18 ]

IMHO it would be confusing/annoying to have to configure OST pools separately from pool quotas. People are already using OST pools for allocating files on specific OSTs, so having to define and configure pool quotas separately (and possibly differently by accident) would cause a lot of support issues/questions.

Even though the quota master (MDT0000) is on a different device from the MGS, it should be that MDT0000 has all of the pool information because it is using the pools to do allocation.

I think the biggest effort would be to allow MDTs to be added to pools, and to have this affect inode allocation if the MDTs are added to a pool.

Comment by Sergey Cheremencev [ 25/Dec/18 ]

Thanks for the answer, Andreas.

Have one more item for discussion.
Right now, the pool is addressed by qpi_key, which is composed of (pool_id | (pool_type << 16)).
As OSTs shouldn't know anything about pool IDs, we can change this to use the LOD pool name for qpi_key,
and use pool_name as the name of the directory holding the quota files.
Now we have:

├── changelog_catalog
├── ...
├── quota_master
│   ├── dt-0x0
│   │   ├── 0x1020000
│   │   ├── 0x1020000-OST0000_UUID
│   │   ├── 0x1020000-OST0001_UUID
 

Instead we can have something like

├── quota_master
│   ├── dt-pool1
│   │...
│   ├── dt-poolN
 

However, I guess pool_id could be useful later, possibly to group disks directly on the OST.
Thus, if we want to keep all the existing code around pool IDs, we need a mapping from quota pool to LOD pool.
For example, each quota pool could store both the pool name and a pool ID: the pool name to address LOD pools, and the corresponding pool ID to operate with quota pools under the current scheme.

Comment by Sergey Cheremencev [ 29/Dec/18 ]

Please ignore my previous comment - it is no longer relevant.
I've looked through the quota code carefully and I'm not sure we can use the model where OSTs know nothing about the different pools and all pool quota accounting is performed on the MDS.

Comment by Andreas Dilger [ 29/Dec/18 ]

According to Nathan's proposal, which I agree with, the concept of a pool quota would be something only understood by the MDT, basically adding the per-OST quotas together based on which OSTs belong in a pool. This would be similar to how "lfs df -p" works on the client, only adding the free space from OSTs that are part of a pool.
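
For comparison, the existing per-pool free-space view referred to above looks like this (pool and filesystem names are examples):

# show free space only for OSTs in the "flash" pool of filesystem "lustre",
# the same subset a pool quota on "flash" would aggregate over
lfs df --pool lustre.flash /mnt/lustre
# versus the whole filesystem
lfs df /mnt/lustre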

Comment by Nathan Rutman [ 14/Feb/19 ]

I've attached our HLD for review (Sergey and I both worked on it).  Please let us know any comments or concerns; we'll be implementing this shortly.

Comment by Patrick Farrell (Inactive) [ 14/Feb/19 ]

Referring to the example about qunit calculations in:

"5.2.2. Qunit changes "

I won't quote the whole example.

In the case described in the doc, of two overlapping pools, the user is close to running out of quota on one of those OSTs because of that pool.  So it is affecting performance, but also, if they've got anything using that OST, then they're almost out of quota.  And the striping policy doesn't take quota into account, so files will get striped to that OST, so they will use it...  So unless you're making special efforts to avoid it, you'll run out of quota there while using the other pool.

So I don't think this is worse than today in ways that matter, and I think "do nothing" would be acceptable...?  "Do nothing" with advice to avoid overlapping pools where possible?

 

Comment by Sergey Cheremencev [ 15/Feb/19 ]

"Current code anticipated support for quota pools, but expected a scheme where the pool ID comes from the slaves. We therefore can't use the existing structure without breaking our requirement that OST-side quota tracking remains unaffected."

Since we can't use the already-existing "quota pools", the new feature should have a different name. I propose naming this "quota *s*pools" (slave pools).
On the other hand, the term "pools" has long been known to mean a pool of OSTs, so for users it is better to call the new feature "Quota pools".
I suggest renaming the existing "quota pools" to something like "quota dpools" (disk pools) or "quota dcomb" (disk combines) and keeping the term "Quota pools" for the current feature.

C02TM06XHTD6:quota c17829$ grep -R pool . | wc -l
     383

What does the community think about this rename?

Comment by Andreas Dilger [ 26/Feb/19 ]

Will you be implementing MDT pools as part of this effort, or is that not in the current plans?

Comment by Sergey Cheremencev [ 26/Feb/19 ]

The MDT pools part is not in the current plans.

I guess that work should start with implementing MDT pools. Possibly we need an independent pools layer, covering both MDTs and MDT pools, that is available from LOD, quota and MDD.

Comment by Andreas Dilger [ 26/Feb/19 ]

Just reviewing the HLD, some general comments:

  • 6.2 examples do not match the use case presented in 6.1? It would make it more clear if the numbers in these sections matched, so that the reader can "follow along" with the examples after the scenario is described in 6.1.
  • The 6.2.2, 6.2.4 and 6.3.3 examples assume the case where the user's pool quota is less than the free space on any OST. It doesn't seem that there is any coordination between MDS object allocation and the pool quota space? What happens when Bob tries to create and write to a new file in a directory not associated with any pool, in the case where he is out of quota on OST10-20 and has only 0.1GB left on OST5-9? If the MDS object allocator doesn't take the pool quota into account, it is entirely possible that the file objects will be allocated on the over-quota OSTs, and Bob will get -EDQUOT even though there are OSTs with space and quota available that he could write to. It would seem that having at least some communication between LOD and QMT for object allocation would avoid this issue. It wouldn't need to be continuous (i.e. not for every object), but rather similar to how LOD periodically checks statfs for QOS allocation. It may be that each user gets a separate "mask" of "available space" for each OST, based on available quota, that limits the QOS algorithm's selection of OSTs for their file creations.
  • I saw in the comments at one point that destroying a pool would preserve the quota limits, so they are available if the pool is recreated (which may happen if e.g. there is some problem with the config logs or similar), but this is not reflected in the HLD. IMHO, this behaviour makes sense, since assigning user/group/project quotas for pools is typically cumbersome work, and admins may not have scripts to do this or backups. My understanding is that setting quotas for user/group/project is already a bit of a chore, so we don't want to make it harder. If there is no pool definition, then the left-over pool quota limits could just be ignored completely (e.g. not loaded into memory)? Is there a reason this was removed?
  • as for existing "pool ID" support in quotas, my (limited) understanding is that this is all totally unused. At one time we discussed mapping a pool name to a numeric ID, but that was dropped due to complexity. There was also an old proposal to allow a "FID" to be a pool ID, allowing a directory to have a quota for all of the tree underneath it. However, that was later replaced with project quotas based on the XFS design, which has since been implemented for ext4 and ZFS as well, so it is unlikely to change in the future. Granted that I'm not very familiar with the quota code, I'm fine with getting rid of the whole idea of numeric IDs for quotas and using names instead, if the protocol supports it. We'd probably have to watch out for mapping the string name to the network protocol so that it does not conflict with existing FID usage.

Hongchao, can you please take a look at the HLD and provide your input on the pool ID issue, as well as any other thoughts.

Comment by Patrick Farrell (Inactive) [ 26/Feb/19 ]

"I saw in the comments at one point that destroying a pool would preserve the quota limits, so they are available if the pool is recreated (which may happen if e.g. there is some problem with the config logs or similar), but this is not reflected in the HLD. IMHO, this behaviour makes sense, since assigning user/group/project quotas for pools is typically cumbersome work, and admins may not have scripts to do this or backups. My understanding is that setting quotas for user/group/project is already a bit of a chore, so we don't want to make it harder. If there is no pool definition, then the left-over pool quota limits could just be ignored completely (e.g. not loaded into memory)? Is there a reason this was removed?"

I'm not sure of the details on its removal, but I said (at some point, possibly in discussions at Cray) that I thought this would potentially be confusing and have relatively little utility.

Basically, what if the creator of a pool doesn't want quotas and doesn't realize they're re-using a name?  It seems quite unpleasant to have "surprise" quotas.

Then, also, when do we get rid of the pool quota info for pools that are gone?

One of the ideas for pool quotas is that a workload manager or similar is creating them dynamically on a per-job basis, potentially both pools and pool quotas.  So the old quotas could really pile up.  Maybe they're so tiny it doesn't really matter...  (Obviously, we could age them out or something, but it just adds complexity.)

Comment by Patrick Farrell (Inactive) [ 26/Feb/19 ]

And, yeah, pool id is totally unused.  I'm confused about how discussion of it crept back into the design doc - are you guys planning to implement pool ids?  They effectively don't exist today, despite a little bit of old code for them still being present.

Comment by Andreas Dilger [ 27/Feb/19 ]

One of the ideas for pool quotas is that a workload manager or similar is creating them dynamically on a per-job basis, potentially both pools and pool quotas. So the old quotas could really pile up. Maybe they're so tiny it doesn't really matter... (Obviously, we could age them out or something, but it just adds complexity.)

This would be a bad implementation, from a configuration point of view. Pools are stored in the Lustre config log, so that they are visible on the clients, but dynamically creating and removing the pools would quickly exhaust the available config llog space. I could see that the quotas might be changed on a per-job basis, but it seems unlikely that the hardware composing a pool would change frequently? If they really wanted per-OST quotas, then just configure one OST per pool and grant quota to the subset of OSTs that are desired.
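
A minimal sketch of that one-OST-per-pool workaround (names and limits are examples; the setquota line assumes the --pool syntax this ticket adds):

# define a pool that contains exactly one OST
lctl pool_new lustre.ost5only
lctl pool_add lustre.ost5only lustre-OST0005
# then grant quota to a user against just that OST via the pool
lfs setquota -u bob --pool ost5only -B 50G /mnt/lustre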

IMHO, in the longer term, it would be desirable to allow users to make their own "pseudo pools", either by allowing something like "lfs setstripe -o X,Y,Z -c1" to work in the same way as a pool (select 1 OST out of the list of OSTs "X, Y, Z"), or by leveraging https://review.whamcloud.com/28972 "LU-9982 lustre: Clients striping from mapped FID in nodemap" to allow creating "template" layout files in e.g. ~/.lustre/ostXYZ_1stripe (possibly using the above "pseudo pool") and then using it like "lfs setstripe -F ostXYZ_1stripe /path/to/new/file" to allow them to have named default layouts of their choosing.

Comment by Sergey Cheremencev [ 27/Feb/19 ]

Basically, what if the creator of a pool doesn't want quotas and doesn't realize they're re-using a name?  It seems quite unpleasant to have "surprise" quotas.

Yes, that was the main reason why we decided it is better to remove the quota pool files together with the corresponding pool.

And, yeah, pool id is totally unused.  I'm confused about how discussion of it crept back in to the design doc - Are you guys planning to implement pool ids? 

No, we are not planning to. Furthermore, I've already started writing the code, placing the "new quota pools" in parallel with the "old quota ID pools". If we decide that the existing quota pools can be removed, I'll stop that work and start with a patch that removes the existing quota pools. That would save a lot of time, because removing the existing pools at the final stage would take more effort.

Comment by Nathan Rutman [ 28/Feb/19 ]

MDT pools part is not in the current plans.

Right - this pool quota work is in no way related to hypothetical MDT pools.

the MDS object allocator doesn't take the pool quota into account

Right again. Although we agree this would be a nice feature, we are not lumping a big effort like changing the allocator into this ticket. The allocator needs to get some attention, but not only related to this:

  • take into account remaining quotas
  • LU-9809
  • take into account mirror locations
  • LU-10070
  • enable other patterns like "fill groups of OSTs one at a time" for large systems
  • capability-aware allocator
  • fix the QoS which is too opaque/broken?
  • LU-9982
  • lfs setstripe -o X,Y,Z -c1

So we are not going to mess with it for this ticket.

at one point that destroying a pool would preserve the quota limits, so they are available if the pool is recreated

We removed this, as we felt that lingering settings would just be confusing. We don't really expect people to be destroying and then recreating pools, but normally if I destroy something I want it to be dead and gone, and part of the reason I am destroying and re-creating is to clear out something that was confusing/broken/unknown.

setting quotas for user/group/project is already a bit of a chore

Yes, and we are actually thinking about another feature: default quotas. Right now, an unset quota means that there are no limits. Instead, we are thinking about defining a "default" quota user, such that if a user has no explicit quota setting, she gets the default quota. This could of course be set for a pool quota as well. But we will be working on this in a separate ticket; not here.

"pool ID" support in quotas, my (limited) understanding is that this is all totally unused.

It is unused. It's going to take us some significant effort to remove, and will interrupt Sergey's current progress. We are willing to do this, and will include this as a first patch here. If anyone objects to this, PLEASE SPEAK UP NOW since we will shift to working on this immediately.

 

Comment by Andreas Dilger [ 28/Feb/19 ]

Note that there is already a mechanism in the new Lustre release to have a default quota user. This was added in patch https://review.whamcloud.com/32306 "LU-7816 quota: add default quota setting support".

What is missing is a good way to backup and restore quota settings.
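
As a rough illustration of what such a backup could look like today, here is a small sketch (not an official tool; the user list, mount point, and the assumption that "lfs quota -q" prints the usual "filesystem kbytes quota limit ..." columns are all mine):

#!/bin/bash
# Capture the current block soft/hard limits for a few users as a script of
# "lfs setquota" commands that can be replayed later to restore them.
MNT=/mnt/lustre
for u in user1 user2; do
    # -q suppresses the header; columns: filesystem, kbytes used, soft, hard, ...
    read -r fs used bsoft bhard rest <<<"$(lfs quota -q -u "$u" "$MNT")"
    echo "lfs setquota -u $u -b $bsoft -B $bhard $MNT"
done > restore_quotas.sh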

Comment by Sergey Cheremencev [ 04/Mar/19 ]

I started thinking about removing the existing quota pools.
At first look, this task has several problems that may push the target "quota pools" work aside:

  1. If we want to fully remove the existing pools, a lot of code has to be changed, which increases the chance of regressions.
  2. A question about compatibility. Now all quota index file names consist of "pool id + pool type + pool_id + slave_uuid (optionally, for slave indexes)". The name (except the _UUID part for slaves) is equal to its fid->oid. If we remove quota pools we also need to rename all these files according to some new rules, which could be a problem for upgrade/downgrade. Furthermore, we would have to keep part of the old code around for a long period of time to be able to rename from the old names to the new ones.
  3. Theoretically, the existing quota pools code could be reused for the "new quota pools".

So, what if I just try to reuse the existing quota pools for the new feature's purposes?
For example, one of the main places where the existing pools are looked up by ID is the request handler, qmt_dqacq.
We can change qmt_pool_lqe_lookup->qmt_pool_lookup to return not just the one pool with ID 0, but a list of pointers to all pools that include the OST/MDT from which MDT0 got the quota request, and move in the direction of always working with a list of pools and LQEs.

qmt_pool_info (currently used to describe quota pools) includes everything needed for LOD quota pools (just several fields need to be added). So there is no reason to remove it and add the same structure under another name. I believe most of the functions in qmt_pool.c can also be used without big changes.

If the suggested way is acceptable, we will end up with the following hierarchy:

├── quota_master
│   ├── dt-0x0
│   │   ├── 0x1020000
│   │   ├── 0x1020000-OST0000_UUID
│   │   ├── 0x1020000-OST0001_UUID
│   │   ├── 0x20000
│   │   ├── ....
│   ├── md-0x0
│   │ ├── 0x10000
│   │ ├── 0x1010000
│   │ ├── ...
│   └── pools
│   │ ├── pool1_usr
│   │ ├── pool1_grp
│   │ ├── pool1_prj
│   │ ├── pool2_usr
│   │ ├── ...
├── quota_slave
│   ├── 0x10000
│   ├── 0x10000-MDT0000
│   ├── ...
Comment by Gerrit Updater [ 11/Mar/19 ]

Sergey Cheremencev (c17829@cray.com) uploaded a new patch: https://review.whamcloud.com/34389
Subject: LU-11023 quota: remove quota pool ID
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0d8a127b778b3e17d5efff1bd4b544632457b7cf

Comment by Gerrit Updater [ 15/Apr/19 ]

Sergey Cheremencev (c17829@cray.com) uploaded a new patch: https://review.whamcloud.com/34667
Subject: LU-11023 quota: tune quota pools through LCFG
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 035e8052b1c141857b709cc4662f9b710d7edec6

Comment by Sergey Cheremencev [ 21/Jun/19 ]

Hello !

A small update about my work on quota pools. The work is still in progress, but the main part is finished.
It now passes simple tests (like sanity-quota test_1) with a couple of quota pools with different hard limits.
However, I still see a lot of work ahead.

I have a question about "lfs quota".

  1. Should "lfs quota -u/g/p" show information for all existing pools for the requested ID (u/g/p)?
  2. I am going to introduce an option to show quota information only for the requested pool. For "lfs setquota" I am using "-o" to set pool settings, because "-p/-P" are already used for project. But I can't use "-o" with "lfs quota" because it is used to specify obd_uuid. Could someone suggest which option is better? Is "-o obd_uuid" used anywhere?
Comment by Andreas Dilger [ 21/Jun/19 ]

You could just stick with the long option "--pool".

Comment by Sergey Cheremencev [ 24/Jun/19 ]

What about the 1st question? Should lfs quota without the "--pool" option show information for all existing pools?

Comment by Andreas Dilger [ 25/Jun/19 ]

I'm not really an expert in the quota code, but as long as this does not repeatedly list the OSTs, I think this would be OK. My concern would be if the output becomes too verbose, or if there isn't a way to limit the information to a specific quota type (maybe "--pool=none" to avoid the pool output)?

Comment by Gerrit Updater [ 25/Jul/19 ]

Sergey Cheremencev (c17829@cray.com) uploaded a new patch: https://review.whamcloud.com/35615
Subject: LU-11023 quota: quota pools for OSTs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2d983ab779d203d73b01f132cb991253855af51a

Comment by Gerrit Updater [ 09/Aug/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34389/
Subject: LU-11023 quota: remove quota pool ID
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f6819c90c8532e017646c8173337a9c92250e60f

Comment by Andreas Dilger [ 12/Dec/19 ]

Since this work is already nearing completion, I'm wondering if there are additional developments in this area that you will pursue:

  • MDT pools and pool quotas? This is becoming increasingly important for limiting DoM space usage
  • integrating quota with OST object allocation on the MDS. It doesn't make sense to allocate objects on OSTs for which the user has no space. With SEL (and PFL when LU-13058 is landed) it would be possible to skip intermediate components on pools for which the user has no quota.
Comment by Sergey Cheremencev [ 16/Dec/19 ]

MDT pools and pool quotas? This is becoming increasingly important for limiting DoM space usage

From my side, I did everything I could to make the process of implementing MDT quota pools as simple as possible. MDT pools look like a distinct feature; I suggest discussing it in another ticket. Possibly we can implement MDT pools only for DoM. Anyway, I believe Cray is interested in having pool quotas on MDT pools, and I will have the opportunity (approval from management still needed) to be involved in that development. Let's start discussing!

integrating quota with OST object allocation on the MDS. It doesn't make sense to allocate objects on OSTs for which the user has no space. With SEL (and PFL when LU-13058 is landed) it would be possible to skip intermediate components on pools for which the user has no quota.

The key thing here is to provide the quota pool state for usr/grp/prj to the LOD layer. If an OST belongs to a pool, LOD could ask QMT whether this user has quota in that pool. It looks like we just need to find the lqe from the global pool (qmt_pool_lqe_lookup(env, qmt, pooltype, qtype, id, NULL)) and check each entry in the lqe global array for edquot. So if it is a simple patch, I can help implement it.
But the current patch is already pretty big and I'd like to keep it as simple and small as possible, so I am voting to do this in another ticket once QP is ready.

Comment by Sergey Cheremencev [ 30/Mar/20 ]

I am stuck investigating the sanity-quota_69 failure. It fails only on the configuration with 8 OSTs, 4 MDTs and 2 clients (review-dne-part-4).
The test fails with a timeout 423 minutes after it starts, so I no longer have the logs I need, which cover the first several minutes after the test's start. Is it possible to restart this test with a reduced timeout to capture the needed period? I propose setting it to 4 minutes.

Comment by Cory Spitz [ 30/Mar/20 ]

mdiep, I heard that you might be able to assist with Sergey's request. Can you?

Comment by Andreas Dilger [ 30/Mar/20 ]

Sergey, you could add debugging to the test script in your patch to dump the debug logs sooner (e.g. a background thread that calls "lctl dk /tmp/lustre-log-$(date +%s).log" every 5s for some time). I believe that Maloo will attach all "/tmp/*.log" files to the test results.
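
A minimal sketch of such a background dumper in test-framework shell style (the 5s interval and 60-iteration bound are arbitrary choices, not part of the suggestion above):

# start a background loop near the beginning of the test
(
    for i in $(seq 1 60); do                      # cover roughly the first 5 minutes
        $LCTL dk /tmp/lustre-log-$(date +%s).log  # dump and clear the kernel debug buffer
        sleep 5
    done
) &
dumper_pid=$!
# ... run the suspect part of the test ...
kill $dumper_pid 2>/dev/null || true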

Comment by Minh Diep [ 30/Mar/20 ]

spitzcor, I am not sure what you're talking about.

Comment by Sergey Cheremencev [ 01/Apr/20 ]

sergey you could add debugging to the test script in your patch to dump the debug logs sooner

I tried this approach but didn't have any success. The latest sanity-quota_69 failure doesn't contain any of the debug logs I saved in tmp under the name "$TMP/lustre-log-client-$(date +%s).log". Probably they should be named similarly to "sanity-quota.test_69.test_log.onyx-49vm1.log"? If not, please advise another way.

Thanks.

Comment by Andreas Dilger [ 01/Apr/20 ]

Poking around a bit further, I see that lustre/tests/auster is uploading all of the logs from its $LOGDIR, and within test-framework.sh the generate_logname() function is using $LOGDIR/$TESTSUITE.$TESTNAME.$1.<hostname>.log for the individual logfiles. It looks like you could use "lctl dk $(generate_logname $(date +%s))" to dump the logs (similar to what gather_logs() does if an error is hit) and then they will be uploaded.

James, Minh, Charlie, please correct me if the above is not correct for log files to be included in the Maloo report for a test session.

Comment by Sergey Cheremencev [ 02/Apr/20 ]

adilger, thank you for the advice. However, my last attempt, where I used generate_logname, also failed. The reason is not entirely clear to me. At first look it doesn't relate to my patch - the crash dump doesn't contain the reason for the panic:

crash> dmesg | tail -n 2
[ 1593.570869] Lustre: lustre-OST0001-osc-ffff8800a60bc800: disconnect after 21s idle
[ 1593.573338] Lustre: Skipped 19 previous similar messages
crash> sys | grep PANIC
       PANIC: "" 

On the other hand, it occurred in sanity-quota_69 when it calls lctl dk - https://testing-archive.whamcloud.com/gerrit-janitor/7821/results.html

Can someone assist me here?

Comment by Andreas Dilger [ 02/Apr/20 ]

Looking earlier in the test logs, I see a few other stack traces in the oleg308-server-console.txt from a special test run for this patch:

[ 4326.625102] WARNING: CPU: 2 PID: 3431 at fs/proc/generic.c:399 proc_register+0x94/0xb0
[ 4326.627740] proc_dir_entry 'lustre-QMT0000/dt-qpool1' already registered
[ 4326.640806] CPU: 2 PID: 3431 Comm: llog_process_th Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-7.7-debug #1
[ 4326.644194] Call Trace:
[ 4326.644610]  [<ffffffff817d1711>] dump_stack+0x19/0x1b
[ 4326.645525]  [<ffffffff8108ba58>] __warn+0xd8/0x100
[ 4326.646338]  [<ffffffff8108badf>] warn_slowpath_fmt+0x5f/0x80
[ 4326.649833]  [<ffffffff812c2434>] proc_register+0x94/0xb0
[ 4326.650741]  [<ffffffff812c2576>] proc_mkdir_data+0x66/0xa0
[ 4326.651683]  [<ffffffff812c25e5>] proc_mkdir+0x15/0x20
[ 4326.652710]  [<ffffffffa0315374>] lprocfs_register+0x24/0x80 [obdclass]
[ 4326.653941]  [<ffffffffa0aa2385>] qmt_pool_alloc+0x175/0x570 [lquota]
[ 4326.655347]  [<ffffffffa0aa74a4>] qmt_pool_new+0x224/0x4d0 [lquota]
[ 4326.656901]  [<ffffffffa032c83b>] class_process_config+0x22eb/0x2ee0 [obdclass]
[ 4326.660700]  [<ffffffffa032eec9>] class_config_llog_handler+0x819/0x14b0 [obdclass]
[ 4326.662767]  [<ffffffffa02f2582>] llog_process_thread+0x7d2/0x1a20 [obdclass]
[ 4326.665703]  [<ffffffffa02f4292>] llog_process_thread_daemonize+0xa2/0xe0 [obdclass]
[ 4326.676370] LustreError: 3431:0:(qmt_pool.c:208:qmt_pool_alloc()) lustre-QMT0000: failed to create proc entry for pool dt-qpool1 (-12)
[ 4326.680007] LustreError: 3431:0:(qmt_pool.c:935:qmt_pool_new()) Can't alloc pool qpool1
[ 4336.217899] LustreError: 3774:0:(qmt_pool.c:1343:qmt_pool_add_rem()) Can't add to lustre-OST0001_UUID pool qpool1, err -17
[ 4336.223934] LustreError: 3774:0:(qmt_pool.c:1343:qmt_pool_add_rem()) Skipped 5 previous similar messages

so it may be that the code tries to register this same proc entry multiple times, and then crashes during cleanup when it is freed multiple times?

Comment by Sergey Cheremencev [ 03/Apr/20 ]

so it may be that the code tries to register this same proc entry multiple times, and then crashes during cleanup when it is freed multiple times?

In such a case I'd expect to see the reason for the failure in the crash dump, something like "BUG: unable to handle kernel NULL pointer".

Anyway, the reason is clear - I lost "dk" in my script, causing the timeout error:

do_facet mds1 $LCTL > $(generate_logname $(date +%s)) 
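
Presumably the intended call was the following, with the missing debug_kernel subcommand restored (my reconstruction, not taken from the patch):

do_facet mds1 $LCTL dk > $(generate_logname $(date +%s))
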
Comment by Gerrit Updater [ 14/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35615/
Subject: LU-11023 quota: quota pools for OSTs
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 09f9fb3211cd998c87e26df5217cc4ad84e6ce0b

Comment by Peter Jones [ 14/May/20 ]

Landed for 2.14
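
For reference, the landed feature is exercised roughly as follows (pool name, user and limits are examples; see LUDOC-467 and the test report linked in a later comment for the authoritative syntax):

# define an OST pool on the MGS and populate it
lctl pool_new lustre.flash
lctl pool_add lustre.flash lustre-OST[0000-0003]
# set a hard block limit for user bob inside that pool only
lfs setquota -u bob --pool flash -B 100G /mnt/lustre
# report bob's usage against the pool limit
lfs quota -u bob --pool flash /mnt/lustre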

Comment by Cory Spitz [ 14/May/20 ]

pjones and adilger, can we rename this ticket from "Add OST/MDT pool quota feature" to "OST Quota Pools"? The landed code doesn't include MDT pools and it is probably better to say OST pool quotas because we have user quotas, project quotas and pool quotas, not quota pools.

Comment by Peter Jones [ 14/May/20 ]

I agree that this is more clear as to what is being provided in 2.14. Thanks for your attention to detail on this!

Comment by Cory Spitz [ 04/Sep/20 ]

pjones, I'm afraid I didn't have the proper attention to detail after all!
I said to rename it to "OST Quota Pools" above, but I also said, "probably better to say OST pool quotas because we have user quotas, project quotas and pool quotas, not quota pools."
I'm sorry about the confusion. Let's call it "OST Pool Quotas" per that rationale.

Comment by Gerrit Updater [ 08/Oct/20 ]

Sergey Cheremencev (sergey.cheremencev@hpe.com) uploaded a new patch: https://review.whamcloud.com/40175
Subject: LU-11023 tests: test quota interop
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 760d9be975dca2370a0cda558289818868c801c0

Comment by Sergey Cheremencev [ 14/Oct/20 ]

There is no separate ticket for the Pool Quotas testing results,
so I'm leaving a link to the test report here - https://wiki.lustre.org/OST_Pool_Quotas_Test_Report.
