Details
- Improvement
- Resolution: Unresolved
- Major
- Lustre 2.14.0, Lustre 2.12.5
Description
This is a simple extension to the SEL functionality for PFL files, originally suggested by Andreas.
One of the features of SEL is that an intermediate layout component is not instantiated if its tier is low on space; instead, the next component is extended downwards. For example, in a PFL layout like DOM->SSD->HDD, SSD is the intermediate component, and HDD is the component that gets extended downwards. This allows us to skip the SSD tier if it's full.
This is a nice feature, and there's no particular reason it has to be limited to SEL layouts. It's easy to do this for normal layouts, where the SSD component has a normally defined length.
So, this patch adds that functionality. The canonical case is a DOM->SSD->HDD layout where the SSD tier is low on space (or even out of space entirely). Currently, when the first write happens to the SSD component, it's simply instantiated, and if there is absolutely no space, an error results. With this feature, in the low-on-space condition*, that intermediate component is removed instead.
*This is the same low-on-space condition used in SEL: basically, one of the chosen OSTs is below the threshold value for striping. The stripe allocator only stripes to such OSTs in the absence of a better choice, so this indicates we're very low on space.
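The skip behavior described above can be sketched roughly as follows. This is only an illustration in Python, not the Lustre code: `Component`, `on_first_write`, and the `selected_ost_free` input are hypothetical names, and the layout is modeled as a plain list of byte extents.

```python
from dataclasses import dataclass
from typing import List

MiB = 1 << 20
GiB = 1 << 30


@dataclass
class Component:
    """Hypothetical stand-in for one PFL layout component."""
    name: str
    start: int  # extent start in bytes (inclusive)
    end: int    # extent end in bytes (exclusive)
    instantiated: bool = False


def on_first_write(layout: List[Component], idx: int,
                   selected_ost_free: List[int],
                   threshold: int) -> List[Component]:
    """Instantiate layout[idx], or skip it if its tier is low on space.

    selected_ost_free holds the free bytes of each OST the stripe
    allocator chose for this component (an assumed input).
    """
    comp = layout[idx]
    has_next = idx + 1 < len(layout)
    # Low on space: at least one chosen OST is below the threshold.
    low = any(free < threshold for free in selected_ost_free)

    if not (low and has_next):
        # Normal path: enough space, or no later component to fall back to.
        comp.instantiated = True
        return layout

    # Remove this component and extend the next one downwards so that
    # it covers the removed extent.
    layout[idx + 1].start = comp.start
    return layout[:idx] + layout[idx + 1:]
```

With a DOM->SSD->HDD layout and one chosen SSD OST below the threshold, the returned layout contains only DOM and HDD, and HDD starts where SSD used to start.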
There is one detail: the SEL code uses the "extension size" as a way to estimate how much space this component might use, so it's factored into the "low on space" calculation. There is no obvious substitute for this with a regular file, which leads to two options:
1. Act like the file will consume (effectively) zero space and only act if the OSTs are already low on space.
2. Pick some amount of data to assume it will use. The most logical guess seems to be a multiple of the stripe size, but perhaps an absolute value would be better, as stripe sizes can vary widely.
Option 1 may well be fine as it is, and in either case, this is just an optimization.
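To make the difference between the two options concrete, here is a minimal sketch. The function names are hypothetical, and charging the assumed usage against each chosen OST individually is an assumption of this sketch rather than a description of how SEL accounts for the extension size.

```python
from typing import List

MiB = 1 << 20
GiB = 1 << 30


def low_on_space(selected_ost_free: List[int], threshold: int,
                 assumed_usage: int = 0) -> bool:
    """True if any selected OST would fall below the threshold.

    assumed_usage is the space we pretend the component will consume
    on each OST. The default of 0 corresponds to option 1.
    """
    return any(free - assumed_usage < threshold
               for free in selected_ost_free)


def estimate_from_stripes(stripe_size: int, multiple: int) -> int:
    """Option 2, stripe-based variant: a multiple of the stripe size."""
    return stripe_size * multiple


free = [10 * GiB, 12 * GiB]
threshold = 8 * GiB

# Option 1: only react if the OSTs are already below the threshold.
option1 = low_on_space(free, threshold)

# Option 2 with an absolute guess of 4 GiB.
option2_abs = low_on_space(free, threshold, assumed_usage=4 * GiB)

# Option 2 with 16 stripes: the result depends heavily on stripe size.
option2_small = low_on_space(free, threshold, estimate_from_stripes(1 * MiB, 16))
option2_large = low_on_space(free, threshold, estimate_from_stripes(1 * GiB, 16))
```

The last two calls illustrate the concern about stripe sizes varying widely: the same multiple gives a negligible estimate for 1 MiB stripes and a very large one for 1 GiB stripes.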
Patch forthcoming.
Issue Links
- is related to
  - LU-11023 OST Pool Quotas (Resolved)
  - LU-16857 OST object allocation should not select OSTs/pools where quota is exceeded (Open)
  - LU-12785 DOM2: dynamic DoM component size as MDT becomes full (Resolved)
- is related to
  - LU-10070 PFL self-extending file layout (Resolved)
  - LU-11918 Allow setting default file layout on root directory at mkfs time (Open)
  - LU-15011 implement lod pool spilling (Resolved)
I agree. The only downside is that it seems like it would require a little bit of plumbing. Handling quotas was rejected as part of the SEL work (at least initially) for that reason.
Although now that I think about it, my position at the time (in the design discussions within Cray) was based on the idea of integrating quota levels into the stripe allocator decisions, which really would be kind of terrible.
But if we assume that quota pools and OST tiers are arranged sanely (i.e., the pools used for quota match up with the pools used in the layout/tiering), then we could just make quota checking part of the "are these selected OSTs OK" step*, since the quota itself is split across the OSTs evenly. I think that assumption is fair: things won't break if the pools don't match, it will just give suboptimal behavior.
*i.e., when we check the OSTs selected by the stripe allocator to verify space levels
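A rough sketch of what folding the quota check into that step could look like is below. Everything here is hypothetical (`OstState`, `selected_osts_ok`, and the inputs), and it simply models the quota as an even per-OST share of the pool limit, as described above.

```python
from dataclasses import dataclass
from typing import List

GiB = 1 << 30


@dataclass
class OstState:
    """Hypothetical per-OST view available at the check step."""
    free: int       # free bytes on the OST
    user_used: int  # bytes the writing user already holds on this OST


def selected_osts_ok(selected: List[OstState], threshold: int,
                     pool_quota_limit: int, pool_ost_count: int) -> bool:
    """Check space levels and quota for the OSTs the allocator chose.

    Assumes the pool quota is split evenly across the pool's OSTs, so
    each OST gets pool_quota_limit // pool_ost_count bytes.
    """
    per_ost_share = pool_quota_limit // pool_ost_count
    for ost in selected:
        if ost.free < threshold:
            return False  # existing check: OST is low on space
        if ost.user_used >= per_ost_share:
            return False  # added check: user's share on this OST is used up
    return True
```

If this returns False for an intermediate component, the same skip logic used for the low-on-space case would apply. The stripe allocator itself is left untouched.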
It's not quite as good as integrating quota into the stripe allocator decisions, but that would be a huge amount of work and I think definitely overkill.
So, yeah, that would be manageable I think. Just some extra plumbing to check quotas from the LOD context.
But as you noted, pool quotas are required first.