Data-on-MDT phase II
(LU-10176)
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Technical task | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | DoM2 |
| Description |
|
If the DoM component extent_end is set larger than the MDT dom_stripesize (via lfs setstripe -Eextent_end -L mdt ) then this currently generates an error. Users do not have any easy way to determine the dom_stripesize on the MDT, and it may in fact differ on a per-MDT basis (e.g. if MDT0000 sets dom_stripesize=0 because it was formatted before DoM, or if it is adjusted automatically by the MDS when the MDT is nearly full). This complicates DoM usage for users (imagine a striped directory that has different DoM size limits for the MDTs it is striped over). The MDS should automatically adjust the component extent_end to match the MDT dom_stripesize, and if dom_stripesize=0 then the DoM component should be removed. |
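A minimal sketch of the requested behaviour, for illustration only. The helper name `dom_clamp_extent_end()` and its signature are hypothetical and not taken from the Lustre source; it only shows the clamp-or-drop decision described above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the Lustre implementation): fit the end of a
 * DoM component to the per-MDT limit instead of returning an error.
 * Returns false when the DoM component should be dropped from the layout.
 */
static bool dom_clamp_extent_end(uint64_t *extent_end, uint64_t dom_stripesize)
{
	if (dom_stripesize == 0)
		return false;			/* DoM is disabled on this MDT */
	if (*extent_end > dom_stripesize)
		*extent_end = dom_stripesize;	/* shrink rather than fail */
	return true;
}
```

Because components in a composite layout are contiguous, a real implementation would presumably also have to move the start of the following component down to the new extent_end; the sketch leaves that out.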
| Comments |
| Comment by Rahul Deshmukh (Inactive) [ 17/Apr/18 ] |
|
Just curious here, hence a few questions.
> The MDS should automatically adjust the component extent_end to match the MDT dom_stripesize,
Does this mean that if the MDT component size is greater than dom_stripesize it will be adjusted, and that if dom_stripesize=0 then the DoM component won't be considered even if the user specifies one? It seems we are doing something similar (in lod_fix_desc_stripe_size()) when setting the dom_stripesize value, i.e. if dom_stripesize is set to a value less than LOV_MIN_STRIPE_SIZE, it is set to the default. |
| Comment by Andreas Dilger [ 17/Apr/18 ] |
|
Also needing consideration here is that DoM components should be skipped entirely if there is no (or very little) space on the MDT. It might make sense to shrink dom_stripesize as the MDT begins to get full, but at least a simple fix is to just go directly to zero once the MDT free space is below some threshold. |
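The "go directly to zero below some threshold" fix could look roughly like the sketch below. The function name, its parameters, and the percentage-based threshold are all assumptions made for illustration; the ticket does not say how the threshold would be expressed.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: return the DoM size limit to apply on this MDT.
 * When free space falls below min_free_pct percent of the total, return
 * 0 so that DoM components are skipped for new files.
 */
static uint64_t dom_limit_for_free_space(uint64_t dom_stripesize,
					 uint64_t free_bytes,
					 uint64_t total_bytes,
					 unsigned int min_free_pct)
{
	if (total_bytes == 0)
		return 0;	/* no usable space information: skip DoM */
	/* divide before multiplying so large devices cannot overflow */
	if (free_bytes < total_bytes / 100 * min_free_pct)
		return 0;	/* MDT is nearly full */
	return dom_stripesize;
}
```

A gradual variant that shrinks the limit as the MDT fills would replace the second `return 0` with a scaled value.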
| Comment by Andreas Dilger [ 17/Apr/18 ] |
|
I tested out setting dom_maxstripesize, and if the value is less than LOV_MIN_STRIPE_SIZE the code sets it to the default stripe size, rather than the minimum:

    void lod_fix_desc_stripe_size(__u64 *val)
    {
            if (*val < LOV_MIN_STRIPE_SIZE) {
                    if (*val != 0)
                            LCONSOLE_INFO("Increasing default stripe size to "
                                          "minimum value %u\n",
                                          LOV_DESC_STRIPE_SIZE_DEFAULT);
                    *val = LOV_DESC_STRIPE_SIZE_DEFAULT;
            } else if (*val & (LOV_MIN_STRIPE_SIZE - 1)) {
                    *val &= ~(LOV_MIN_STRIPE_SIZE - 1);
                    LCONSOLE_WARN("Changing default stripe size to %llu (a "
                                  "multiple of %u)\n",
                                  *val, LOV_MIN_STRIPE_SIZE);
            }
    }

Since this is also used for regular file striping (typically with stripe_size=0 to indicate the default), it may be difficult to change this behaviour. One option is to pass in the "default" value from the caller, so that LOV can use LOV_DESC_STRIPE_SIZE_DEFAULT but DoM can use LOV_MIN_STRIPE_SIZE. Alternatively, lod_dom_stripesize_seq_write() could check for a value below LOV_MIN_STRIPE_SIZE and return an error code to the caller. |
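The first option (passing the fallback in from the caller) might be sketched as follows. This is a standalone illustration, not the patch that landed: `fix_stripe_size()` and the `SKETCH_*` constants are hypothetical stand-ins, and the constant values are placeholders rather than the definitions from the Lustre headers.

```c
#include <stdint.h>

/* Placeholder values for illustration; the real constants live in Lustre. */
#define SKETCH_MIN_STRIPE_SIZE		(64ULL << 10)
#define SKETCH_STRIPE_SIZE_DEFAULT	(1ULL << 20)

/*
 * Hypothetical variant of the fix-up routine: the caller chooses what a
 * too-small value is replaced with. Regular striping would pass the
 * default stripe size, DoM would pass the minimum.
 */
static void fix_stripe_size(uint64_t *val, uint64_t fallback)
{
	if (*val < SKETCH_MIN_STRIPE_SIZE)
		*val = fallback;
	else
		/* round down to a multiple of the minimum stripe size */
		*val &= ~(SKETCH_MIN_STRIPE_SIZE - 1);
}
```

A DoM caller would then use `fix_stripe_size(&size, SKETCH_MIN_STRIPE_SIZE)`, while regular file striping keeps `fix_stripe_size(&size, SKETCH_STRIPE_SIZE_DEFAULT)`. A caller that treats 0 as "DoM disabled" would need to check for that value before calling, since the sketch raises 0 to the fallback.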
| Comment by Mikhail Pershin [ 19/Apr/18 ] |
|
Yes, I found that too and have already fixed it in a patch: it sets the limit to the minimum rather than the default value. |
| Comment by Gerrit Updater [ 19/Apr/18 ] |
|
Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/32073 |
| Comment by Mikhail Pershin [ 22/May/18 ] |
|
Andreas wrote in patch comments:
Putting it here so that I don't forget, and to prepare a workaround in later patches. |
| Comment by Gerrit Updater [ 22/May/18 ] |
|
Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/32482 |
| Comment by Patrick Farrell (Inactive) [ 22/May/18 ] |
|
This question has likely been answered elsewhere, but yes, it's possible to know if a layout is a default layout. At file creation time, there's a moment where you inherit from a template. But after that there's no way to know, currently.

It seems like your suggestion is that if any MDT in the set is low on space, we remove the DoM component from the layout (presumably at file creation time rather than permanently, since ideally out of space is transient). So we stop doing DoM entirely because we can't skip that one MDT. That will require some interrogation of the MDTs but it should be possible, if a little awkward.

An alternate possibility occurs to me, sort of adjacent to the self-extending PFL work and exploiting similar ideas. DoM is always the first component and so is always instantiated. But we could do the "stripe to target, then check space on target before really making objects" trick I'm using in the SEPFL work (posting soon, sorry). If there's insufficient space on a particular MDT, we could either try another MDT (...not so sure about this one...) or we could rewrite the layout dynamically to remove the DoM component, assuming it's followed by a regular OST component. (Which it doesn't have to be.) If we did the "try another MDT" solution, we'd either need to try all MDTs and fail once they were out of space, or we'd still have to do the "rewrite layout to replace DoM with regular OST layout" thing.

Just stopping DoM once one MDT is full does seem like the easiest (and perhaps most comprehensible) behavior. But there's another route if we wanted to take it. |
| Comment by Andreas Dilger [ 25/May/18 ] |
|
I'm not suggesting disabling DoM permanently if one MDT is full, only on a per-file basis. Firstly, the MDT is the one that selects the file layout and should decide if it is full, and this can be done on a per-MDT basis. Secondly, the hash of the filename is what decides which MDT is used, so we can't try different MDTs to find space for the DoM data. |
| Comment by Gerrit Updater [ 07/Jun/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32073/ |
| Comment by Gerrit Updater [ 07/Jun/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32482/ |
| Comment by Peter Jones [ 22/Aug/18 ] |
|
Seems like all patches have landed for 2.12 |