Data-on-MDT phase II (LU-10176)

[LU-10808] DoM: component end should align with dom_stripesize Created: 12/Mar/18  Updated: 05/Feb/20  Resolved: 22/Aug/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.12.0

Type: Technical task Priority: Minor
Reporter: Andreas Dilger Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: DoM2

Issue Links:
Related
is related to LU-10465 increase default stripe size to 4MB Resolved
is related to LU-10786 sanity-flr test_45: Create /mnt/lustr... Resolved
is related to LU-10917 If setstripe for DoM file fails, less... Resolved
is related to LU-11608 DoM2: inherited DoM component size is... Closed
is related to LU-10070 PFL self-extending file layout Resolved

 Description   

If the DoM component extent_end is set larger than the MDT dom_stripesize (via lfs setstripe -Eextent_end -L mdt) then this currently generates an error.  Users do not have any easy way to determine the dom_stripesize on the MDT, and it may in fact be different on a per-MDT basis (e.g. if MDT0000 sets dom_stripesize=0 because it was formatted before DoM, or if it is adjusted automatically by the MDS when the MDT is nearly full).  This complicates DoM usage for users (imagine a striped directory that has different DoM size limits for the MDTs it is striped over).

The MDS should automatically adjust the component extent_end to match the MDT dom_stripesize, and if dom_stripesize=0 then the DoM component should be removed.



 Comments   
Comment by Rahul Deshmukh (Inactive) [ 17/Apr/18 ]

Just curious here, hence a few questions:

> The MDS should automatically adjust the component extent_end to match the MDT dom_stripesize,
> and if dom_stripesize=0 then the DoM component should be removed.

Does this mean that if the mdt component size is greater than dom_stripesize,
then it will be set to dom_stripesize?

Also, if dom_stripesize=0, then even if the user specifies a DoM component it won't be considered and
only PFL will be considered?

It seems we already do something similar (in lod_fix_desc_stripe_size()) when setting the dom_stripesize value.

That is, if dom_stripesize is set to a value less than LOV_MIN_STRIPE_SIZE, then it is set to the default
value, and if we try to set a value of, say, 65K, then it will be set to 64K, a multiple of LOV_MIN_STRIPE_SIZE.

Comment by Andreas Dilger [ 17/Apr/18 ]

Also needing consideration here is that DoM components should be skipped entirely if there is no (or very little) space on the MDT. It might make sense to shrink dom_stripesize as the MDT begins to get full, but at least a simple fix is to just go directly to zero once the MDT free space is below some threshold.
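
The "go directly to zero below some threshold" idea could be sketched as follows. This is only an illustration under stated assumptions: the function name, its parameters, and the percentage-based threshold are all hypothetical, and nothing here reflects how the MDS actually tracks free space.

```c
#include <stdint.h>

/*
 * Hypothetical policy sketch (not Lustre code): return the DoM stripe size
 * to apply to new files, given the configured limit and the MDT free space.
 * Once free space falls below min_free_pct percent of the total, DoM is
 * switched off by returning 0.
 */
static uint64_t dom_effective_stripesize(uint64_t configured,
                                         uint64_t free_bytes,
                                         uint64_t total_bytes,
                                         unsigned int min_free_pct)
{
        uint64_t threshold;

        if (total_bytes == 0)
                return 0;               /* no usable space information */

        if (min_free_pct > 100)
                min_free_pct = 100;

        /* Divide first so the multiplication cannot overflow; this rounds
         * the threshold down slightly, which is fine for a sketch. */
        threshold = (total_bytes / 100) * min_free_pct;

        return free_bytes < threshold ? 0 : configured;
}
```

A gradual variant that shrinks the limit as the MDT fills up would replace the final line with a scaling rule; the simple cutoff is shown because it is the minimal fix described above.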

Comment by Andreas Dilger [ 17/Apr/18 ]

I tested setting dom_maxstripesize, and if the value is less than LOV_MIN_STRIPE_SIZE the code sets it to the default stripe size, rather than the minimum:

void lod_fix_desc_stripe_size(__u64 *val)
{
        if (*val < LOV_MIN_STRIPE_SIZE) {
                if (*val != 0)
                        LCONSOLE_INFO("Increasing default stripe size to "
                                      "minimum value %u\n",
                                      LOV_DESC_STRIPE_SIZE_DEFAULT);
                *val = LOV_DESC_STRIPE_SIZE_DEFAULT;
        } else if (*val & (LOV_MIN_STRIPE_SIZE - 1)) {
                *val &= ~(LOV_MIN_STRIPE_SIZE - 1);
                LCONSOLE_WARN("Changing default stripe size to %llu (a "
                              "multiple of %u)\n",
                              *val, LOV_MIN_STRIPE_SIZE);
        }
}

Since this is also used for regular file striping (typically with stripe_size=0 to indicate the default) then it may be difficult to change this behaviour. One option is to pass in the "default" value from the caller, so that LOV can use LOV_DESC_STRIPE_SIZE_DEFAULT but DoM can use LOV_MIN_STRIPE_SIZE. Alternately, lod_dom_stripesize_seq_write() could check for a value below LOV_MIN_STRIPE_SIZE and return an error code to the caller.

Comment by Mikhail Pershin [ 19/Apr/18 ]

Yes, I found that too and have already fixed it in a patch; it sets the limit to the minimum rather than the default value.

Comment by Gerrit Updater [ 19/Apr/18 ]

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/32073
Subject: LU-10808 lod: align wrong DoM stripe values with defaults
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 33206489d8b3252d96ca4d23d7774d54d8b085fc

Comment by Mikhail Pershin [ 22/May/18 ]

Andreas wrote in patch comments:

What happens if we have a 16-way striped directory with a default layout like:

lfs setstripe -E 1M -L mdt -E 64M -c 1 -E -1 -c 128 /dir

and then one of the MDTs runs out of space? Ideally, it wouldn't run out of space (the MDS should reduce lod_dom_max_stripesize automatically as it gets full so the DoM components get smaller) but it will eventually run out of space. In a case like this, we don't really want 1/16 of the files to return -EFBIG.

There is no way (currently) to disable file creation on a single MDT in the striped directory, so we would want to remove the DoM component from the file to reduce MDT space usage rather than failing file creation randomly based on the filename.

Is it possible to check if the layout is a default layout, or explicitly set by a user? It would be confusing if the user specifies a DoM component and it just silently disappears, but this is somewhat less of an issue if it is coming from a default setting (parent directory or filesystem root).

I am putting it here so that it is not forgotten, and so that a workaround can be prepared in later patches.

Comment by Gerrit Updater [ 22/May/18 ]

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/32482
Subject: LU-10808 lod: remove DoM component is disabled
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6b1cc1f9b42e8cacab90099c9c56f0ab96d9a471

Comment by Patrick Farrell (Inactive) [ 22/May/18 ]

This question has likely been answered elsewhere, but yes, it's possible to know if a layout is a default layout.  At file creation time, there's a moment where you inherit from a template.  But after that there's no way to know, currently.

It seems like your suggestion is if any MDT in the set is low on space, then remove the DOM component from the layout (presumably at file creation time rather than permanently, since ideally out of space is transient).  So we stop doing DOM entirely because we can't skip that one MDT.  That will require some interrogation of the MDTs but it should be possible, if a little awkward.

An alternate possibility occurs to me, sort of adjacent to the self-extending PFL work and exploiting similar ideas.  DOM is always the first component & so is always instantiated.  But we could do the "stripe to target, then check space on target before really making objects" trick I'm using in the SEPFL work (posting soon, sorry).

If there's insufficient space on a particular MDT, we could either try another MDT (...not so sure about this one...) or we could rewrite the layout dynamically to remove the DoM component, assuming it's followed by a regular OST component.  (Which it doesn't have to be.)

If we did the "try another MDT" solution, we'd either need to try all MDTs and fail once they were out of space, or we'd still have to do the "rewrite layout to replace DoM with regular OST layout" thing.

Just stopping DoM once one MDT is full does seem like the easiest (and perhaps most comprehensible) behavior.  But there's another route if we wanted to take it.

Comment by Andreas Dilger [ 25/May/18 ]

I'm not suggesting that DoM be disabled permanently if one MDT is full, only on a per-file basis. Firstly, the MDT is the one that selects the file layout and should decide if it is full, and this can be done on a per-MDT basis.

Secondly, the hash of the filename is what decides which MDT is used, so we can't try different MDTs to find space for the DoM data.
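
To make the per-file argument concrete, here is a small sketch of how the decision falls out when the filename hash fixes the MDT. Everything in it is an assumption for illustration: FNV-1a is used only as a stand-in for the directory hash, and the helper names and the per-MDT limit array are hypothetical rather than Lustre data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a 64-bit hash, a stand-in for whatever hash the striped directory
 * really uses. */
static uint64_t name_hash(const char *name)
{
        uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */

        for (; *name != '\0'; name++) {
                h ^= (unsigned char)*name;
                h *= 1099511628211ULL;          /* FNV prime */
        }
        return h;
}

/* The name alone determines the MDT index; there is no second choice. */
static size_t pick_mdt(const char *name, size_t mdt_count)
{
        assert(mdt_count > 0);
        return (size_t)(name_hash(name) % mdt_count);
}

/* A new file keeps its DoM component only if the MDT chosen by its name
 * currently allows DoM (non-zero limit). Other MDTs are not consulted. */
static bool file_gets_dom(const char *name, const uint64_t *dom_limit,
                          size_t mdt_count)
{
        return dom_limit[pick_mdt(name, mdt_count)] != 0;
}
```

In a 16-way striped directory where one MDT has its limit set to 0, only the names that hash to that MDT lose the DoM component; files created on the other 15 MDTs are unaffected.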

Comment by Gerrit Updater [ 07/Jun/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32073/
Subject: LU-10808 lod: align wrong DoM stripe values with defaults
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9146d261f35b394e10afde3eec2d5895425261e0

Comment by Gerrit Updater [ 07/Jun/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32482/
Subject: LU-10808 lod: remove DoM component if DoM is disabled
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3fd758ed4f40f235d26df9fd7c6b459590fbe0cf

Comment by Peter Jones [ 22/Aug/18 ]

Seems like all patches have landed for 2.12
