Details
-
Improvement
-
Resolution: Unresolved
-
Minor
Description
When migrating files from existing OSTs to newly-added OSTs, the "lfs find ... --skip=PCT" parameter can be used to print the requested percentage of all filenames found, rather than print every filename. That allows, say, 50% of existing files in the scanned directory tree to be migrated to the new OSTs (if the filesystem capacity was doubled), instead of migrating 100% of files from the first directory trees scanned (until the target OSTs are in balance), and 0% of files from remaining directories.
However, specifying the "--skip=PCT" percentage isn't ideal, since returning only a fraction of filenames may not balance the total space usage (e.g. if the other "lfs find" parameters exclude files that make up a large fraction of space). The "--skip=PCT" parameter is only a high-level directive, and will not work optimally if the existing OSTs are themselves imbalanced, in which case more files should be migrated from some existing OSTs than from others.
A further enhancement, instead of specifying the "--skip=PCT" percentage directly (which the user would have to calculate themselves), would be "lfs find --skip-rebalance ..." that internally calculates the actual fraction/capacity of files that need to be skipped, based on the fullness ratio of each OST vs. the average fullness of the whole filesystem.
The target would be for each OST to have approximately the same amount of free space at the end, and not necessarily be the same percentage full (though care must be taken with OST pools, where each pool's OST size and free space may be distinctly different). As a first approximation, calculate the ost_target_free = total_free_space / ost_count in the filesystem (or OST pool) to determine the target free space for each OST (in the pool). Then, lfs find can internally calculate the ratio/capacity of files on each OST with less than average free space to be migrated, and leave it up to "lfs migrate" and the MDS to balance the allocations across OSTs.
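As a rough illustration of the first approximation above, the following Python sketch computes ost_target_free for a set of OSTs and lists the OSTs that have less than the target free space. The function names and the dict-based input are hypothetical and only for illustration; a real implementation would read per-OST statistics from the filesystem, and would restrict the input to one pool where pools are in use.

```python
def ost_target_free(ost_free_bytes):
    """Average free space per OST: total_free_space / ost_count.

    ost_free_bytes maps an OST index to its free space in bytes
    (hypothetical input format, for illustration only).
    """
    if not ost_free_bytes:
        raise ValueError("no OSTs given")
    return sum(ost_free_bytes.values()) / len(ost_free_bytes)


def osts_below_target(ost_free_bytes):
    """Return the sorted indices of OSTs with less than the target free space."""
    target = ost_target_free(ost_free_bytes)
    return sorted(idx for idx, free in ost_free_bytes.items() if free < target)


# Example: two nearly full old OSTs and two mostly empty new OSTs.
example_free = {0: 10, 1: 20, 2: 90, 3: 80}
example_target = ost_target_free(example_free)
example_sources = osts_below_target(example_free)
```

In this example only OSTs 0 and 1 would be treated as migration sources, since they are the only ones below the average free space.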
The "lfs migrate" command is itself dependent on the MDS OST object allocation to balance usage across OSTs. If the OSTs are imbalanced (over mdt.*.qos_threshold_rr=17% difference in free space between least and most full OSTs), then the MDS will itself be taking the free space of the OSTs into account. That is expected and mostly desired, since we want new OST objects to be allocated preferentially on OSTs with more space, but it will also proportionately allocate some migrated files on OSTs with less space.
The "lfs find --skip-rebalance" code should take this into account, and skip fewer files than the simple ratio of files needed to migrate for a few reasons:
- MDS OST balancing is targeted at filling all OSTs evenly when they reach 100% full, which will not be the case here
- being able to find candidate files to migrate sooner is (somewhat) better than later
- the other lfs find options may be skipping some fraction of files themselves, so the system may not end up balanced even if the full desired percentage of requested files is migrated
For each OST, based on the OST space usage, calculate ost_free = ost_total - ost_used, and print some fraction of files when ost_free < ost_target_free, and no files when ost_free >= ost_target_free. The ratio of files to print can be calculated with ost_target_used = ost_total - ost_target_free and ost_ratio >= (ost_used - ost_target_used) / ost_used. In other words, if ost_target_used == ost_used then 0% of files are printed, and if ost_target_used == 0 (i.e. the goal is to empty the OST) then 100% of files are printed.
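The per-OST calculation above could be sketched in Python as follows. This computes only the lower bound of ost_ratio from the formula. Clamping the result to the range 0..1 and the function name are assumptions made for illustration, and any extra bias toward printing more files (per the reasons listed above) is left out.

```python
def ost_print_ratio(ost_total, ost_used, ost_target_free):
    """Minimum fraction of files on one OST to print as migration candidates.

    Implements ost_ratio = (ost_used - ost_target_used) / ost_used,
    where ost_target_used = ost_total - ost_target_free.
    """
    ost_free = ost_total - ost_used
    # OSTs at or above the target free space are not migration sources.
    if ost_used <= 0 or ost_free >= ost_target_free:
        return 0.0
    ost_target_used = ost_total - ost_target_free
    ratio = (ost_used - ost_target_used) / ost_used
    # Clamp in case the target free space exceeds the OST size (assumption).
    return min(1.0, max(0.0, ratio))


# Example: a 100-unit OST with 80 used and a target of 40 free.
example_ratio = ost_print_ratio(ost_total=100, ost_used=80, ost_target_free=40)
```

With these example numbers, ost_target_used is 60, so at least (80 - 60) / 80 = 25% of the files on that OST would be printed.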
These calculations and decisions presuppose files only striped over a single OST, but most large files will be striped across multiple OSTs, so some combined decision is needed. If a majority of stripes are on "full" OSTs then the file is a good candidate to be migrated, but some combined percentage of the per-OST ratios would be needed to decide whether it should be skipped or not. Files with most stripes on "barely over target" OSTs should be printed less often than files on "way over target" OSTs.
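One possible way to combine the per-OST ratios for a striped file is sketched below. The text above does not settle on a formula, so the unweighted mean used here is purely an assumption, as are the function names. The per-stripe ratios are taken as inputs in the range 0..1.

```python
import random


def file_print_ratio(stripe_ratios):
    """Combine per-OST print ratios for one file (assumed: unweighted mean).

    stripe_ratios holds one print ratio per stripe, i.e. the ratio of the
    OST that the stripe's object is located on.
    """
    if not stripe_ratios:
        return 0.0
    return sum(stripe_ratios) / len(stripe_ratios)


def should_print(stripe_ratios, rng=random):
    """Randomly decide whether to print the file as a migration candidate."""
    # rng.random() is in [0, 1), so ratio 1.0 always prints and 0.0 never does.
    return rng.random() < file_print_ratio(stripe_ratios)


# A file with most stripes on "way over target" OSTs is printed more often
# than a file with most stripes on "barely over target" OSTs.
way_over = file_print_ratio([0.8, 0.9, 0.0, 0.7])
barely_over = file_print_ratio([0.1, 0.05, 0.0, 0.1])
```

A mean gives every stripe equal weight. Weighting by the amount of data in each object, or requiring a majority of stripes on "full" OSTs, would be alternative choices.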
The "lfs find" command should periodically monitor the OST free space balance (e.g. every 60s) to recalculate the targets and ensure the migration is working as intended, compensating for imbalances in file size, stripe allocation, and other concurrent usage of the filesystem. This will allow it to (dis)proportionately increase or decrease the skip rate for each OST in order to achieve the target balance. Once the OST free space delta is below qos_threshold_rr then "lfs find --skip-rebalance" could stop printing files and exit. If run independently of "lfs migrate" (rather than in a pipeline) then it would need to scan the whole directory tree to return candidate files.
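The stop condition described above could look roughly like the following Python sketch. How the free space delta is measured here (the relative difference between the largest and smallest free space) is an assumption for illustration and may not match how the MDS evaluates qos_threshold_rr. The names are hypothetical, and the threshold is the 17% value mentioned earlier.

```python
QOS_THRESHOLD_RR = 0.17  # 17%, as mentioned above for qos_threshold_rr


def free_space_delta(ost_free):
    """Relative free-space difference between the emptiest and fullest OST.

    Assumed definition: (max_free - min_free) / max_free.
    """
    if not ost_free:
        raise ValueError("no OSTs given")
    max_free = max(ost_free)
    if max_free == 0:
        return 0.0  # every OST is completely full, nothing left to balance
    return (max_free - min(ost_free)) / max_free


def is_balanced(ost_free, threshold=QOS_THRESHOLD_RR):
    """True once the delta is below the threshold and scanning could stop."""
    return free_space_delta(ost_free) < threshold
```

A periodic check (e.g. every 60s) would refresh the per-OST free space, recompute the targets and skip rates, and stop printing files once is_balanced() returns True.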
The "--skip-rebalance" option should also consider filesystems with different OST pools. If an OST with more than expected free space is in one pool, say "flash", then migrating files not using that pool would not affect the space balance of the pool, and should be skipped (unless there are free space imbalance issues for other pools as well). If "--pool POOL" is one of the command-line options, then it should only consider files and the balance of OSTs in that pool, otherwise it should consider the balance of each pool separately (though consider some files may be in multiple pools...). Initially, if there are multiple pools in the filesystem, the command should print an error listing available pools and request that "--pool" is used to balance only a single pool at one time.
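The initial pool handling described above (require "--pool" when more than one pool exists) might be sketched as follows. The function name, the list-of-names input, and the error text are hypothetical and only illustrate the decision logic.

```python
def select_pool(available_pools, requested_pool=None):
    """Pick the single OST pool to balance, or None if no pools exist.

    available_pools: list of pool names defined in the filesystem.
    requested_pool:  value of "--pool POOL" if given, else None.
    """
    if requested_pool is not None:
        if requested_pool not in available_pools:
            raise ValueError(f"unknown pool '{requested_pool}'")
        return requested_pool
    if len(available_pools) > 1:
        # Initially, refuse to balance several pools in one run.
        raise ValueError(
            "multiple pools found ("
            + ", ".join(sorted(available_pools))
            + "); use --pool to balance a single pool"
        )
    return available_pools[0] if available_pools else None
```

For example, select_pool(["flash", "disk"]) raises an error listing both pools, while select_pool(["flash", "disk"], "flash") restricts the balance calculation to the "flash" pool.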
Being able to migrate individual objects from files (LU-9961) from full OSTs to empty OSTs would minimize data movement needed to achieve balance and avoid migrating some files onto existing OSTs, and could simplify handling since a decision could be made on each file stripe individually. There is some chance that this would be implemented as part of the FLR-EC project (LU-10911), but should not be considered as a prerequisite for this change.
Issue Links
- Clones: LU-17699 add 'lfs find' parameter to return only a fraction of files for rebalancing (Resolved)