[LU-11022] FLR1.5: "lfs mirror" usability for Burst Buffer Created: 16/May/18 Updated: 04/Dec/23 Resolved: 09/Nov/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.13.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | FLR2 |
| Issue Links: |
|
| Sub-Tasks: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I've been going through a simple exercise for how to use FLR to mirror/unmirror files for a burst-buffer application. The workflow would be something like:
[*] We don't have any way to prevent users from using a pool if they want to. We need some kind of OST/pool quota to limit the amount of space a user can consume on a given OST/pool. It might be desirable to allow privileged users (e.g. the job scheduler) to still create files on an OST/pool, even if doing so exceeds the user's quota, so that they can stage files there.
Item #1 is not immediately in my control. I was trying out what commands would be used for #2. The obvious choice is lfs mirror extend -N<copies> /path/to/file, but one problem I see is that "-N<copies>" means "add <copies> mirrors to the file" rather than "make the number of mirrors equal to <copies>". This is problematic, since lfs mirror extend will keep adding mirrors to the file even if it already has mirrors >= <copies>. It is not an insurmountable problem, though: in most cases the caller needs to use "lfs getstripe -N" to get the current number of mirror copies, then call "lfs mirror extend -N$((copies - current))".
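The workaround for #2 described above could be wrapped up roughly as follows. This is only a sketch: the helper names mirrors_to_add and ensure_mirrors are hypothetical, and it assumes that "lfs getstripe -N" prints a single integer for the file.

```shell
#!/bin/sh
# Sketch only: make a file have at least DESIRED mirrors, working around
# the fact that "lfs mirror extend -N<copies>" adds <copies> mirrors
# instead of setting the total. Helper names are hypothetical.

# mirrors_to_add CURRENT DESIRED
# Prints DESIRED - CURRENT, or 0 when the file already has enough mirrors.
mirrors_to_add() {
    current=$1
    desired=$2
    if [ "$current" -ge "$desired" ]; then
        echo 0
    else
        echo $((desired - current))
    fi
}

# ensure_mirrors FILE DESIRED
# Needs the lfs utility. Assumes "lfs getstripe -N" prints one integer.
ensure_mirrors() {
    file=$1
    desired=$2
    current=$(lfs getstripe -N "$file") || return 1
    # Refuse to continue if the output is not a plain non-negative integer.
    case "$current" in
        ''|*[!0-9]*) return 1 ;;
    esac
    add=$(mirrors_to_add "$current" "$desired")
    if [ "$add" -gt 0 ]; then
        lfs mirror extend -N"$add" "$file"
    fi
}
```

Calling "ensure_mirrors /path/to/file 2" repeatedly would then extend the file at most once, instead of adding two more mirrors on every call.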
For #3 and #4 we would set a default layout on the output directory to create files with DoM + PFL layouts to keep the output files entirely on flash.
For #5 we could use a ChangeLog user to follow files from each JobID and resync them (to HDD-based OSTs) in the background as they are closed, but it would make sense to apply a policy for this (e.g. migrate only 1/4 of incremental checkpoints out of the BB).
For #6 it would use lfs mirror extend or lfs mirror resync to migrate files specified in the job submission script from the flash OSTs to HDD OSTs. What is difficult is removing the flash OST replicas afterward. The lfs mirror split command requires specifying an explicit mirror ID, but lfs getstripe has no option to extract the mirror ID for a component. This raises the need for several new options:
|
| Comments |
| Comment by Jinshan Xiong [ 16/May/18 ] |
|
For #2, I think it should check whether the corresponding files already have mirrors on flash OSTs before it tries to add new mirrors. Checking mirrors should be lightweight in general, so I don't think this would be an issue. Anyway, extending a file to have a specified number of mirrors is probably less useful in this case, because what we want is a mirror on flash OSTs, no matter how many mirrors the file already has.
For #6, yes, if it drops mirrors at the end of the job, it won't need a database to keep track of files on the burst buffer; but I'm not sure this is the best way to use a burst buffer. If for any reason the jobs need to restart from the last checkpoint, we always want the required data to be already inside the burst buffer (and most likely it is), to avoid data transfer from slow HDD OSTs. I agree that a file management database will add complexity, but it's totally worth it.
The issue with deleting mirrors by flags is that sometimes those flags can't identify a mirror uniquely. The lfs utilities are fundamental features that could be used by many features, so we don't want anything special for burst buffer at this level. Yes, there is no option in lfs getstripe to print only the mirror ID, so it will need a grep to print it out, like:
[root@centos tests]# ../utils/lfs getstripe /mnt/lustre/tm | grep lcme_mirror_id
lcme_mirror_id: 1
lcme_mirror_id: 2
I think there should be utilities developed for burst buffer to iterate and check mirror ID by components through liblustreapi, but I don't think this should be a feature that belongs to lfs itself. Only burst buffer knows which OSTs are being identified as burst buffer. I agree on the rest. |
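The grep approach above prints one line per component, so a mirror with several components shows its ID more than once. A small filter can reduce that to the distinct IDs. This is a sketch, not part of lfs: the function name mirror_ids is hypothetical, and it assumes each matching line ends with the numeric ID, as in the output shown above.

```shell
#!/bin/sh
# Sketch only: read "lfs getstripe" output on stdin and print each
# distinct lcme_mirror_id once, in order of first appearance.
# The function name mirror_ids is hypothetical.
mirror_ids() {
    awk '/lcme_mirror_id/ {
        id = $NF                      # last field on the line is the ID
        if (!(id in seen)) {
            seen[id] = 1
            print id
        }
    }'
}
```

It would be used as "lfs getstripe /mnt/lustre/tm | mirror_ids". The filter only lists the IDs; deciding which of them lives on flash OSTs, and so which one to pass to lfs mirror split, is still left to the caller.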
| Comment by Gerrit Updater [ 18/May/18 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/32455 |
| Comment by Gerrit Updater [ 24/Jul/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32455/ |
| Comment by Gerrit Updater [ 26/Jun/19 ] |
|
Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35329 |
| Comment by Gerrit Updater [ 27/Jul/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35329/ |
| Comment by Peter Jones [ 27/Jul/19 ] |
|
There are no remaining patches left for this ticket - is this task now complete? |
| Comment by Andreas Dilger [ 29/Jul/19 ] |
It won't be possible to close this issue until the sub-task |
| Comment by Joseph Gmitter (Inactive) [ 15/Aug/19 ] |
|
Perhaps we should break out |
| Comment by Peter Jones [ 10/Sep/19 ] |
|
All work in 2.13 landed |
| Comment by Andreas Dilger [ 14/Sep/19 ] |
|
Reopen this issue since it is a high-level tracker for multiple different issues related to tiered storage in Lustre. |
| Comment by Gerrit Updater [ 16/Sep/19 ] |
|
Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36194 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36194/ |