[LU-11571] MDT pool Created: 25/Oct/18 Updated: 08/Feb/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major |
| Reporter: | CEA | Assignee: | Peter Jones |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I brought this question up at JLUG2018 and cannot find any ticket for it; having somewhere to discuss it (and for others to show interest) would be good. Please mark this as a duplicate if I missed an existing one.
Basically, the way we're using DNE v1 right now is mostly to isolate populations: users we have little control over get their own MDT where, if they cause problems, they only impact themselves, and the rest get MDT0/other MDTs with a hardcoded distribution.
With DoM we'll want to add more MDTs, yet still isolate populations. It would be good if we could shard/round-robin directory creations over a set of MDTs defined per directory.
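To make the request concrete, here is a minimal sketch of round-robin placement over a per-directory MDT set. It is a toy model only: the `DirMdtSet` class and its `pick_mdt()` method are hypothetical names made up for illustration, not Lustre code or an existing interface.

```python
class DirMdtSet:
    """Hypothetical per-directory MDT set with round-robin placement."""

    def __init__(self, mdt_indices):
        if not mdt_indices:
            raise ValueError("MDT set must not be empty")
        self.mdt_indices = list(mdt_indices)
        self._next = 0  # position of the next MDT to use

    def pick_mdt(self):
        """Return the MDT index for the next new subdirectory, cycling through the set."""
        idx = self.mdt_indices[self._next % len(self.mdt_indices)]
        self._next += 1
        return idx


# An isolated population confined to MDT1 and MDT2,
# everyone else spread over MDT0, MDT3 and MDT4 (illustrative indices).
isolated = DirMdtSet([1, 2])
shared = DirMdtSet([0, 3, 4])

isolated_picks = [isolated.pick_mdt() for _ in range(5)]
shared_picks = [shared.pick_mdt() for _ in range(5)]
print(isolated_picks)  # [1, 2, 1, 2, 1]
print(shared_picks)    # [0, 3, 4, 0, 3]
```

The point is only that each directory tree carries its own MDT set, so new directories under the isolated tree never land on the MDTs used by everyone else.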
One open question was "shall this be the same as the OST pool, or another one?" For our use case, just adding the MDTs to the existing pool would work (we also separate populations per pool). In the worst case, if someone needs something independent, they'd need to create a cross product of MDT/OST pools, which isn't great but could kind of work at small scale... I can't think of much use for having these separate, though. (I actually need to look at how PFL and pools interact, I've been out of touch; if we can have multiple pools set this way, having a separate one for MDTs could just work?) A simple rule of "no MDT set in the pool = any MDT", and similarly for OSTs, would probably be sound enough...
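The "no MDT set in the pool = any MDT" rule could look something like the sketch below for a combined MDT/OST pool. Everything here is an assumption for illustration: the `resolve_targets()` helper, the pool names, and the target names are invented and do not correspond to an existing Lustre interface.

```python
def resolve_targets(pool_members, all_targets):
    """Return the pool's members among all_targets, or every target if the pool lists none."""
    members = [t for t in all_targets if t in pool_members]
    return members if members else list(all_targets)


# Illustrative target names for a small filesystem.
ALL_MDTS = ["MDT0000", "MDT0001", "MDT0002"]
ALL_OSTS = ["OST0000", "OST0001", "OST0002", "OST0003"]

# Hypothetical combined pools holding both MDTs and OSTs.
pools = {
    "restricted": {"MDT0001", "OST0002", "OST0003"},
    "ost_only": {"OST0000", "OST0001"},  # no MDT listed: any MDT may be used
}

restricted_mdts = resolve_targets(pools["restricted"], ALL_MDTS)
restricted_osts = resolve_targets(pools["restricted"], ALL_OSTS)
ost_only_mdts = resolve_targets(pools["ost_only"], ALL_MDTS)
ost_only_osts = resolve_targets(pools["ost_only"], ALL_OSTS)

print(restricted_mdts)  # ['MDT0001']
print(ost_only_mdts)    # ['MDT0000', 'MDT0001', 'MDT0002']
```

With this rule, existing OST-only pools keep behaving as they do today, and a pool only restricts MDT placement once at least one MDT has been added to it.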
Any other open questions? I'm not going to offer to look at the code for this yet, given how slow I am with the file create lock LU... Maybe once I'm done here and can free up more time.
Thanks! |
| Comments |
| Comment by Peter Jones [ 25/Oct/18 ] |
|
Thanks martinetd. Having a Jira ticket to collect input from interested parties is definitely an important precursor to any detailed design work/development. |
| Comment by Andreas Dilger [ 13/Jun/21 ] |
|
sergey, are there any plans at HPE to implement MDT pools for 2.15, now that the OST pool quotas work has landed? |
| Comment by Andreas Dilger [ 24/Jan/23 ] |
|
Just a quick note here for whoever works on this in the future. There is already some basic infrastructure for MDT pools in the code (in particular, struct lmv_user_md already has a "lum_pool_name" field), but it is unused. Some other code, like lod_pool_qos_penalties_calc(), would need to be updated to include available inodes in the weighting for MDT pools. |
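As a rough illustration of what "include available inodes in the weighting" could mean, the sketch below weights each MDT by whichever is scarcer, free space or free inodes, and then makes a weighted random pick. This is not the logic of lod_pool_qos_penalties_calc(); the `mdt_weight()` and `pick_mdt()` helpers, the min-of-fractions formula, and all numbers are assumptions chosen for the example.

```python
import random


def mdt_weight(free_bytes, total_bytes, free_inodes, total_inodes):
    """Weight an MDT by its scarcer resource: free space or free inodes (assumed formula)."""
    space_frac = free_bytes / total_bytes
    inode_frac = free_inodes / total_inodes
    return min(space_frac, inode_frac)


def pick_mdt(pool, rng):
    """Weighted random choice among the pool's MDTs.

    random.choices raises ValueError if every weight is zero.
    """
    names = list(pool)
    weights = [mdt_weight(**pool[name]) for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Illustrative usage numbers (arbitrary units) for a hypothetical MDT pool.
pool = {
    "MDT0000": dict(free_bytes=50, total_bytes=100, free_inodes=80, total_inodes=100),
    "MDT0001": dict(free_bytes=90, total_bytes=100, free_inodes=0, total_inodes=100),
    "MDT0002": dict(free_bytes=60, total_bytes=100, free_inodes=25, total_inodes=100),
}

rng = random.Random(0)  # fixed seed so repeated runs give the same picks
picks = [pick_mdt(pool, rng) for _ in range(200)]
weights = {name: mdt_weight(**usage) for name, usage in pool.items()}
print(weights)
```

Here MDT0001 has plenty of free space but no free inodes, so a space-only weighting would favor it while an inode-aware weighting never selects it.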
| Comment by Andreas Dilger [ 16/Aug/23 ] |
|
There is increasing demand for > 40 MDTs in a single filesystem, and implementing MDT pools will at least partially segregate MDTs from each other during recovery, so that there is less chance of something going wrong when there are N^2 MDT recoveries happening simultaneously. |