[LU-9007] Improved object allocator for FLR composite files Created: 11/Jan/17 Updated: 13/May/22 Resolved: 09/Aug/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Joseph Gmitter (Inactive) | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | FLR2 |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The current MDS object allocator is designed only to allocate objects for a single file at the time the file is first created. For progressive file layouts, at a minimum the allocator will need to be enhanced to avoid allocating objects on OSTs that are already used by the file's other components. If a file has multiple objects allocated on the same OSTs before objects are allocated from unused OSTs, there may be a significant performance loss due to oversubscribing the bandwidth of those OSTs compared to the other OSTs. The only exception may be a fully-striped component at the end of the file (see Example Progressive Layouts for more detail), where it would be acceptable to allocate objects across all of the available OSTs to maximize the bandwidth available for the file. |
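The selection rule described above could be sketched roughly as follows. This is only an illustration in Python, not the LOD implementation: the function name `pick_osts` and its parameters are hypothetical, and OSTs are represented as plain integer indices.

```python
# Hypothetical sketch of "avoid OSTs already used by other components".
# None of these names are Lustre APIs; OSTs are plain integer indices.

def pick_osts(available, used_by_other_components, stripe_count):
    """Choose OSTs for a new component of a composite file.

    OSTs not yet used by the file's other components are preferred.
    Already-used OSTs are reused only when there are not enough unused ones.
    A component that stripes over every available OST simply takes them all.
    """
    if stripe_count >= len(available):
        # Fully-striped component: reuse is acceptable to maximize bandwidth.
        return list(available)
    used = set(used_by_other_components)
    fresh = [ost for ost in available if ost not in used]
    reused = [ost for ost in available if ost in used]
    # Unused OSTs come first; reused ones only fill the remainder.
    return (fresh + reused)[:stripe_count]
```

For example, with six OSTs of which two are already used, a two-stripe component lands entirely on unused OSTs, while a component as wide as the filesystem falls back to using all of them.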
| Comments |
| Comment by Andreas Dilger [ 01/Mar/17 ] |
|
This work is also a pre-requisite for FLR-related improvements to the MDS object allocator. While PFL requires that the objects are preferably not on the same OSTs between components, this is not a hard requirement. At worst this impacts performance, and in some cases (e.g. widely striped last component) it may even be desirable to re-use the same OSTs in order to maximize the bandwidth of large files. FLR has similar, but more specific requirements for OST selection on components with overlapping extents, in order of decreasing priority:
|
| Comment by Jinshan Xiong (Inactive) [ 02/Mar/17 ] |
|
Do we actually distinguish whether the components are for generic PFL or FLR? It seems like a bad idea to me to know that information at the LOD layer. I would like to make this allocation policy as generic and best-effort as possible. Sparse OST indices have been supported for a long time. What do you think about partitioning OST indices based on distance? The distance is defined by servers, racks, and switches. In any case, the more information the allocation policy can get, the better the decisions it can make. |
| Comment by Andreas Dilger [ 02/Mar/17 ] |
|
I think encoding anything into the OST index is a non-starter. This would totally break for existing filesystems, and administrators would have a hard time getting it right, and then it would break if they needed to move nodes around for some reason. We already have OSS and OSS failover information in the LOD, so we may as well use it. In fact, the QOS RR allocator already spreads stripes across OSS nodes to avoid contention if possible. We can add in other rack/switch/power information later if we actually need it. I don't think that understanding "PFL" vs. "FLR" in LOD is quite the right thing, but rather it will understand layout components and whether they are sequential or overlapping, and select the best OSTs in that case. |
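The idea of reasoning about sequential versus overlapping components, rather than about "PFL" versus "FLR", might look something like the sketch below. It assumes components are described by half-open byte extents `(start, end)`; the function names and the two policy labels are hypothetical and only illustrate the distinction drawn in this thread.

```python
# Hypothetical sketch: names and policy labels are illustrative, not Lustre APIs.

def extents_overlap(a, b):
    """Return True if half-open byte extents (start, end) share any offset."""
    return a[0] < b[1] and b[0] < a[1]


def ost_reuse_policy(new_extent, existing_extents):
    """Classify how strongly the OSTs of existing components should be avoided.

    Overlapping extents hold copies of the same data, so sharing an OST
    defeats the redundancy: "avoid".
    Sequential extents only compete for bandwidth: "prefer-avoid".
    """
    if any(extents_overlap(new_extent, e) for e in existing_extents):
        return "avoid"
    return "prefer-avoid"
```

With this framing the allocator never needs to know which feature created the layout; it only inspects the extents of the components already present.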
| Comment by Andreas Dilger [ 03/Apr/18 ] |
|
One simple proposal is to check the NID of each OST to automatically separate OSTs into fault domains based on which OSS they are located on. This is not perfect, but is simple and works for all existing systems without additional input from the administrator. The existing QOS RR allocator will already prefer to distribute allocations across OSS nodes if possible, but this can be extended to actually make it a requirement for all allocations. Secondly, a simple integer domain value can be assigned to each OST by the administrator, and the LOD can use this to separate OSTs into independent groups. OSTs with the same domain number should not be used for redundancy for other components. |
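A minimal sketch of the two proposals in this comment, under stated assumptions: the `Ost` record, its `domain` field, and both function names are hypothetical, and the rule that an administrator-assigned domain takes precedence over the OSS NID is a choice made for this illustration rather than something the comment specifies.

```python
# Hypothetical sketch of fault-domain-aware OST selection.
# The Ost record and its fields are illustrative, not Lustre structures.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ost:
    index: int
    oss_nid: str                  # NID of the OSS serving this OST
    domain: Optional[int] = None  # optional admin-assigned domain number


def fault_domain(ost):
    """Use the admin-assigned domain if present, else fall back to the OSS NID."""
    if ost.domain is not None:
        return ("domain", ost.domain)
    return ("nid", ost.oss_nid)


def pick_mirror_osts(candidates, primary_osts, count):
    """Pick up to `count` OSTs that share no fault domain with `primary_osts`."""
    excluded = {fault_domain(ost) for ost in primary_osts}
    picked = [ost for ost in candidates if fault_domain(ost) not in excluded]
    return picked[:count]
```

Note that this sketch treats an OST with a domain number and an OST without one as belonging to different domains, which may not be what a real implementation would want on a partially configured system.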
| Comment by Gerrit Updater [ 15/May/18 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/32404 |
| Comment by Gerrit Updater [ 12/Jul/18 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/32813 |
| Comment by Gerrit Updater [ 24/Jul/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32404/ |
| Comment by Gerrit Updater [ 09/Aug/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32813/ |
| Comment by Peter Jones [ 09/Aug/18 ] |
|
Landed for 2.12 |