[LU-10169] Spillover space Created: 30/Oct/17 Updated: 27/Aug/19 Resolved: 27/Aug/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.1 |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Minor |
| Reporter: | Nathan Rutman | Assignee: | Patrick Farrell (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
I had an alternate but somewhat overlapping thought to Spillover space (MRP-4598):
Example
Initially:
Runs out of space when file is at 10GB; we add a 2nd replica with a PFL layout
Layout v2: complex
Benefits
Questions
|
| Comments |
| Comment by Nathan Rutman [ 30/Oct/17 ] |
|
> mark Layer1 as stale. |
| Comment by Andreas Dilger [ 30/Oct/17 ] |
|
I've thought about this issue and possible solutions in the past as well, and can share my ideas here (they may also be recorded elsewhere). I agree that handling the single-OST-full issue is desirable, but my hope is that PFL will avoid this to a large extent, as would better OST space balancing as described in LU-9 and LU-9809. While it IS possible to modify a layout while it is actively being written by a client, there are some caveats. There is currently no way for a client to modify an existing component directly. This is done to prevent clients from introducing corruption into the layout (e.g. referencing objects owned by another file/user, or objects that do not exist). Also, until FLR has landed, the components must be strictly non-overlapping. Currently, the methods to update a composite layout are:
I'm not against fixing this issue more directly, but at a minimum we would need a new layout operation to truncate the end of an existing component (@10GB in your example) before adding a new component to cover the rest of the file. That wouldn't be too hard, and would preserve the semantics that clients cannot manipulate layouts directly.
The next problem is where to truncate the original layout. There is no guarantee that the object on the full OST will have a nice size like 1GB, and currently layouts must have sizes that are a multiple of 64KB. That implies we would need to truncate the full object at the nearest lower multiple of 64KB, since we can't write more data to that OST, and write the remainder to the new component. That is not a huge deal for < 64KB of data, and the one full stripe is not the largest issue anyway.
The final issue is that the OSTs holding the remaining stripes are presumably not full, so those stripes may have continued to be written before the client noticed that one OST is full, and the file could be written by many other clients. That means potentially multiple GB of possibly sparse data that needs to be copied over to the new component atomically before the original layout is truncated and a new component is added. Taking this to the extreme, even if we had a layout with a "ragged" starting offset to handle the different-sized objects, there would still be the issue of holes in the original component that could not be filled if the file was not being written linearly from start to end. While linear writing is the most common case, there would certainly be times when that isn't true, so even very complex solutions (which I would be against) wouldn't solve all cases.
That said, if this could be fixed for the common single-client linear-write case (i.e. truncate the existing layout, add a new component, and copy the small amount of data that was truncated off), it would not be worse than what we have today.
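To make the truncation arithmetic concrete, here is a minimal sketch, not Lustre code, assuming plain RAID-0 striping and modelling a composite layout as a list of (start, end) file extents. The names `object_to_file_offset`, `truncation_point`, `truncate_and_extend`, `LAYOUT_ALIGN` and `EOF` are hypothetical. It rounds the full object down to 64KB, maps that object offset back to a file offset to use as the new component end, and then applies the "truncate the last component, add a new one" operation. It only covers the single full stripe; data already written past the new end on the other OSTs (the "final issue" above) is not handled.

```python
# Hypothetical sketch, not Lustre code: truncate the last component of a
# composite layout when the object on one of its OSTs is full, then add a
# new component covering the rest of the file.

LAYOUT_ALIGN = 64 * 1024  # component boundaries must be multiples of 64KB
EOF = None                # hypothetical marker for an open-ended component


def object_to_file_offset(obj_off, stripe_index, stripe_size, stripe_count):
    """Map an offset inside one OST object to a file offset (RAID-0 striping)."""
    stripe_no, within = divmod(obj_off, stripe_size)
    return (stripe_no * stripe_count + stripe_index) * stripe_size + within


def truncation_point(full_obj_size, stripe_index, stripe_size, stripe_count):
    """Round the full object down to 64KB.

    Returns (new component end as a file offset, bytes on the full OST that
    must be rewritten into the new component). The end is 64KB aligned as
    long as stripe_size is itself a multiple of 64KB, as Lustre requires.
    """
    aligned = full_obj_size - full_obj_size % LAYOUT_ALIGN
    end = object_to_file_offset(aligned, stripe_index, stripe_size, stripe_count)
    return end, full_obj_size - aligned


def truncate_and_extend(components, new_end):
    """Return a new layout where the last component ends at new_end and a
    new component covers [new_end, EOF). The result stays non-overlapping
    because the new component starts exactly where the truncated one ends."""
    if new_end % LAYOUT_ALIGN:
        raise ValueError("component end must be a multiple of 64KB")
    start, _ = components[-1]
    if new_end <= start:
        raise ValueError("new end must lie inside the last component")
    return components[:-1] + [(start, new_end), (new_end, EOF)]


if __name__ == "__main__":
    # 4 stripes of 1MiB; the OST holding stripe index 1 fills at 3MiB + 100000 bytes.
    end, to_copy = truncation_point(3 * 2**20 + 100_000, 1, 2**20, 4)
    print(end, to_copy)
    print(truncate_and_extend([(0, EOF)], end))
```

Because the object offset to file offset mapping is monotonic for a given object, every file byte below `end` that lands on the full OST sits below the aligned object offset, so the truncated component never needs space on that OST beyond what is already used.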
This could be simplified further if the layout was changed before an OST was totally full, which would essentially become a form of self-extending PFL layout in the end. An ounce of prevention in the form of PFL and not filling OSTs to 100% in the first place is worth a pound of cure. |
| Comment by Nathan Rutman [ 31/Oct/17 ] |
Yes, this is why I was suggesting something like watching grant to trigger the layout change before an actual ENOSPC. If the writers can't all flush at that point, it's ENOSPC and we give up. But we can avoid those cases just by changing the layout more aggressively at, say, 95% full. We would truncate the layout at the furthest written extent, rounded up to something nice (say, a full stripe size), again assuming we left ourselves plenty of spare room on each OST that hosts a stripe. That way we don't have to rewrite or copy anything. Holes are perfectly fine; this will become one component of a PFL, and subsequent writes can fill in those holes if they want (since we left ourselves extra space). (Sure, you could come up with a sparse-file scenario where this breaks down, but in those cases we just return ENOSPC as we do today.)
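A minimal sketch of this proposal, under stated assumptions and not existing Lustre code: raw per-OST (used, total) byte counts stand in for the grant accounting suggested above, the 95% trigger is a parameter, and "a full stripe" is read as stripe_size * stripe_count. The functions `should_spill`, `spill_boundary` and `plan_spillover` are hypothetical names. No data is copied; the old component simply ends at the rounded-up boundary and a new component takes over from there.

```python
# Hypothetical sketch, not Lustre code: decide when to end the current
# component and where, before any OST actually returns ENOSPC.


def should_spill(ost_usage, threshold_pct=95):
    """ost_usage: (used_bytes, total_bytes) for each OST in the component.

    Plain used/total is a stand-in for the grant accounting a real
    implementation would watch. Integer arithmetic avoids float rounding
    right at the threshold.
    """
    return any(used * 100 >= threshold_pct * total for used, total in ost_usage)


def spill_boundary(furthest_written_end, stripe_size, stripe_count):
    """Round the end of the furthest written extent up to a full-stripe
    multiple (assumed here to mean stripe_size * stripe_count)."""
    full_stripe = stripe_size * stripe_count
    return -(-furthest_written_end // full_stripe) * full_stripe


def plan_spillover(ost_usage, furthest_written_end, stripe_size, stripe_count,
                   threshold_pct=95):
    """Return the file offset where the current component should end and a
    new PFL component should begin, or None if no OST is near full yet."""
    if not should_spill(ost_usage, threshold_pct):
        return None
    return spill_boundary(furthest_written_end, stripe_size, stripe_count)


if __name__ == "__main__":
    usage = [(50, 100), (96, 100)]  # second OST is past 95%
    print(plan_spillover(usage, 10 * 2**30 + 1, 2**20, 4))
```

The design relies on the headroom left by triggering early: holes below the boundary can still be filled from the remaining free space on the old OSTs, which is why nothing has to be rewritten.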
Neither can be addressed with a static layout determined at file create time. E.g., someone creates a tiered PFL on flash/disk OSTs with plenty of room, then someone else fills all the flash drives with a checkpoint. |
| Comment by Nathan Rutman [ 29/Jan/18 ] |
|
Not handled as well by 10070:
Not handled as well by 10169:
|
| Comment by Patrick Farrell (Inactive) [ 27/Aug/19 ] |
|
This was implemented in |