[LU-8365] Fix mballoc stream allocator to better use free space at start of drive Created: 04/Jul/16 Updated: 10/Aug/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Lokesh Nagappa Jaliminche (Inactive) | Assignee: | Yang Sheng |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | ldiskfs |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Provide a mechanism to reset the ldiskfs extents allocation position to near the beginning of a drive |
| Comments |
| Comment by Gerrit Updater [ 04/Jul/16 ] |
|
lokesh.jaliminche (lokesh.jaliminche@seagate.com) uploaded a new patch: http://review.whamcloud.com/21142 |
| Comment by Andreas Dilger [ 17/Sep/16 ] |
|
This patch exposes that mballoc is not doing as good a job in group selection for empty HDDs as it might. Biasing allocations to the start of the disk can improve performance, but only if the start of the disk has free space. Some possibilities to try that may actually fix mballoc, in order of increasing difficulty: |
| Comment by Lokesh Nagappa Jaliminche (Inactive) [ 21/Sep/16 ] |
|
Thanks for the details, working on it. |
| Comment by Andreas Dilger [ 15/Sep/18 ] |
|
Hi Yang Sheng, Please feel free to ask if you have questions. I'd like to have something to look at late next week, if possible. We need to run some benchmarks on real hardware to ensure this is doing the right thing. It would be OK to include the patch https://review.whamcloud.com/21142 for testing/debugging, but I don't consider that a real fix for this issue. In a semi-related area, while looking at this issue I also noticed a bug in the current ext4-prealloc.patch:
/* don't use group allocation for large files */
size = max(size, isize);
+ if ((ac->ac_o_ex.fe_len >= sbi->s_mb_small_req) ||
+ (size >= sbi->s_mb_large_req)) {
ac->ac_flags |= EXT4_MB_STREAM_ALLOC;
return;
}
+ /*
+ * request is so large that we don't care about
+ * streaming - it overweights any possible seek
+ */
+ if (ac->ac_o_ex.fe_len >= sbi->s_mb_large_req)
+ return;
It looks like we can never get to the second condition, because fe_len >= s_mb_small_req will always be true first: assuming s_mb_small_req <= s_mb_large_req, any request that reaches s_mb_large_req has already matched the first check and returned. This has been true all the way back to the original version of this patch (commit d8d8fd9192a5). It seems like the s_mb_large_req check should be moved before EXT4_MB_STREAM_ALLOC is set, so that large allocations can behave differently? |
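The ordering problem described above can be reproduced outside the kernel. The following is a minimal sketch, not the ldiskfs code: `struct mb_tunables`, `enum mb_policy`, `policy_as_quoted()` and `policy_reordered()` are hypothetical names that only model the two checks quoted from ext4-prealloc.patch, and it assumes `small_req <= large_req`.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the tunables in the quoted snippet. */
struct mb_tunables {
	uint64_t small_req;	/* models sbi->s_mb_small_req */
	uint64_t large_req;	/* models sbi->s_mb_large_req */
};

/* Which branch of the quoted code a request ends up in (illustrative). */
enum mb_policy {
	MB_FALL_THROUGH,	/* neither check matched */
	MB_STREAM_ALLOC,	/* EXT4_MB_STREAM_ALLOC would be set */
	MB_LARGE_NO_STREAM,	/* "so large that we don't care about streaming" */
};

/* Checks in the order quoted from ext4-prealloc.patch. */
static enum mb_policy policy_as_quoted(const struct mb_tunables *t,
				       uint64_t fe_len, uint64_t size)
{
	if (fe_len >= t->small_req || size >= t->large_req)
		return MB_STREAM_ALLOC;
	/* Unreachable when small_req <= large_req: the check above
	 * has already matched every fe_len >= large_req. */
	if (fe_len >= t->large_req)
		return MB_LARGE_NO_STREAM;
	return MB_FALL_THROUGH;
}

/* Suggested ordering: test the large-request case first. */
static enum mb_policy policy_reordered(const struct mb_tunables *t,
				       uint64_t fe_len, uint64_t size)
{
	if (fe_len >= t->large_req)
		return MB_LARGE_NO_STREAM;
	if (fe_len >= t->small_req || size >= t->large_req)
		return MB_STREAM_ALLOC;
	return MB_FALL_THROUGH;
}
```

With illustrative values `small_req = 8` and `large_req = 128`, a request of 200 blocks is classified as a stream allocation by `policy_as_quoted()` but as a large request by `policy_reordered()`; smaller requests are classified identically by both.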
| Comment by Yang Sheng [ 18/Sep/18 ] |
|
Hi Alex, It looks like 'stream allocation' has been changed since the upstream patch (4ba74d00a2025). Could you please review whether it is still correct for its original purpose? My other question is why we need s_mb_small_req. As I understand it, 'stream allocation' would be used when the request size is less than s_mb_large_req, so what is the purpose of s_mb_small_req? Could you please explain? Thanks, |
| Comment by Gerrit Updater [ 18/Sep/18 ] |
|
Yang Sheng (ys@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33195 |
| Comment by Yang Sheng [ 27/Sep/18 ] |
|
Hi Alex, Could you please give some advice on this patch? 0001-ext4-Fix-bugs-in-mballoc-s-stream-allocation-mode.patch Thanks, |
| Comment by Andreas Dilger [ 28/Sep/18 ] |
|
I'm not sure why you attached the patch here? That is what gerrit is for. |
| Comment by Yang Sheng [ 28/Sep/18 ] |
|
Hi Andreas, This is the patch that has already landed upstream. I just want to get some input from Alex on whether it is correct for stream allocation, since it changes the logic of stream allocation. Thanks, |
| Comment by Gerrit Updater [ 01/Nov/18 ] |
|
Yang Sheng (ys@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33548 |
| Comment by Alexander Zarochentsev [ 26/Feb/19 ] |
|
Are there any performance tests for this patch https://review.whamcloud.com/33195 ? |
| Comment by Andreas Dilger [ 26/Feb/19 ] |
|
Ihara had started running some tests on the patch, but I don't recall ever seeing the results. The main goal was to automate the original "manually reset to the start of the disk during benchmarking" behavior under normal usage. In particular, jump back to earlier groups when a bunch of free space becomes available, without having to continually scan the earlier groups for free space. The potential drawback is that if this happens too frequently it could cause excessive seeking, but since it should only happen when there is a large amount of space, any seek overhead should be smaller than the seek rate * IO size. |
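One way to picture the "jump back without rescanning" goal is to move the decision into the free path. The sketch below is an assumption for illustration only and is not taken from https://review.whamcloud.com/33195: `struct stream_state`, `stream_note_free()` and `rewind_threshold` are hypothetical names, and the threshold stands in for "a large amount of space".

```c
#include <stdint.h>

/* Hypothetical model; none of these names exist in ldiskfs or ext4. */
struct stream_state {
	uint32_t goal_group;		/* group where stream allocation currently starts */
	uint64_t rewind_threshold;	/* free blocks needed in a group to jump back (assumed tunable) */
};

/*
 * Called when blocks are freed, with the group's free-block count after
 * the free. The goal only moves back when an earlier group now holds at
 * least rewind_threshold free blocks, so the allocation path never has
 * to rescan earlier groups looking for space.
 */
static void stream_note_free(struct stream_state *s, uint32_t group,
			     uint64_t group_free_blocks)
{
	if (group < s->goal_group && group_free_blocks >= s->rewind_threshold)
		s->goal_group = group;
}
```

A high threshold keeps rewinds rare, which is the property the comment relies on to bound the extra seeking; a low threshold would make the allocator bounce between groups.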
| Comment by Gerrit Updater [ 03/Mar/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/21142/ |
| Comment by Gerrit Updater [ 10/May/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34842 |
| Comment by Gerrit Updater [ 03/Jul/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34842/ |