[LU-13363] unbalanced round-robin for object allocation in OST pool Created: 17/Mar/20 Updated: 16/Jun/22 Resolved: 06/Jun/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Shuichi Ihara | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Two OST pools with two different sizes of OSTs within the same filesystem |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Here is an example: create two OST pools on a filesystem with 12 OSTs. Pool 'nvme' consists of OST index [0-7], pool 'hdd' of OST index [8-b]:

lctl pool_new scratch.nvme
lctl pool_new scratch.hdd
lctl pool_add scratch.nvme OST[0-7]
lctl pool_add scratch.hdd OST[8-b]

If a client creates 48 new files in a directory associated with the 8 OSTs of the pool, one would expect 6 OST objects per OST, but the result was totally unbalanced.

Used 8 of 12 OSTs with an OST pool (OST index):
0 1 2 3 4 5 6 7
t1. 4 10 3 8 5 6 8 4
t2. 6 5 6 7 8 4 10 2
t3. 3 10 8 6 5 9 6 1
t4. 4 10 6 5 4 6 8 5
t5. 6 6 7 4 6 5 8 6
If the filesystem was created on just 8 OSTs with no OST pool, objects were allocated across the 8 OSTs in a balanced way and round-robin worked perfectly.

Just 8 OSTs, no OST pool (OST index):
0 1 2 3 4 5 6 7
t1. 6 6 6 6 6 6 6 6
t2. 6 6 6 6 6 6 6 6
t3. 6 6 6 6 6 6 6 6
t4. 6 6 6 6 6 6 6 6
t5. 6 6 6 6 6 6 6 6
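The balanced case above is what pure round-robin produces. A minimal sketch (illustrative Python only, not Lustre code; the function name is made up) of the allocation the reporter expected:

```python
# Toy model of pure round-robin object allocation within a pool:
# 48 files striped one object each across an 8-OST pool should
# land exactly 48 / 8 = 6 objects on every OST.
def round_robin_allocate(num_files, pool_osts):
    counts = {ost: 0 for ost in pool_osts}
    for i in range(num_files):
        # next OST in the pool, wrapping around
        counts[pool_osts[i % len(pool_osts)]] += 1
    return counts

pool = list(range(8))  # pool 'nvme': OST index 0-7
print(round_robin_allocate(48, pool))
# → {0: 6, 1: 6, 2: 6, 3: 6, 4: 6, 5: 6, 6: 6, 7: 6}
```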
|
| Comments |
| Comment by Andreas Dilger [ 17/Mar/20 ] |
|
Presumably OST0008-OST000B are a much different size than OST0000-OST0007? It might be that the pool allocation is incorrectly using QOS because of the global OST imbalance, even though the OSTs within the pool are still balanced. If you configure only the NVMe OSTs OST0000-OST0007, but create the pool on only 6 of them, is the allocation balanced? |
| Comment by Andreas Dilger [ 17/Mar/20 ] |
|
It looks like this may be a duplicate of |
| Comment by Shuichi Ihara [ 17/Mar/20 ] |
|
Yes, if all OSTs have the same capacity and an OST pool is created from a few of them, allocation is balanced very well. If OSTs of different capacities are mixed in the filesystem, the problem occurs even when the OST pool is created on devices of the same capacity. |
| Comment by Andreas Dilger [ 17/Mar/20 ] |
|
Notes for fixing this issue from
|
| Comment by Alex Zhuravlev [ 31/Mar/20 ] |
|
It sounds like each pool needs its own lu_qos, and all the logic should be built around that per-pool structure? |
| Comment by Andreas Dilger [ 31/Mar/20 ] |
|
There are definitely going to be OSTs in multiple pools, and allocations that are outside pools. I think there should be common data fields, like OST fullness, that are shared across pools, and other per-pool information that is not shared. I don't think we need to have totally perfect coordination between allocations in two different pools or in a pool and outside the pool. However, simple decisions like "is this pool within qos_threshold_rr" can be easily checked for all of the OSTs in the pool, regardless of whether the OST is in another pool as well. If the pool is balanced, then it should just do round-robin allocations within that pool. |
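The "is this pool within qos_threshold_rr" decision Andreas describes can be sketched as a per-pool balance check. This is an assumption-laden toy (the function name and the free-space-spread formulation are hypothetical; only the `qos_threshold_rr` tunable name comes from Lustre), but it shows why a pool of similar OSTs should stay in round-robin mode regardless of OSTs outside the pool:

```python
# Sketch: decide round-robin vs QOS weighting per pool, looking
# only at the free space of the pool's own members.
def pool_is_balanced(free_space, threshold_pct=17):
    """free_space: available bytes on each OST in the pool.
    Balanced if the max-min spread is within threshold_pct of
    the fullest member (loosely mirroring qos_threshold_rr)."""
    hi, lo = max(free_space), min(free_space)
    return (hi - lo) <= hi * threshold_pct / 100

# Similar-sized pool members: round-robin, even if OSTs outside
# the pool are wildly different in size.
print(pool_is_balanced([180e9, 175e9, 182e9, 178e9]))  # → True
# Badly skewed members: fall back to QOS space weighting.
print(pool_is_balanced([180e9, 20e9]))                 # → False
```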
| Comment by Emoly Liu [ 03/Apr/20 ] |
|
I made a patch to calculate penalties per-OST in a pool. At first, I tried to add a qos structure to pool_desc, a similar idea to Alex's, but finally I found we don't need that, because what we want is just to rebalance data in this pool each time. Here is my test on 6 OSTs. pool1 is on OST[0-3], and OST[0-3] have similar available space, as follows. Then, I created 48 files on them.

[root@centos7-3 tests]# lfs df
UUID                 1K-blocks    Used  Available  Use%  Mounted on
lustre-OST0000_UUID     325368  115908     182300   39%  /mnt/lustre[OST:0]
lustre-OST0001_UUID     325368  126152     172056   43%  /mnt/lustre[OST:1]
lustre-OST0002_UUID     325368  136388     161820   46%  /mnt/lustre[OST:2]
lustre-OST0003_UUID     325368  131276     166932   45%  /mnt/lustre[OST:3]
lustre-OST0004_UUID     325368   13512     284696    5%  /mnt/lustre[OST:4]
lustre-OST0005_UUID     325368   13516     284692    5%  /mnt/lustre[OST:5]

Without the patch, the file distribution is
OST0  OST1  OST2  OST3
  13    11    14    10
With the patch,
OST0  OST1  OST2  OST3
  12    12    12    12
I will submit this tentative patch later. |
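A per-OST penalty scheme like the one described can be illustrated with a toy model (hypothetical names and parameters; not Emoly's actual patch): each allocation goes to the pool member with the highest score, and the chosen OST is penalized so selection rotates. When the penalty step dominates the small free-space differences, the result degenerates to round-robin within the pool:

```python
# Toy penalty-based allocator restricted to pool members.
def allocate_with_penalty(num_files, avail, penalty_step):
    counts = [0] * len(avail)
    score = list(avail)
    for _ in range(num_files):
        # pick the pool member with the highest current score
        i = max(range(len(score)), key=lambda k: score[k])
        counts[i] += 1
        score[i] -= penalty_step  # penalize the just-used OST
    return counts

# Available KB of pool1 members (OST0-OST3) from the lfs df above;
# a large penalty step swamps the space differences, so allocation
# rotates through the pool and each OST gets 48 / 4 = 12 objects.
print(allocate_with_penalty(48, [182300, 172056, 161820, 166932], 50000))
# → [12, 12, 12, 12]
```

Note Alex's objection below this point: recomputing weights on every allocation is costly, which is why real implementations amortize or cache this work.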
| Comment by Alex Zhuravlev [ 03/Apr/20 ] |
|
I think rebalancing on every allocation is too expensive. |
| Comment by Gerrit Updater [ 03/Apr/20 ] |
|
Emoly Liu (emoly@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38136 |
| Comment by Gerrit Updater [ 06/Jun/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/38136/ |
| Comment by Peter Jones [ 06/Jun/22 ] |
|
Landed for 2.16 |