[LU-15393] object allocation when OST is lost Created: 22/Dec/21 Updated: 20/May/23 Resolved: 11/Jun/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.0 |
| Fix Version/s: | Lustre 2.16.0, Lustre 2.15.3 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Alexander Boyko | Assignee: | Alexander Boyko |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Description |
|
Currently ltd_qos.lq_rw_sem is used in the following LOD paths:
lod_qos_statfs_update(), write: does not protect anything, and I hope it will go away.
The call graph for these functions is:
lod_qos_prep_create() {
    lod_qos_statfs_update()
    rc = lod_ost_alloc_qos()
    if (rc == -EAGAIN)
        rc = lod_ost_alloc_rr() {
            lod_qos_calc_rr()
            lod_check_and_reserve_ost() {
                lod_qos_declare_object_on()
            }
        }
}
lod_qos_declare_object_on() can block on object creation when an OST is lost, in failover, or similar. As a result, ltd_qos.lq_rw_sem stays held for as long as the creation is blocked. I am suggesting a patch that unblocks lod_ost_alloc_qos() threads with -EAGAIN, which leads them to lod_ost_alloc_rr(), where the semaphore is taken shared for read. Creation threads can then pick healthy OSTs and allocate objects. |
| Comments |
| Comment by Gerrit Updater [ 22/Dec/21 ] |
|
"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45921 |
| Comment by Alex Zhuravlev [ 22/Dec/21 ] |
|
normally we prefer the RR policy, and QoS is used when space usage is not well balanced among OSTs. |
| Comment by Alexander Boyko [ 27/Dec/21 ] |
|
Well, QoS lost accuracy when Lustre started to support more than one MDT. MDTs can create objects in parallel and do not know about each other. In practice, every call of lod_qos_statfs_update() sets the LQ_DIRTY flag under any load on the OSTs, because the check is strict:
    avail = OST_TGT(lod,idx)->ltd_statfs.os_bavail;
    if (lod_statfs_and_check(env, lod, idx,
                             &OST_TGT(lod, idx)->ltd_statfs, 0))
        continue;
    if (OST_TGT(lod,idx)->ltd_statfs.os_bavail != avail)
        /* recalculate weights */
        set_bit(LQ_DIRTY, &lod->lod_qos.lq_flags);
With new OSTs reaching speeds over 60 GB/s, one minute of writes equals 3.6 TB of data. The default maxage is 5 seconds, which at that rate is 300 GB. So QoS gives no real benefit for balancing under load; it only slows object allocation and burns CPU. I see a benefit only with slow, periodic I/O. |
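The figures above can be checked with a few lines of C. This is only a sketch of the arithmetic and of the strict comparison, not Lustre code: gb_written() and weights_dirty() are hypothetical helper names, and decimal units (1 TB = 1000 GB) are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Data written, in GB, at a sustained rate over a time window. */
uint64_t gb_written(uint64_t rate_gb_per_s, uint64_t window_s)
{
	return rate_gb_per_s * window_s;
}

/*
 * Strict check in the style of the snippet quoted above: any change in
 * available blocks between two statfs refreshes marks the weights dirty.
 */
bool weights_dirty(uint64_t old_bavail, uint64_t new_bavail)
{
	return old_bavail != new_bavail;
}
```

With these helpers, gb_written(60, 60) gives 3600 GB (3.6 TB) and gb_written(60, 5) gives 300 GB. Since weights_dirty() is true for any difference at all, a refresh under that kind of load will almost always request a weight recalculation.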
| Comment by Alex Zhuravlev [ 13/Jan/22 ] |
|
so just disable QoS on a specific setup and that's it? |
| Comment by Gerrit Updater [ 31/Jan/22 ] |
|
"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/46388 |
| Comment by Gerrit Updater [ 11/Jun/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45921/ |
| Comment by Gerrit Updater [ 11/Jun/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46388/ |
| Comment by Peter Jones [ 11/Jun/22 ] |
|
Landed for 2.16 |
| Comment by Gerrit Updater [ 23/Jun/22 ] |
|
"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/47715 |
| Comment by Gerrit Updater [ 01/Sep/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47715/ |
| Comment by Andreas Dilger [ 15/Sep/22 ] |
|
The recovery-small test_152 failed once: Not sure if this is a problem yet. |
| Comment by Gerrit Updater [ 10/Nov/22 ] |
|
"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49095 |
| Comment by Gerrit Updater [ 10/Nov/22 ] |
|
"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49096 |
| Comment by Gerrit Updater [ 10/Nov/22 ] |
|
"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49097 |
| Comment by Gerrit Updater [ 08/Mar/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49095/ |
| Comment by Gerrit Updater [ 08/Mar/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49096/ |
| Comment by Gerrit Updater [ 08/Mar/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49097/ |