[LU-15393] object allocation when OST is lost Created: 22/Dec/21  Updated: 20/May/23  Resolved: 11/Jun/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: Lustre 2.16.0, Lustre 2.15.3

Type: Improvement
Priority: Minor
Reporter: Alexander Boyko
Assignee: Alexander Boyko
Resolution: Fixed
Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-13073 Multiple MDS deadlocks (in lod_qos_pr... Resolved
is related to LU-14277 any create blocked due any OST fail Resolved

Description

Currently ltd_qos.lq_rw_sem is taken on the following LOD paths:

lod_qos_statfs_update() write - does not protect anything; I hope it will be gone with LU-14277
lod_qos_calc_rr() write - refills the pool array if LQ_DIRTY was set, rare
lod_ost_alloc_rr() read - held for the whole object reservation path
lod_mdt_alloc_rr() read - the same
lod_ost_alloc_qos() write - held for the whole path of OST weight calculation and object allocation
lod_mdt_alloc_qos() write - the same
lu_qos_add_tgt() write - adds a new target, marks LQ_DIRTY, rare
lu_qos_del_tgt() write - deletes a target, marks LQ_DIRTY, rare

The call graph for these functions:

lod_qos_prep_create() {
        lod_qos_statfs_update()
        rc = lod_ost_alloc_qos()
        if (rc == -EAGAIN)
                rc = lod_ost_alloc_rr() {
                                lod_qos_calc_rr()
                                lod_check_and_reserve_ost() {
                                        lod_qos_declare_object_on()
                                }
                }
}

lod_qos_declare_object_on() can block on object creation when an OST is lost (failover or similar). As a result, ltd_qos.lq_rw_sem is held
for read by lod_ost_alloc_rr() for the whole failover time. This also means that other creation threads get stuck in
lod_ost_alloc_qos() on down_write(). No matter how many OSTs Lustre could use, all creation threads hang in this case.

I'm suggesting a patch that unblocks lod_ost_alloc_qos() threads with -EAGAIN, which leads them to lod_ost_alloc_rr(), where the semaphore is shared for read. Creation threads can then pick healthy OSTs and allocate objects.
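The fallback described above can be illustrated with a small user-space model. This is only a sketch of the idea, not the patch itself: toy_rwsem is a single-threaded stand-in for the kernel semaphore, and every toy_* name is hypothetical. It assumes the QoS path gives up with -EAGAIN when it cannot take the lock exclusively, so the caller drops to the RR path, which only needs the lock shared.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy single-threaded model of a reader/writer semaphore.
 * Hypothetical; this is not the kernel rw_semaphore. */
struct toy_rwsem {
	int  readers;	/* number of shared holders */
	bool writer;	/* true while held exclusively */
};

static bool toy_down_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer)
		return false;
	sem->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

static bool toy_down_write_trylock(struct toy_rwsem *sem)
{
	if (sem->writer || sem->readers > 0)
		return false;
	sem->writer = true;
	return true;
}

static void toy_up_write(struct toy_rwsem *sem)
{
	assert(sem->writer);
	sem->writer = false;
}

/* Stand-in for the QoS path: needs the lock exclusively. */
static int toy_alloc_qos(struct toy_rwsem *sem)
{
	if (!toy_down_write_trylock(sem))
		return -EAGAIN;	/* do not wait, let the caller try RR */
	/* weight calculation and allocation would happen here */
	toy_up_write(sem);
	return 0;
}

/* Stand-in for the RR path: a shared lock is enough. */
static int toy_alloc_rr(struct toy_rwsem *sem)
{
	if (!toy_down_read_trylock(sem))
		return -EAGAIN;
	/* a healthy OST would be picked and an object reserved here */
	toy_up_read(sem);
	return 0;
}

/* Same shape as the call graph: try QoS, fall back to RR on -EAGAIN. */
static int toy_prep_create(struct toy_rwsem *sem, bool *used_rr)
{
	int rc = toy_alloc_qos(sem);

	*used_rr = false;
	if (rc == -EAGAIN) {
		*used_rr = true;
		rc = toy_alloc_rr(sem);
	}
	return rc;
}
```

In this model, a reader that stays inside the lock (as a thread blocked on a lost OST would) makes toy_alloc_qos() return -EAGAIN, while toy_prep_create() still succeeds through the shared RR path.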



Comments
Comment by Gerrit Updater [ 22/Dec/21 ]

"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45921
Subject: LU-15393 lod: use killable semaphore for creation path
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f787a6ba0432f91096d92a656e5265416df11692

Comment by Alex Zhuravlev [ 22/Dec/21 ]

Normally we prefer the RR policy; QoS is used when space usage is not well balanced among OSTs.

Comment by Alexander Boyko [ 27/Dec/21 ]

Well, QoS lost accuracy when Lustre started to support more than one MDT. MDTs can create objects in parallel and don't know about each other. In practice, every call of lod_qos_statfs_update() sets the LQ_DIRTY flag under any load on the OSTs (strict check).

                avail = OST_TGT(lod,idx)->ltd_statfs.os_bavail;
                if (lod_statfs_and_check(env, lod, idx,
                                         &OST_TGT(lod, idx)->ltd_statfs, 0))
                        continue;
                if (OST_TGT(lod,idx)->ltd_statfs.os_bavail != avail)
                        /* recalculate weights */
                        set_bit(LQ_DIRTY, &lod->lod_qos.lq_flags);

With new OSTs reaching speeds over 60GB/s, one minute of writes equals 3.6TB of data. The default maxage is 5 seconds, which is 300GB. In practice QoS brings no balancing benefit under load; it only slows object allocation and burns CPU. I see a benefit only with slow periodic IO.
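The arithmetic above, and why the strict check fires on every refresh under load, can be checked with a short sketch. The helper names are hypothetical and the units are simplified: it works in decimal GB rather than in the filesystem blocks that os_bavail actually counts, and it treats 60GB/s as the sustained write rate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GB written during a window at a sustained rate (decimal units). */
static uint64_t gb_written(uint64_t rate_gb_per_s, uint64_t seconds)
{
	return rate_gb_per_s * seconds;
}

/* Strict check in the style of the quoted snippet: any change in
 * available space between two statfs refreshes marks weights dirty. */
static bool strict_check_marks_dirty(uint64_t bavail_before,
				     uint64_t bavail_after)
{
	return bavail_after != bavail_before;
}
```

With these helpers, gb_written(60, 60) gives 3600 GB (3.6TB) and gb_written(60, 5) gives 300 GB, so available space differs on every maxage refresh while writes are in flight and the strict comparison always reports a change.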

Comment by Alex Zhuravlev [ 13/Jan/22 ]

So just disable QoS on a specific setup and that's it?
Or tune the threshold...

Comment by Gerrit Updater [ 31/Jan/22 ]

"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/46388
Subject: LU-15393 lod: skip qos for qos_threshold_rr=100
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8a62f1e7aca81152248ca20caed3d893681dd3cf

Comment by Gerrit Updater [ 11/Jun/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45921/
Subject: LU-15393 lod: use killable semaphore for creation path
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f46782b4c7dcaacd0046ebad3e3d84c2bb0367d4

Comment by Gerrit Updater [ 11/Jun/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46388/
Subject: LU-15393 lod: skip qos for qos_threshold_rr=100
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2f23140d5c1396fd0b247bd7f9c249f6e24096b7

Comment by Peter Jones [ 11/Jun/22 ]

Landed for 2.16

Comment by Gerrit Updater [ 23/Jun/22 ]

"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/47715
Subject: LU-15393 tests: check QoS hang with OST failover
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7d87322d92352865cc86438cba517d98aad0c789

Comment by Gerrit Updater [ 01/Sep/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47715/
Subject: LU-15393 tests: check QoS hang with OST failover
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 52057d85eaef8c7b5262f0718629fabff919ff1d

Comment by Andreas Dilger [ 15/Sep/22 ]

The recovery-small test_152 failed once:
https://testing.whamcloud.com/test_sets/2ac04215-a77d-4436-8b38-65a379dd5855

Not sure if this is a problem yet.

Comment by Gerrit Updater [ 10/Nov/22 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49095
Subject: LU-15393 lod: use killable semaphore for creation path
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: eaf1700f3d57ae88b48099611219ea6f3d2de75f

Comment by Gerrit Updater [ 10/Nov/22 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49096
Subject: LU-15393 lod: skip qos for qos_threshold_rr=100
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 17b646aac70cf702d1358e65bf8ce22f16f41dfd

Comment by Gerrit Updater [ 10/Nov/22 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49097
Subject: LU-15393 tests: check QoS hang with OST failover
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 1a44918703ab0f75c3ee7ab45bf9d6db7c1a6674

Comment by Gerrit Updater [ 08/Mar/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49095/
Subject: LU-15393 lod: use killable semaphore for creation path
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 18c098261104fef9350e932d124d78296b0cc135

Comment by Gerrit Updater [ 08/Mar/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49096/
Subject: LU-15393 lod: skip qos for qos_threshold_rr=100
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 0b1aa418ac26d879d4794db1aab360a2230c891d

Comment by Gerrit Updater [ 08/Mar/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49097/
Subject: LU-15393 tests: check QoS hang with OST failover
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 3692450355585c1a3a8502ce0f96a36650941f96

Generated at Sat Feb 10 03:17:55 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.