Details
Improvement
Resolution: Fixed
Minor
Lustre 2.15.0
Description
Currently ltd_qos.lq_rw_sem is taken in the following LOD paths:
lod_qos_statfs_update() write - does not protect anything; I hope it will be gone with LU-14277
lod_qos_calc_rr() write - refills the pool array if LQ_DIRTY was set, rare
lod_ost_alloc_rr() read - whole path for object reservation
lod_mdt_alloc_rr() read - the same
lod_ost_alloc_qos() write - whole path for OST weight calculation and object allocation
lod_mdt_alloc_qos() write - the same
lu_qos_add_tgt() write - adds a new target, marks LQ_DIRTY, rare
lu_qos_del_tgt() write - deletes a target, marks LQ_DIRTY, rare
Call graph for these functions:
lod_qos_prep_create()
{
        lod_qos_statfs_update()
        rc = lod_ost_alloc_qos()
        if (rc == -EAGAIN)
                rc = lod_ost_alloc_rr()
                {
                        lod_qos_calc_rr()
                        lod_check_and_reserve_ost()
                        {
                                lod_qos_declare_object_on()
                        }
                }
}
lod_qos_declare_object_on() can block on object creation when an OST is lost, in failover, or similar. As a result, ltd_qos.lq_rw_sem is held
for read by lod_ost_alloc_rr() for the entire failover time. Other creation threads then get stuck in
lod_ost_alloc_qos() on down_write(). No matter how many OSTs Lustre could use, all creation threads hang in this case.
I'm suggesting a patch that unblocks lod_ost_alloc_qos() threads by returning -EAGAIN. That leads to lod_ost_alloc_rr(), where the semaphore is shared for read, so creation threads can pick healthy OSTs and allocate objects.