[LU-7954] lod_alloc_qos() does not check OS_STATE_DEGRADED Created: 29/Mar/16  Updated: 16/Sep/16  Resolved: 16/Sep/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Bug
Priority: Minor
Reporter: Andreas Dilger
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-522 sanity.sh test_27x failed with "OST0 ... Resolved
Severity: 3

 Description   

The OS_STATE_DEGRADED flag sent from an OST only reduces the probability that the OST will be used, to avoid placing load on an OST that is doing a RAID rebuild. The allocator will typically avoid a degraded OST when the OSTs are otherwise equally used. The flag does not completely prevent the OST from being used if there are not enough other suitable OSTs available to meet the client's request.
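The intended "prefer healthy, fall back to degraded" behaviour can be illustrated with a minimal sketch. This is not the Lustre code: struct pick_ost, pick_osts() and the PICK_STATE_DEGRADED value are hypothetical names invented for illustration, and a deterministic two-pass scan stands in for the allocator's actual weighting.

```c
#include <stddef.h>

/* Assumed flag value for illustration only; the real OS_STATE_DEGRADED
 * definition lives in the Lustre headers. */
#define PICK_STATE_DEGRADED 0x1u

/* Hypothetical, simplified view of one OST as seen by the allocator. */
struct pick_ost {
	unsigned int state;  /* os_state-style flags reported by the OST */
	int usable;          /* nonzero if the OST is active and has space */
};

/*
 * Choose up to `want` OST indices into `picked`.
 * Pass 0 takes only healthy OSTs; pass 1 runs only if the request is
 * still short and then accepts degraded OSTs as well.
 * Returns the number of OSTs chosen.
 */
size_t pick_osts(const struct pick_ost *osts, size_t n,
		 size_t want, size_t *picked)
{
	size_t found = 0;

	for (int pass = 0; pass < 2 && found < want; pass++) {
		for (size_t i = 0; i < n && found < want; i++) {
			int degraded =
				(osts[i].state & PICK_STATE_DEGRADED) != 0;

			if (!osts[i].usable)
				continue;
			/* pass 0: healthy only; pass 1: degraded only */
			if (degraded != pass)
				continue;
			picked[found++] = i;
		}
	}
	return found;
}
```

With four OSTs where index 1 is degraded and index 2 is unusable, a request for two stripes is satisfied from the healthy OSTs alone, while a request for three has to include the degraded one.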

If OSTs are already quite imbalanced due to uneven space usage, it appears that the "QOS" allocator does not take the OS_STATE_DEGRADED flag into account when selecting OSTs.

It seems that lod_alloc_qos() is doing many, but not all, of the checks from lod_check_and_reserve_ost(). It would be better if lod_check_and_reserve_ost() were split into two functions: lod_check_ost() to handle the first half, the "hard" reasons to skip an OST, which could also be used in the first "good_osts" loop of lod_alloc_qos(); and lod_reserve_ost() to check the "soft" reasons (precreated and later) to skip an OST, which could also be used in the second "nfound" loop of lod_alloc_qos().
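A rough sketch of the proposed split is shown below. All names are hypothetical (split_ost_status, split_check_ost(), split_reserve_ost(), SPLIT_STATE_DEGRADED), the fields are a simplified stand-in for the real LOD/OSP state, and placing the degraded test among the "hard" checks is an assumption made for this example rather than something the ticket specifies.

```c
#include <stdbool.h>

/* Assumed flag value for illustration only. */
#define SPLIT_STATE_DEGRADED 0x1u

/* Hypothetical, simplified per-OST status. */
struct split_ost_status {
	bool active;         /* OST is connected and enabled */
	bool has_space;      /* OST is not out of space or inodes */
	unsigned int state;  /* os_state-style flags */
	int precreated;      /* objects available for allocation */
};

/*
 * "Hard" checks: reasons to skip an OST before any reservation is
 * attempted. A QOS-style allocator could call this while building its
 * list of candidate OSTs. `allow_degraded` lets a later, less strict
 * pass accept degraded OSTs when too few healthy ones were found.
 */
bool split_check_ost(const struct split_ost_status *st, bool allow_degraded)
{
	if (!st->active)
		return false;
	if (!st->has_space)
		return false;
	if ((st->state & SPLIT_STATE_DEGRADED) && !allow_degraded)
		return false;
	return true;
}

/*
 * "Soft" checks and reservation: the OST passed the hard checks, but
 * may still be skipped if it has no precreated objects left.
 * Consumes one precreated object on success.
 */
bool split_reserve_ost(struct split_ost_status *st)
{
	if (st->precreated <= 0)
		return false;
	st->precreated--;
	return true;
}
```

Keeping the two halves separate means both allocation paths can share the same skip logic, so a check added to one (such as the degraded test) cannot be silently missed by the other.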



 Comments   
Comment by Andreas Dilger [ 16/Sep/16 ]

This was fixed via patch http://review.whamcloud.com/20747 "LU-522 lod: do not ignore degraded flag of ost" for v2_8_57_0-35-g994aa41 (i.e. 2.9.0).
