  1. Lustre
  2. LU-7954

lod_alloc_qos() does not check OS_STATE_DEGRADED

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • Lustre 2.9.0

    Description

      The OS_STATE_DEGRADED flag sent from an OST only reduces the probability that the OST will be used, to avoid placing load on an OST that is doing a RAID rebuild. The allocator will typically avoid a degraded OST when the OSTs are otherwise equally used. The flag does not completely prevent the OST from being used if there are not enough other suitable OSTs available to meet the client's request.
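
      The intended behaviour can be illustrated with a minimal, self-contained sketch. It is not the Lustre code: struct demo_ost, demo_pick_osts() and the DEMO_STATE_DEGRADED bit are hypothetical names invented for this example, and the probabilistic weighting is simplified to a deterministic "healthy first, degraded only as a fallback" rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit for this sketch; not taken from the Lustre headers. */
#define DEMO_STATE_DEGRADED 0x1u

/* Hypothetical, heavily simplified per-OST status. */
struct demo_ost {
	unsigned int state;	/* status bits reported by the OST */
};

/*
 * Choose up to `want` OST indices and store them in `out`.
 * Pass 0 takes only healthy OSTs; pass 1 takes degraded OSTs, and only
 * runs if pass 0 could not satisfy the request. Returns the number chosen.
 */
static size_t demo_pick_osts(const struct demo_ost *osts, size_t count,
			     size_t want, size_t *out)
{
	size_t found = 0;

	assert(osts != NULL && out != NULL);

	for (int pass = 0; pass < 2 && found < want; pass++) {
		for (size_t i = 0; i < count && found < want; i++) {
			/* 1 if the degraded bit is set, otherwise 0 */
			int degraded =
				(osts[i].state & DEMO_STATE_DEGRADED) != 0;

			/* pass 0 matches healthy OSTs, pass 1 degraded ones */
			if (degraded == pass)
				out[found++] = i;
		}
	}
	return found;
}
```

      With four OSTs where only index 1 is degraded, a request for two stripes never touches the degraded OST, while a request for four stripes still succeeds by using it last.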

      If OSTs are already quite imbalanced due to uneven space usage, it appears that the "QOS" allocator does not take the OS_STATE_DEGRADED flag into account when selecting OSTs.

      It seems that lod_alloc_qos() performs many, but not all, of the checks from lod_check_and_reserve_ost(). It would be better to split lod_check_and_reserve_ost() into two functions: lod_check_ost(), covering the first half (the "hard" reasons to skip an OST), which could also be used in the first "good_osts" loop of lod_alloc_qos(); and lod_reserve_ost(), covering the "soft" reasons to skip an OST (the precreated check and later), which could also be used in the second "nfound" loop of lod_alloc_qos().
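
      A rough sketch of the proposed split follows. Everything in it is an assumption made for illustration: struct split_ost, its fields, and the split_*() functions are hypothetical stand-ins, the choice of which conditions count as "hard" or "soft" is a guess, and the real allocator's weighted selection is replaced by a simple linear scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified OST state; these are not the real field names. */
struct split_ost {
	bool active;			/* OST is reachable */
	bool degraded;			/* OST reported a degraded state */
	unsigned long free_blocks;	/* remaining space */
	unsigned int precreated;	/* precreated objects available */
};

/* "Hard" checks only: no side effects, so any loop may call it. */
static bool split_check_ost(const struct split_ost *ost, bool allow_degraded)
{
	if (!ost->active || ost->free_blocks == 0)
		return false;
	if (ost->degraded && !allow_degraded)
		return false;
	return true;
}

/* "Soft" check plus reservation: consumes one precreated object. */
static bool split_reserve_ost(struct split_ost *ost)
{
	if (ost->precreated == 0)
		return false;
	ost->precreated--;
	return true;
}

/* Two-loop allocation in the spirit of the description above. */
static size_t split_alloc(struct split_ost *osts, size_t count,
			  size_t want, size_t *out)
{
	size_t good = 0, found = 0;

	assert(osts != NULL && out != NULL);

	/* Loop 1: count healthy candidates using only the hard checks. */
	for (size_t i = 0; i < count; i++)
		if (split_check_ost(&osts[i], false))
			good++;

	/* Allow degraded OSTs only if healthy ones cannot meet the request. */
	bool allow_degraded = good < want;

	/* Loop 2: repeat the hard checks, then try to reserve an object. */
	for (size_t i = 0; i < count && found < want; i++) {
		if (!split_check_ost(&osts[i], allow_degraded))
			continue;
		if (!split_reserve_ost(&osts[i]))
			continue;
		out[found++] = i;
	}
	return found;
}
```

      Because both loops call the same split_check_ost(), a condition such as the degraded state cannot be honoured in one allocation path and missed in the other, which is the gap this ticket describes.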

            People

              wc-triage WC Triage
              adilger Andreas Dilger