Lustre / LU-15393

object allocation when OST is lost


    Description

      Currently ltd_qos.lq_rw_sem is taken on the following LOD paths:

      lod_qos_statfs_update() write - does not protect anything; I hope it will be gone with LU-14277
      lod_qos_calc_rr() write - refills the pool array if LQ_DIRTY was set, rare
      lod_ost_alloc_rr() read - held for the whole object reservation path
      lod_mdt_alloc_rr() read - the same
      lod_ost_alloc_qos() write - held for the whole path of OST weight calculation and object allocation
      lod_mdt_alloc_qos() write - the same
      lu_qos_add_tgt() write - adds a new target, marks LQ_DIRTY, rare
      lu_qos_del_tgt() write - deletes a target, marks LQ_DIRTY, rare

      Call graph for these functions:

      lod_qos_prep_create() {
              lod_qos_statfs_update()
              rc = lod_ost_alloc_qos()
              if (rc == -EAGAIN)
                      rc = lod_ost_alloc_rr() {
                                      lod_qos_calc_rr()
                                      lod_check_and_reserve_ost() {
                                              lod_qos_declare_object_on()
                                      }
                      }
      }
      

      lod_qos_declare_object_on() can block on object creation when an OST is lost (during failover, for example). As a result, ltd_qos.lq_rw_sem
      is held for read by lod_ost_alloc_rr() for the whole failover time. This also means that other creation threads get stuck in
      lod_ost_alloc_qos() on down_write(). No matter how many OSTs Lustre could use, all creation threads hang in this case.

      I suggest a patch that unblocks lod_ost_alloc_qos() threads by returning -EAGAIN; the caller then falls back to lod_ost_alloc_rr(), where the semaphore is shared for read. Creation threads can then pick healthy OSTs and allocate objects.

            People

              aboyko Alexander Boyko