  Lustre / LU-977

Incorrect round-robin object allocation

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Lustre 2.8.0
    • Labels:
    • Environment:
      Any Lustre release since 1.6.0
    • Severity:
      3
    • Bugzilla ID:
      24194
    • Rank (Obsolete):
      7276

      Description

      https://bugzilla.lustre.org/show_bug.cgi?id=24194

      The bug is caused by incorrect locking in the lov_qos code. It can be easily reproduced with the following fault-injection patch and test:

      diff --git a/lustre/lov/lov_qos.c b/lustre/lov/lov_qos.c 
      index a101e9c..64ccefb 100644 
      --- a/lustre/lov/lov_qos.c 
      +++ b/lustre/lov/lov_qos.c 
      @@ -627,6 +627,8 @@ static int alloc_rr(struct lov_obd *lov, int *idx_arr, int *stripe_cnt, 
      
       repeat_find: 
               array_idx = (lqr->lqr_start_idx + lqr->lqr_offset_idx) % osts->op_count; 
      +        CFS_FAIL_TIMEOUT_MS(OBD_FAIL_MDS_LOV_CREATE_RACE, 100);
      + 
               idx_pos = idx_arr; 
       #ifdef QOS_DEBUG 
               CDEBUG(D_QOS, "pool '%s' want %d startidx %d startcnt %d offset %d "
      
      test_51() {
              local obj1
              local obj2
              local old_rr
      
              mkdir -p $DIR1/$tfile-1/
              mkdir -p $DIR2/$tfile-2/
              old_rr=$(do_facet $SINGLEMDS lctl get_param -n 'lov.lustre-MDT*/qos_threshold_rr' | sed -e 's/%//')
              do_facet $SINGLEMDS lctl set_param -n 'lov.lustre-MDT*/qos_threshold_rr' 100
      #define OBD_FAIL_MDS_LOV_CREATE_RACE     0x148
              do_facet $SINGLEMDS "lctl set_param fail_loc=0x80000148"
              touch $DIR1/$tfile-1/file1 &
              PID1=$!
              touch $DIR2/$tfile-2/file2 &
              PID2=$!
              wait $PID2
              wait $PID1
              do_facet $SINGLEMDS "lctl set_param fail_loc=0x0"
              do_facet $SINGLEMDS "lctl set_param -n 'lov.lustre-MDT*/qos_threshold_rr' $old_rr"
      
              obj1=$($GETSTRIPE -o $DIR1/$tfile-1/file1)
              obj2=$($GETSTRIPE -o $DIR1/$tfile-2/file2)
              [ $obj1 -eq $obj2 ] && error "files must be allocated on different OSTs"
      }
      run_test 51 "alloc_rr should allocate objects in the correct order"
      

      The bug was found in 2.x but should exist in 1.8 as well.

      In the patch above, CFS_FAIL_TIMEOUT_MS can be replaced with CFS_RACE().
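
      The kind of race that the injected delay widens can be sketched in plain C. This is a minimal, single-threaded replay of the interleaving, assuming the problem is a read-then-advance sequence on a shared round-robin cursor that is not covered by one lock. The names rr_cursor, rr_peek, rr_advance, and rr_alloc_atomic are hypothetical and are not taken from lov_qos.c.

```c
#include <assert.h>

/* Hypothetical stand-in for the shared round-robin state.
 * It is NOT the real lov_qos structure, only an illustration. */
struct rr_cursor {
    unsigned int start_idx;  /* next position to hand out */
    unsigned int ost_count;  /* number of OSTs in the pool */
};

/* Step 1 of an allocation: derive the OST index from the cursor. */
static unsigned int rr_peek(const struct rr_cursor *c)
{
    assert(c->ost_count > 0);
    return c->start_idx % c->ost_count;
}

/* Step 2 of an allocation: move the cursor to the next OST. */
static void rr_advance(struct rr_cursor *c)
{
    c->start_idx = (c->start_idx + 1) % c->ost_count;
}

/* Both steps as one unit, as they would run under a single lock. */
static unsigned int rr_alloc_atomic(struct rr_cursor *c)
{
    unsigned int idx = rr_peek(c);

    rr_advance(c);
    return idx;
}

/* Replays the bad interleaving: both creators read the cursor
 * before either one advances it. Returns 1 if they collide. */
static int rr_racy_collides(unsigned int ost_count)
{
    struct rr_cursor c = { 0, ost_count };
    unsigned int a = rr_peek(&c);  /* creator 1 reads, then is delayed */
    unsigned int b = rr_peek(&c);  /* creator 2 reads the same value */

    rr_advance(&c);
    rr_advance(&c);
    return a == b;
}

/* Same two creators, but each allocation is indivisible. */
static int rr_atomic_collides(unsigned int ost_count)
{
    struct rr_cursor c = { 0, ost_count };
    unsigned int a = rr_alloc_atomic(&c);
    unsigned int b = rr_alloc_atomic(&c);

    return a == b;
}
```

      With more than one OST, rr_racy_collides() reports a collision while rr_atomic_collides() does not, which is the same condition test_51 checks by comparing the starting OST of the two files.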

              People

              • Assignee:
                bogl Bob Glossman (Inactive)
                Reporter:
                shadow Alexey Lyashkov
              • Votes:
                0
                Watchers:
                18
