Details
-
Bug
-
Resolution: Fixed
-
Major
-
None
-
Any Lustre release from 1.6.0 onward
-
3
-
24,194
-
7276
Description
https://bugzilla.lustre.org/show_bug.cgi?id=24194
The bug is caused by incorrect locking in the lov_qos code and can be easily reproduced with the following debug patch and test:
diff --git a/lustre/lov/lov_qos.c b/lustre/lov/lov_qos.c
index a101e9c..64ccefb 100644
--- a/lustre/lov/lov_qos.c
+++ b/lustre/lov/lov_qos.c
@@ -627,6 +627,8 @@ static int alloc_rr(struct lov_obd *lov, int *idx_arr, int *stripe_cnt,
 repeat_find:
         array_idx = (lqr->lqr_start_idx + lqr->lqr_offset_idx) % osts->op_count;
 
+        CFS_FAIL_TIMEOUT_MS(OBD_FAIL_MDS_LOV_CREATE_RACE, 100);
+
         idx_pos = idx_arr;
 #ifdef QOS_DEBUG
         CDEBUG(D_QOS, "pool '%s' want %d startidx %d startcnt %d offset %d "

test_51() {
        local obj1
        local obj2
        local old_rr

        mkdir -p $DIR1/$tfile-1/
        mkdir -p $DIR2/$tfile-2/
        old_rr=$(do_facet $SINGLEMDS lctl get_param -n 'lov.lustre-MDT*/qos_threshold_rr' | sed -e 's/%//')
        do_facet $SINGLEMDS lctl set_param -n 'lov.lustre-MDT*/qos_threshold_rr' 100
#define OBD_FAIL_MDS_LOV_CREATE_RACE     0x148
        do_facet $SINGLEMDS "lctl set_param fail_loc=0x80000148"

        touch $DIR1/$tfile-1/file1 &
        PID1=$!
        touch $DIR2/$tfile-2/file2 &
        PID2=$!
        wait $PID2
        wait $PID1
        do_facet $SINGLEMDS "lctl set_param fail_loc=0x0"
        do_facet $SINGLEMDS "lctl set_param -n 'lov.lustre-MDT*/qos_threshold_rr' $old_rr"

        obj1=$($GETSTRIPE -o $DIR1/$tfile-1/file1)
        obj2=$($GETSTRIPE -o $DIR1/$tfile-2/file2)
        [ $obj1 -eq $obj2 ] && error "must different ost used"
}
run_test 51 "alloc_rr should be allocate on correct order"
The bug was found in 2.x but should also exist in 1.8.
CFS_FAIL_TIMEOUT_MS can be replaced with CFS_RACE().
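
For illustration only, here is a minimal user-space sketch of the kind of race the test provokes. It is not the Lustre code: struct rr_state and every function name below are hypothetical, and it assumes (the report does not spell this out) that the problem is the round-robin start index being read and advanced without the lock held across both steps. The bad interleaving is replayed deterministically in a single thread instead of relying on timing, which is roughly what the injected 100 ms delay achieves on a real MDS.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for per-pool round-robin state; not the Lustre struct. */
struct rr_state {
	pthread_mutex_t lock;      /* protects start_idx */
	unsigned int    start_idx; /* OST index the next create starts from */
	unsigned int    ost_count; /* number of OSTs in the pool */
};

/* Racy step 1: read the starting index without taking the lock. */
static unsigned int rr_read_unlocked(const struct rr_state *rr)
{
	assert(rr->ost_count > 0);
	return rr->start_idx % rr->ost_count;
}

/* Racy step 2: advance the starting index, also without the lock. */
static void rr_advance_unlocked(struct rr_state *rr)
{
	rr->start_idx = (rr->start_idx + 1) % rr->ost_count;
}

/*
 * Replay the bad interleaving: the second create reads the index before
 * the first create has advanced it. Returns 1 if both creates were handed
 * the same OST index, 0 otherwise.
 */
static int rr_replay_race(struct rr_state *rr,
			  unsigned int *first, unsigned int *second)
{
	*first = rr_read_unlocked(rr);
	*second = rr_read_unlocked(rr); /* window widened by the injected delay */
	rr_advance_unlocked(rr);
	rr_advance_unlocked(rr);
	return *first == *second;
}

/* Assumed fix pattern: read and advance as one step under the lock. */
static unsigned int rr_pick_locked(struct rr_state *rr)
{
	unsigned int idx;

	assert(rr->ost_count > 0);
	pthread_mutex_lock(&rr->lock);
	idx = rr->start_idx % rr->ost_count;
	rr->start_idx = (idx + 1) % rr->ost_count;
	pthread_mutex_unlock(&rr->lock);
	return idx;
}
```

With two OSTs and start_idx at 0, rr_replay_race() hands both creates index 0, which is the condition test_51 reports as an error, while two consecutive rr_pick_locked() calls return 0 and then 1.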
Attachments
Issue Links
- is related to
-
LU-9780 Add test for fix added in LU-977
- Resolved
-
LU-14377 parallel-scale test rr_alloc fails with ''Uneven distribution detected: difference between maximum files per OST (1528) and minimum files per OST (1525) must not be greater than 2''
- Resolved
-
LU-9 Optimize weighted QOS Round-Robin allocator
- Open
- Trackbacks
-
Lustre 1.8.x known issues tracker While testing against Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla https://bugzilla.lustre.org/. In order to move away from relying on Bugzilla, we would create a JIRA