Details
- Bug
- Resolution: Fixed
- Major
- None
- Any Lustre release from 1.6.0 onward
- 3
- 24,194
- 7276
Description
https://bugzilla.lustre.org/show_bug.cgi?id=24194
The bug is caused by incorrect locking in the lov_qos code and can easily be reproduced with the fault-injection patch and test below.
diff --git a/lustre/lov/lov_qos.c b/lustre/lov/lov_qos.c
index a101e9c..64ccefb 100644
--- a/lustre/lov/lov_qos.c
+++ b/lustre/lov/lov_qos.c
@@ -627,6 +627,8 @@ static int alloc_rr(struct lov_obd *lov, int *idx_arr, int *stripe_cnt,
repeat_find:
array_idx = (lqr->lqr_start_idx + lqr->lqr_offset_idx) % osts->op_count;
+ CFS_FAIL_TIMEOUT_MS(OBD_FAIL_MDS_LOV_CREATE_RACE, 100);
+
idx_pos = idx_arr;
#ifdef QOS_DEBUG
CDEBUG(D_QOS, "pool '%s' want %d startidx %d startcnt %d offset %d "
test_51() {
	local obj1
	local obj2
	local old_rr

	mkdir -p $DIR1/$tfile-1/
	mkdir -p $DIR2/$tfile-2/
	# save the current threshold, then force round-robin allocation
	old_rr=$(do_facet $SINGLEMDS lctl get_param -n \
		'lov.lustre-MDT*/qos_threshold_rr' | sed -e 's/%//')
	do_facet $SINGLEMDS lctl set_param -n 'lov.lustre-MDT*/qos_threshold_rr' 100
	#define OBD_FAIL_MDS_LOV_CREATE_RACE 0x148
	do_facet $SINGLEMDS "lctl set_param fail_loc=0x80000148"
	# create two files concurrently through the two mount points
	touch $DIR1/$tfile-1/file1 &
	PID1=$!
	touch $DIR2/$tfile-2/file2 &
	PID2=$!
	wait $PID2
	wait $PID1
	# clear the fail point and restore the saved threshold
	do_facet $SINGLEMDS "lctl set_param fail_loc=0x0"
	do_facet $SINGLEMDS "lctl set_param -n 'lov.lustre-MDT*/qos_threshold_rr' $old_rr"
	obj1=$($GETSTRIPE -o $DIR1/$tfile-1/file1)
	obj2=$($GETSTRIPE -o $DIR1/$tfile-2/file2)
	# the function returns 0 when the two files landed on different OSTs
	[ "$obj1" -ne "$obj2" ] || error "files must be allocated on different OSTs"
}
run_test 51 "alloc_rr should allocate in the correct order"
The bug was found in 2.x but should also exist in 1.8.
CFS_FAIL_TIMEOUT_MS can be replaced with CFS_RACE().
Issue Links
- is related to
  - LU-9780 Add test for fix added in LU-977 (Resolved)
  - LU-14377 parallel-scale test rr_alloc fails with ''Uneven distribution detected: difference between maximum files per OST (1528) and minimum files per OST (1525) must not be greater than 2'' (Resolved)
  - LU-9 Optimize weighted QOS Round-Robin allocator (Open)
Trackbacks
- Lustre 1.8.x known issues tracker
While testing against Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla https://bugzilla.lustre.org/. In order to move away from relying on Bugzilla, we would create a JIRA