[LU-977] incorrect round robin object allocation

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.8.0
    • Labels: None
    • Affects Version/s: any Lustre from 1.6.0
    • Severity: 3
    • Bugzilla ID: 24,194
    • 7276

    Description

      https://bugzilla.lustre.org/show_bug.cgi?id=24194

      The bug is caused by incorrect locking in the lov_qos code and can be easily reproduced with the test below:

      diff --git a/lustre/lov/lov_qos.c b/lustre/lov/lov_qos.c 
      index a101e9c..64ccefb 100644 
      --- a/lustre/lov/lov_qos.c 
      +++ b/lustre/lov/lov_qos.c 
      @@ -627,6 +627,8 @@ static int alloc_rr(struct lov_obd *lov, int *idx_arr, int *stripe_cnt, 
      
       repeat_find: 
               array_idx = (lqr->lqr_start_idx + lqr->lqr_offset_idx) % osts->op_count; 
      +        CFS_FAIL_TIMEOUT_MS(OBD_FAIL_MDS_LOV_CREATE_RACE, 100);
      +
               idx_pos = idx_arr; 
       #ifdef QOS_DEBUG 
               CDEBUG(D_QOS, "pool '%s' want %d startidx %d startcnt %d offset %d "
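
      For context, the race that this injected delay widens looks roughly like
      the following standalone sketch. This is illustration only, not Lustre
      code: the thread bodies, the OST count, and the output are made up. Two
      threads perform an unlocked read-modify-write on a shared round-robin
      cursor (analogous to lqr_start_idx), so both can compute the same
      array_idx and allocate on the same OST:

      #include <pthread.h>
      #include <stdio.h>

      /* Shared RR cursor, updated without locking -- the same pattern as
       * lqr_start_idx in alloc_rr(). */
      static unsigned int start_idx;
      static const unsigned int ost_count = 8;

      static void *create_object(void *arg)
      {
              /* Both threads may read the same start_idx here... */
              unsigned int array_idx = start_idx % ost_count;
              /* ...especially with a delay at this point, which is what the
               * injected CFS_FAIL_TIMEOUT_MS() above simulates... */
              start_idx++;    /* ...and the increment is not atomic either. */
              printf("%s: allocated on OST%04u\n", (char *)arg, array_idx);
              return NULL;
      }

      int main(void)
      {
              pthread_t t1, t2;

              pthread_create(&t1, NULL, create_object, "create1");
              pthread_create(&t2, NULL, create_object, "create2");
              pthread_join(t1, NULL);
              pthread_join(t2, NULL);
              return 0;
      }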
      
      test_51() {
              local obj1
              local obj2
              local old_rr
      
              mkdir -p $DIR1/$tfile-1/
              mkdir -p $DIR2/$tfile-2/
        old_rr=$(do_facet $SINGLEMDS lctl get_param -n 'lov.lustre-MDT*/qos_threshold_rr' | sed -e 's/%//')
              do_facet $SINGLEMDS lctl set_param -n 'lov.lustre-MDT*/qos_threshold_rr' 100
      #define OBD_FAIL_MDS_LOV_CREATE_RACE     0x148
              do_facet $SINGLEMDS "lctl set_param fail_loc=0x80000148"
              touch $DIR1/$tfile-1/file1 &
              PID1=$!
              touch $DIR2/$tfile-2/file2 &
              PID2=$!
              wait $PID2
              wait $PID1
              do_facet $SINGLEMDS "lctl set_param fail_loc=0x0"
              do_facet $SINGLEMDS "lctl set_param -n 'lov.lustre-MDT*/qos_threshold_rr' $old_rr"
      
              obj1=$($GETSTRIPE -o $DIR1/$tfile-1/file1)
              obj2=$($GETSTRIPE -o $DIR1/$tfile-2/file2)
        [ $obj1 -eq $obj2 ] && error "must use different OSTs"
      }
run_test 51 "alloc_rr should allocate objects in the correct order"
      

      The bug was found in 2.x but should exist in 1.8 as well.

      CFS_FAIL_TIMEOUT_MS() can be replaced with CFS_RACE().
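      For reference, the substitution in the same hunk would look like the
      sketch below. CFS_RACE() blocks the first thread to reach the fail point
      until a second thread hits the same point, triggering the race
      deterministically instead of relying on a fixed 100 ms sleep:

      repeat_find:
              array_idx = (lqr->lqr_start_idx + lqr->lqr_offset_idx) % osts->op_count;
              /* First caller waits here until a second caller hits the
               * same fail point, then both proceed together. */
              CFS_RACE(OBD_FAIL_MDS_LOV_CREATE_RACE);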

Attachments

Issue Links

Activity

            [LU-977] incorrect round robin object allocation

            520557 Rahul Deshmukh (Inactive) added a comment - Posted the new patch to address this issue. Please review.

            gerrit Gerrit Updater added a comment -
            Rahul Deshmukh (rahul.deshmukh@seagate.com) uploaded a new patch: http://review.whamcloud.com/14636
            Subject: LU-977 lod: Patch to protect lqr_start_idx
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 86aa10e5b8c6b944c10ce224078ba4f6aafbe6eb
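            The change itself is in the Gerrit link above; as a rough
            illustration of what the subject line describes, serializing the
            read-modify-write on lqr_start_idx could look like the sketch
            below. This is hypothetical: the lock name is illustrative, and
            this is not the actual content of change 14636.

            /* Sketch only -- not the actual patch. Take a lock around the
             * cursor so two concurrent creates cannot compute the same
             * array_idx and both advance the cursor past each other. */
            spin_lock(&lqr->lqr_alloc);
            array_idx = (lqr->lqr_start_idx + lqr->lqr_offset_idx) %
                        osts->op_count;
            lqr->lqr_start_idx++;   /* advance while still holding the lock */
            spin_unlock(&lqr->lqr_alloc);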

            bzzz Alex Zhuravlev added a comment - Cory, note the word "totally": that would be possible only with very strong locking around allocation, IMHO, which is virtually impossible with DNE and not very good on multicore either.

            shadow Alexey Lyashkov added a comment - If I understand Alex correctly, he means that different MDTs may allocate on the same OST, so OSTs may end up with different numbers of allocated objects. But that is not the case if each MDT has its own OST pool assigned.
            spitzcor Cory Spitz added a comment - I don't see how 'precise' RR is not possible with DNE. If an application wants evenly balanced stripe allocation, that should still be possible, since the allocators aren't linked in DNE. So if the one MDS allocator hasn't switched to the space-based allocator, then round-robin should still be (mostly) 'precise', correct?

            bzzz Alex Zhuravlev added a comment - totally precise RR is not possible with DNE, for example.
            shadow Alexey Lyashkov added a comment - Quoting Eugene Birkine (06/Dec/11 9:05 PM):
            Debug log file from MDS with qos_threshold_rr=100 during 16 file writes. The file distribution was:

            testfs-OST0000: 2
            testfs-OST0001: 3
            testfs-OST0002: 2
            testfs-OST0003: 1
            testfs-OST0004: 2
            testfs-OST0005: 3
            testfs-OST0006: 1
            testfs-OST0007: 2
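            For scale: 16 writes over 8 OSTs would be exactly 2 objects per
            OST under perfect RR, so the 1s and 3s above are the imbalance in
            question. A throwaway check of the numbers quoted above (the
            counts are taken from the comment; everything else is made up):

            #include <stdio.h>

            int main(void)
            {
                    /* Observed per-OST object counts from the comment. */
                    int observed[8] = { 2, 3, 2, 1, 2, 3, 1, 2 };
                    int total = 0, i;

                    for (i = 0; i < 8; i++)
                            total += observed[i];
                    /* 16 files over 8 OSTs: perfect RR puts 16/8 = 2 on each. */
                    printf("total %d, ideal per OST %d\n", total, total / 8);
                    return 0;
            }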

            keith Keith Mannthey (Inactive) added a comment - Alexey, what is the worst-case allocation that you have seen? It still sounds like you want a "totally precise" client/OST allocation mapping.

            shadow Alexey Lyashkov added a comment - Do you have plans to fix it?

            shadow Alexey Lyashkov added a comment - Alex, about the second point: I mean that if we have 20 allocations and 5 OSTs, we need 4 allocations on each OST; otherwise it isn't round-robin allocation, and we put more load on one or more of the OSTs under the same workload pattern.
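            The expectation above is just the arithmetic of a serialized RR
            cursor: N allocations over K OSTs give N/K objects each (20/5 = 4
            here). A minimal sketch, with the OST and allocation counts taken
            from the comment above and everything else made up:

            #include <stdio.h>

            int main(void)
            {
                    const int ost_count = 5, allocations = 20;
                    int per_ost[5] = { 0 };
                    unsigned int start_idx = 0;     /* serialized RR cursor */
                    int i;

                    for (i = 0; i < allocations; i++)
                            per_ost[start_idx++ % ost_count]++;

                    /* Prints 4 for every OST; any other distribution means
                     * the cursor was raced or skipped, i.e. not round-robin. */
                    for (i = 0; i < ost_count; i++)
                            printf("OST%04d: %d\n", i, per_ost[i]);
                    return 0;
            }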

            bzzz Alex Zhuravlev added a comment - The 2nd requirement can't be achieved, simply because an object doesn't imply the same amount of data or the same I/O pattern, so I don't think some variation will be that bad.

People

    Assignee: bogl Bob Glossman (Inactive)
    Reporter: shadow Alexey Lyashkov
    Votes: 0
    Watchers: 18
