[LU-17276] Use interval tree in flock to improve the scalability Created: 09/Nov/23  Updated: 07/Feb/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question/Request Priority: Minor
Reporter: Yang Sheng Assignee: Yang Sheng
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-17502 distribute flock locking across multi... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Running a test that exercises flock (N clients each requesting a lock on a different, non-overlapping region) showed high load on the MDS. There was previously a similar issue with regular extent locks (which control data coherency for regular reads and writes), and adding an interval tree to index the locks reduced the load. The same structure can likely be used for flock.
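For context, a minimal sketch of what that could look like, assuming the generic Linux kernel interval tree from linux/interval_tree.h rather than the actual Lustre ldlm code; struct flock_range and flock_conflict_exists are made-up names, and real flock processing also needs same-owner handling, deadlock detection, etc.:

    #include <linux/fcntl.h>
    #include <linux/interval_tree.h>
    #include <linux/kernel.h>
    #include <linux/types.h>

    /* one flock range indexed by the interval tree (hypothetical struct) */
    struct flock_range {
            struct interval_tree_node fr_node;      /* holds [start, last] */
            int                       fr_mode;      /* F_RDLCK or F_WRLCK */
    };

    /*
     * Conflict check: O(log n + k) over only the overlapping ranges,
     * instead of an O(n) walk of every flock lock on the resource.
     */
    static bool flock_conflict_exists(struct rb_root_cached *tree,
                                      unsigned long start, unsigned long last,
                                      int mode)
    {
            struct interval_tree_node *node;

            for (node = interval_tree_iter_first(tree, start, last);
                 node != NULL;
                 node = interval_tree_iter_next(node, start, last)) {
                    struct flock_range *fr =
                            container_of(node, struct flock_range, fr_node);

                    /* two read locks never conflict; anything with a write does */
                    if (fr->fr_mode == F_WRLCK || mode == F_WRLCK)
                            return true;
            }
            return false;
    }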
 
With a trivial benchmark, single client, local setup, non-overlapping locks:

FLOCKS_TEST 5: SET write 1000 flock(s) took 0.06s 16048.30/sec
FLOCKS_TEST 5: SET write 2000 flock(s) took 0.14s 14526.76/sec
FLOCKS_TEST 5: SET write 5000 flock(s) took 0.60s 8264.82/sec
FLOCKS_TEST 5: SET write 10000 flock(s) took 2.94s 3401.57/sec
FLOCKS_TEST 5: SET write 25000 flock(s) took 36.29s 688.98/sec
FLOCKS_TEST 5: SET write 50000 flock(s) took 281.29s 177.75/sec
FLOCKS_TEST 5: SET write 75000 flock(s) took 661.73s 113.34/sec

Test case in patch https://review.whamcloud.com/53094 "LU-17276 tests: flock scalability test".



 Comments   
Comment by Gerrit Updater [ 06/Dec/23 ]

"Yang Sheng <ys@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53346
Subject: LU-17276 ldlm: use interval tree in flock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d23a8e9e8c8ec998fef271655ffb2d3624f4337a

Comment by Gerrit Updater [ 13/Dec/23 ]

"Yang Sheng <ys@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53447
Subject: LU-17276 ldlm: modify the struct for flock change
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 596ed09b0c20ee534a565dfd2ca1dcf9850f3e4b

Comment by Gerrit Updater [ 18/Dec/23 ]

"Yang Sheng <ys@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53495
Subject: LU-17276 ldlm: prealloc lock for split case
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: aae8278485923d8da2a1fbcede71eead00f1ba42

Comment by Gerrit Updater [ 25/Dec/23 ]

"Yang Sheng <ys@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53551
Subject: LU-17276 ldlm: use interval tree in flock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0c847f2df985b9362d74adfd9c8b5c78cee09fe7

Comment by Yang Sheng [ 26/Dec/23 ]

For the record:

The flock patch was separated into 3 parts:
1. https://review.whamcloud.com/53447 ldlm: add interval in flock
2. https://review.whamcloud.com/53495 ldlm: prealloc lock for split case
3. https://review.whamcloud.com/53551 ldlm: use interval tree in flock

Test results (elapsed time in seconds):

locks   master  new-wop(1+3)  new(1+2+3)  master-split  new-split(1+2+3)  new-wop-split(1+3)
 1000    0.585         0.078       0.376         1.038             0.827               0.835
 2000    0.876         0.117       0.670         3.192             1.454               1.860
 5000    2.964         0.265       1.686        22.962             5.278               5.242
10000   12.877         0.821       3.858        98.711            16.583              21.610
25000   85.232         6.680      15.380       793.776           116.486             187.822
50000  328.482        58.561      69.250            NA           544.515             657.517
90000  913.543       240.566     275.410            NA          1681.352            2080.174

Notes:

1. master = the original (unpatched) Lustre
2. new-wop = patch #1 + #3 (without the prealloc patch #2)
3. new = patch #1 + #2 + #3
4. "split" means testing the split case (enqueue a lock, then split it with a conflicting one)

We need to allocate a new lock when an existing lock gets split, e.g.:

    existing lock PR(1,10), enqueue lock PW(5,6)
    result: lock1 PR(1,4); lock2 PW(5,6); lock3 PR(7,10)
    lock3 has to be newly allocated.

But the resource lock must be released for the allocation, so we have to restart the whole process from the beginning once the resource lock is taken again. I was thinking of preallocating a lock before taking the resource lock, to avoid releasing it in the middle (see the sketch below). After testing, it looks like the side effect is bigger than the benefit, so the 2nd patch is not useful after all. It is really sad.
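A minimal, self-contained userspace sketch of the split-plus-prealloc idea (all names and structures are illustrative, not taken from patch #2): splitting PR(1,10) with PW(5,6) needs a third lock structure, which is allocated up front so nothing has to be dropped and restarted in the middle:

    #include <stdio.h>
    #include <stdlib.h>

    struct range_lock {
            unsigned long start, end;
            char mode;                      /* 'R' or 'W' */
    };

    /* split "old" around "req": "old" keeps the left part, "*right" the rest */
    static void split_lock(struct range_lock *old, const struct range_lock *req,
                           struct range_lock *right)
    {
            right->start = req->end + 1;    /* e.g. PR(7,10) */
            right->end   = old->end;
            right->mode  = old->mode;

            old->end = req->start - 1;      /* e.g. PR(1,4) */
    }

    int main(void)
    {
            struct range_lock existing = { 1, 10, 'R' };
            struct range_lock incoming = { 5,  6, 'W' };

            /*
             * Preallocate the third lock before any resource lock would be
             * taken, so the enqueue never has to drop it and restart.
             */
            struct range_lock *spare = malloc(sizeof(*spare));

            if (spare == NULL)
                    return 1;

            split_lock(&existing, &incoming, spare);

            printf("left : %c(%lu,%lu)\n", existing.mode, existing.start, existing.end);
            printf("new  : %c(%lu,%lu)\n", incoming.mode, incoming.start, incoming.end);
            printf("right: %c(%lu,%lu)\n", spare->mode, spare->start, spare->end);

            free(spare);
            return 0;
    }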

BTW: Hi, Andreas, where should the flock performance test case be placed? sanity is obviously not the right place.

 

Thanks,

YangSheng

Comment by Andreas Dilger [ 26/Dec/23 ]

It looks like the "wop" tests are showing the best performance, which I agree is unexpected, especially in the "split" case where there will always need to be a new lock allocated each time.

It might be interesting to see what happens when there are 250k or 500k locks on a resource? It might be that the overhead of allocating an extra lock is high compared to the cost of re-scanning the list when the list is small, but the re-scan cost grows as the list gets large? It looks like the "1+2+3" times were getting closer to "1+3" with more locks? This might mean preallocating the lock when a resource has > 100k entries but not otherwise?

Comment by Gerrit Updater [ 27/Dec/23 ]

"Yang Sheng <ys@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53558
Subject: LU-17276 tests: performance test case for flock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c7f9ef1da6a60c0b5d10c655ea8736521288ed80

Comment by Andreas Dilger [ 27/Dec/23 ]

It looks like switching to prealloc when >= 50,000 locks on a resource is probably a reasonable balance. That saves 113s for the contended/split case, while only slowing the uncontended case by 11s. With that many locks on a single object it is very likely that there will be conflicts between the locks.

It might be possible to make this smarter by checking the lock type (e.g. a read lock is unlikely to cause a split, while a write lock is more likely to) and making the prealloc decision based on that, and/or keeping a history of whether recent locks were conflicting or not. However, I expect the number of times this added complexity will be helpful is relatively low, since few applications will be flocking a single file so many times. The most important thing is to avoid the O(n^2) behavior of "master+split" that we have today.
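A hedged sketch of that heuristic (the threshold value and the names flock_should_prealloc / FLOCK_PREALLOC_THRESHOLD are illustrative only, not from a merged patch): preallocate only when the request can actually cause a split and the resource already holds enough locks that restarting the enqueue would be expensive:

    #include <stdbool.h>

    /* illustrative value taken from the discussion above */
    #define FLOCK_PREALLOC_THRESHOLD 50000

    static inline bool flock_should_prealloc(unsigned int lock_count,
                                             bool is_write_request)
    {
            /* read requests are unlikely to force a split at all */
            if (!is_write_request)
                    return false;

            /* with few locks, re-scanning after a restart is cheap anyway */
            return lock_count >= FLOCK_PREALLOC_THRESHOLD;
    }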

Comment by Yang Sheng [ 27/Dec/23 ]

Hi, Andreas,

 

A split could occur in every scenario except the get_lock operation. Making the decision based on the lock count is reasonable, but it is hard to pick the right threshold value since it depends on the environment.

I agree it is a rare case to flock one file so many times, and the split case is not too frequent in the real world either. It doesn't look worthwhile to add the complexity for that. If you think it is still helpful in some cases, maybe we can make it an optional feature controlled through a proc tunable?
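For illustration only, one way such a tunable might look as a plain kernel module parameter (Lustre would more likely expose this through its own lprocfs/sysfs tunables; the name flock_prealloc_threshold is made up):

    #include <linux/module.h>
    #include <linux/moduleparam.h>

    /* 0 = never preallocate; otherwise preallocate once a resource
     * holds at least this many flock locks */
    static unsigned int flock_prealloc_threshold = 50000;
    module_param(flock_prealloc_threshold, uint, 0644);
    MODULE_PARM_DESC(flock_prealloc_threshold,
                     "lock count at which a split lock is preallocated (0 = never)");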

 

Thanks,

YangSheng

Comment by Gerrit Updater [ 10/Jan/24 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53558/
Subject: LU-17276 tests: performance test case for flock
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6377859352b861861c7a46e1e9a11e4be75d626e

Comment by Gerrit Updater [ 15/Jan/24 ]

"Yang Sheng <ys@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53675
Subject: LU-17276 ldlm: don't release ldlm_interval buffer
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 82195ab76f817ed23034cbcdc052118e738d0a86

Comment by Gerrit Updater [ 01/Feb/24 ]

"Yang Sheng <ys@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53881
Subject: LU-17276 tests: Enqueue same range flocks
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6e1cd7e2ca075fea96b501373182466b6e007756

Comment by Gerrit Updater [ 07/Feb/24 ]

"Neil Brown <neilb@suse.de>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53950
Subject: LU-17276 ldlm: store LDLM_FLOCK locks in an interval tree
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d765eaf9ea0513381cfefac848e33713912916a5

Comment by Gerrit Updater [ 07/Feb/24 ]

"Neil Brown <neilb@suse.de>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53951
Subject: LU-17276 ldlm: use interval tree for searching in flock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d89e93247d9cd0696a861ca9b7b6e448dd6c2aaa
