[LU-11290] Batch callbacks in osc_page_gang_lookup Created: 28/Aug/18  Updated: 02/Jun/21  Resolved: 02/Jun/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Patrick Farrell (Inactive) Assignee: Dongyang Li
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14711 Canceling lock with a lot of cached d... Resolved
is related to LU-9920 Use pagevec for marking pages dirty Resolved
Severity: 3

 Description   

Lock cancellation can be very time-consuming when there are many pages under a lock.

One easy area for improvement is the osc_page_gang_lookup callback functions. These work on individual pages from an array, but they can be trivially modified to work on the entire array at once, reducing overhead.

I've got a simple patch for this, which improves the performance of lock cancellation by about 10%, and of truncate when pages are in cache (which also uses the callback) by about 5%.
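
The batching idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: struct demo_page, DEMO_BATCH, demo_discard_one, demo_discard_batch and demo_walk are all hypothetical names invented for this example, and the real osc_page_gang_lookup code works on Lustre page structures rather than this toy type. The sketch only shows why handing the whole array to the callback reduces the number of callback invocations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached page; NOT the Lustre structure. */
struct demo_page {
	int discarded;
};

#define DEMO_BATCH 16	/* hypothetical number of pages per lookup */

/* Counts callback invocations so the two styles can be compared. */
static size_t demo_cb_calls;

/* Per-page style: one callback invocation per page. */
static int demo_discard_one(struct demo_page *pg)
{
	demo_cb_calls++;
	pg->discarded = 1;
	return 0;
}

/* Batched style: one callback invocation per array of pages. */
static int demo_discard_batch(struct demo_page **pvec, size_t count)
{
	assert(count <= DEMO_BATCH);
	demo_cb_calls++;
	for (size_t i = 0; i < count; i++)
		pvec[i]->discarded = 1;
	return 0;
}

/*
 * Walk npages pages in chunks of at most DEMO_BATCH, loosely imitating
 * a gang lookup, and return how many times a callback was invoked.
 */
static size_t demo_walk(struct demo_page *pages, size_t npages, int batched)
{
	struct demo_page *pvec[DEMO_BATCH];

	demo_cb_calls = 0;
	for (size_t start = 0; start < npages; start += DEMO_BATCH) {
		size_t count = npages - start;

		if (count > DEMO_BATCH)
			count = DEMO_BATCH;
		for (size_t i = 0; i < count; i++)
			pvec[i] = &pages[start + i];

		if (batched) {
			demo_discard_batch(pvec, count);
		} else {
			for (size_t i = 0; i < count; i++)
				demo_discard_one(pvec[i]);
		}
	}
	return demo_cb_calls;
}
```

With 40 pages and a batch size of 16, the per-page style invokes the callback once per page, while the batched style invokes it once per chunk. The saving in a real client would depend on how much fixed cost each callback invocation carries.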



 Comments   
Comment by Gerrit Updater [ 28/Aug/18 ]

Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/33089
Subject: LU-11290: Batch gang_lookup cbs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5c53ea14823b8fd710be967390a3bf84fc55c725

Comment by Gerrit Updater [ 09/Jul/20 ]

Alexander Zarochentsev (alexander.zarochentsev@hpe.com) uploaded a new patch: https://review.whamcloud.com/39327
Subject: LU-11290 ldlm: page discard speedup
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7731839accebd6f1dae77ad7757a074dedbefb0f

Comment by Alexander Zarochentsev [ 13/Jul/20 ]

The page discard speedup from https://review.whamcloud.com/39327 can be seen in a simple test with a 30G cached file:

[root@c-lmo037 ~]# ls -lh /mnt/cslmo17/
total 31G
-rw-r--r-- 1 root root  31G Jun  4 17:11 20G
drwxrwxrwx 2 root root 4.0K Jun  3 18:18 d0.metabench
[root@c-lmo037 ~]# 

Without the fix:

[root@c-lmo037 ~]# cat /mnt/testfs/20G > /dev/null; time echo 1 >> /mnt/cslmo17/20G 

real	0m9.622s
user	0m0.001s
sys	0m0.000s
[root@c-lmo037 ~]# cat /mnt/testfs/20G > /dev/null; time echo 1 >> /mnt/cslmo17/20G 

real	0m9.421s
user	0m0.000s
sys	0m0.001s
[root@c-lmo037 ~]#

With the fix:

[root@c-lmo037 ~]# cat /mnt/testfs/20G > /dev/null; time echo 1 >> /mnt/cslmo17/20G 

real	0m7.140s
user	0m0.001s
sys	0m0.000s
[root@c-lmo037 ~]# cat /mnt/testfs/20G > /dev/null; time echo 1 >> /mnt/cslmo17/20G 

real	0m7.240s
user	0m0.000s
sys	0m0.001s
[root@c-lmo037 ~]#

About 23% less time is spent.
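
For reference, the percentage can be rechecked from the four "real" timings above. The helpers below, demo_mean and demo_reduction_pct, are hypothetical names used only for this arithmetic sketch. Comparing the fastest run without the fix (9.421 s) to the slowest run with it (7.240 s) gives roughly 23%, while comparing the means of the two pairs gives roughly 24.5%. Which pairing the figure above was based on is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (hypothetical helper). */
static double demo_mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative time saved, in percent: (before - after) / before * 100. */
static double demo_reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

Calling demo_reduction_pct(9.421, 7.240) gives the conservative estimate, and passing the two means from demo_mean gives the average-based one.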

Comment by Gerrit Updater [ 19/Nov/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39327/
Subject: LU-11290 ldlm: page discard speedup
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0f48cd0b9856fe1ea920b8abab3579ded0b9511e

Comment by Peter Jones [ 19/Nov/20 ]

Does the original patch from paf still need to land or can it be abandoned?

Comment by Andreas Dilger [ 26/May/21 ]

I think patch: https://review.whamcloud.com/33089 "LU-11290: Batch gang_lookup cbs" is still useful to land, as it improves a different part of the code.

Comment by Alexander Zarochentsev [ 27/May/21 ]

>Batch gang_lookup cbs" is still useful to land, as it improves a different part of the code.
Yes, it gives another 5% improvement in lock cancel speed. I will upload it soon.

Comment by Gerrit Updater [ 02/Jun/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33089/
Subject: LU-11290 osc: Batch gang_lookup cbs
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0d6d0b7bc95a82dee02d35d0a8a41d24692cad45

Comment by Peter Jones [ 02/Jun/21 ]

Landed for 2.15
