[LU-9906] Allow Lustre page dropping to use pagevec_release Created: 23/Aug/17  Updated: 19/Dec/19  Resolved: 21/Nov/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.12.0, Lustre 2.10.7

Type: Bug Priority: Minor
Reporter: Patrick Farrell (Inactive) Assignee: Patrick Farrell (Inactive)
Resolution: Fixed Votes: 0
Labels: performance

Attachments: File master-patch28667-read.svg     File master-read.svg    
Issue Links:
Related
is related to LU-9920 Use pagevec for marking pages dirty Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When Lustre releases a lot of cached pages at once, it still calls page_release, instead of pagevec_release. When clearing OST ldlm lock lrus, the ldlm_bl threads end up spending much of their time contending for the zone lock taken by page_release.

With many namespaces and parallel lru clearing (as Cray does at the end of each job), this can be a significant time sink. Using pagevec release is much better. Patch coming shortly.
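
To illustrate why batching helps, here is a minimal userspace sketch in C. It is not Lustre or kernel code: it only counts how often a stand-in for the zone lock would be taken when pages are released one at a time versus in pagevec-style batches. All names (`sketch_zone`, `sketch_release_one_by_one`, `sketch_release_batched`, `SKETCH_BATCH`) are hypothetical, and the batch size of 14 is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative batch size (an assumption of this sketch, not taken
 * from the ticket). */
#define SKETCH_BATCH 14

/* Stand-in for the contended zone lock: we only count acquisitions. */
struct sketch_zone {
    size_t lock_acquisitions;
    size_t pages_freed;
};

/* Per-page release: one lock round trip for every page. */
static void sketch_release_one_by_one(struct sketch_zone *z, size_t npages)
{
    assert(z != NULL);
    for (size_t i = 0; i < npages; i++) {
        z->lock_acquisitions++;   /* lock taken here */
        z->pages_freed++;         /* one page freed, then unlock */
    }
}

/* Batched release: gather up to SKETCH_BATCH pages, then free the whole
 * batch under a single lock acquisition. */
static void sketch_release_batched(struct sketch_zone *z, size_t npages)
{
    size_t pending = 0;

    assert(z != NULL);
    for (size_t i = 0; i < npages; i++) {
        pending++;
        if (pending == SKETCH_BATCH) {
            z->lock_acquisitions++;
            z->pages_freed += pending;
            pending = 0;
        }
    }
    if (pending > 0) {            /* flush the partial final batch */
        z->lock_acquisitions++;
        z->pages_freed += pending;
    }
}
```

For 1000 pages, the per-page variant takes the lock 1000 times while the batched variant takes it 72 times (71 full batches plus one partial batch), which is the kind of reduction in lock traffic that matters when many ldlm_bl threads run in parallel.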



 Comments   
Comment by Gerrit Updater [ 23/Aug/17 ]

Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/28667
Subject: LU-9906 clio: use pagevec_release for many pages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: da6033fa0d989c5a8ff5a0bf9d1a8d1f4350a0b1

Comment by Patrick Farrell (Inactive) [ 21/Sep/17 ]

Quoting Andreas in LU-9920:
"Patrick, a similar issue exists when pages are dropped from cache upon lock cancellation. It would be useful to clean this up to use invalidate_page_range() or similar to drop pages from cache (at least in stripe_size chunks) instead of doing it one page at a time as it does today."

Invalidate_page_range is something else, but I think this does what you're talking about. I don't think we can drop pages in such large chunks; pagevec_release is the best I'm aware of without writing our own. (And I wonder about holding the relevant lock long enough to drop stripe_size chunks.)

Comment by Gerrit Updater [ 14/Dec/17 ]

Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/30531
Subject: LU-9906 osd: use pagevec for putting pages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f48a740f3bad58ffda44d268454775a4fd26d5a6

Comment by Gerrit Updater [ 14/Feb/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30531/
Subject: LU-9906 osd: use pagevec for putting pages
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2a2adfd04245a24148d8de29b8558cd98c92bffa

Comment by Shuichi Ihara [ 15/Nov/18 ]

Patch https://review.whamcloud.com/#/c/28667 contributes substantially to single-client performance.
In fact, today single-client bandwidth hits a limit when the network bandwidth is higher than IB EDR bandwidth (e.g. 2 x IB EDR with MR on the client).
This is not an LNET/MR problem; we confirmed it is caused by the overhead of lru reclaim in CLIO.
Using pagevec for lru reclaim in addition to the original patch 28667 shows 32% write and ~60% read performance gains.

Here are the test results.
I've tested with both 1MB buffered IO and 16MB O_DIRECT to make sure there is no LNET/MR issue and that the network bandwidth can be saturated without the buffered IO path.

1 x client (2 x Intel Platinum 8160 CPU @ 2.10GHz, 192GB Memory)

parameter
lctl set_param osc.*.max_pages_per_rpc=16M osc.*.max_rpcs_in_flight=16 osc.*.max_dirty_mb=512 osc.*.checksums=0 llite.*.max_read_ahead_mb=2048

IOR command
mpirun -np 48 ior -w -r -t 16m -b 16g -F -e -vv -o /scratch0/file -i 1 -B (O_DIRECT)
mpirun -np 48 ior -w -r -t 16m -b 16g -F -e -vv -o /scratch0/file -i 1 (buffered)

build              mode      write (GB/s)  read (GB/s)
master             O_DIRECT  20.8          21.8
master+patch28667  O_DIRECT  20.7          22.2
master             Buffered  11.6          12.3
master+patch28667  Buffered  15.3          19.6

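
The quoted gains can be checked against the buffered rows of the table with a small calculation. The helper below is a sketch with a hypothetical name (`sketch_gain_percent`); it simply computes the relative change between two throughput figures.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * Hypothetical helper for checking the figures in the table. */
static double sketch_gain_percent(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

With the buffered numbers, `sketch_gain_percent(11.6, 15.3)` comes to roughly 31.9 (write) and `sketch_gain_percent(12.3, 19.6)` to roughly 59.3 (read), consistent with the stated 32% and ~60%.
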
Comment by Patrick Farrell (Inactive) [ 15/Nov/18 ]

That's really impressive.

What kernel version are you running there?  I'm curious specifically if you have queued spinlocks.  I haven't looked at lru_reclaim specifically, but the other areas affected by this patch got much better with new kernel versions.  (i.e. the patch is less important if you have queued spinlocks.)

Comment by Shuichi Ihara [ 15/Nov/18 ]

I'm testing on 3.10.0-693.21.1.el7.x86_64.
Please see the two attached flamegraphs for IOR read.
https://jira.whamcloud.com/secure/attachment/31475/master-read.svg (without patch)
https://jira.whamcloud.com/secure/attachment/31474/master-patch28667-read.svg (with patch 28667)

The cost of discard_pagevec() drops from 57.59% to 17.48% with the patch.

Comment by Patrick Farrell (Inactive) [ 15/Nov/18 ]

Huh!  Thank you for the detailed look.  I am surprised it's so large with the queued spinlocks, but I'm glad it's helping so much.  Nice find.

Comment by Andreas Dilger [ 15/Nov/18 ]

This is great. It shows that the performance is nearly identical for buffered and unbuffered large reads.

It would seem like the next big user is osc_lru_alloc(), but it may only look like it is taking a lot of time because there is an enforced wait when there are not enough pages. Given that we are very close to peak performance for the reads, it probably makes more sense to focus on improving the write side.

Comment by Gerrit Updater [ 21/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/28667/
Subject: LU-9906 clio: use pagevec_release for many pages
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b4a959eb61bc7e6a64261c704f3f3f5e220c2f02

Comment by Peter Jones [ 21/Nov/18 ]

Landed for 2.12

Comment by Gerrit Updater [ 08/Jan/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33988
Subject: LU-9906 osd: use pagevec for putting pages
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: e380923f87494519f8a9281ace0c53054f8aab5c

Comment by Gerrit Updater [ 15/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33988/
Subject: LU-9906 osd: use pagevec for putting pages
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 76f01221aaf3c4a65a4f1b9af1363838921843a1

Comment by Patrick Farrell (Inactive) [ 15/Feb/19 ]

Landing just the OSD side patch to b2_10 is good here - It was required for some kernel compatibility changes (LU-10565), and is trivial.

There is no need to land the other patch from this ticket - https://review.whamcloud.com/28667/ LU-9906 clio: use pagevec_release for many pages.  The two patches here are independent, and the clio one is non-trivial.  Not a good candidate for a maintenance branch.
