LU-9906

Allow Lustre page dropping to use pagevec_release


    Description

      When Lustre releases a lot of cached pages at once, it still calls page_release on each page instead of pagevec_release. When clearing the OST ldlm lock LRUs, the ldlm_bl threads end up spending much of their time contending for the zone lock taken on each page release.

      With many namespaces and parallel LRU clearing (as Cray does at the end of each job), this can be a significant time sink. Using pagevec_release is much better. Patch coming shortly.
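
      For illustration, the batching pattern in question looks roughly like the sketch below. This is not the actual Lustre patch: drop_cached_pages() is a hypothetical helper, and the pagevec_init() signature shown (with a "cold" argument) is the one used by older kernels such as 3.10; newer kernels drop that argument. The point is simply that pagevec_release() hands a whole batch to release_pages(), so the zone/LRU lock is amortized over a batch instead of being taken once per page.

      #include <linux/mm.h>
      #include <linux/pagevec.h>

      /* Hypothetical helper, not the Lustre patch: release a set of cached
       * pages in PAGEVEC_SIZE batches rather than one page at a time. */
      static void drop_cached_pages(struct page **pages, int count)
      {
              struct pagevec pvec;
              int i;

              pagevec_init(&pvec, 0);        /* older-kernel signature with "cold" arg */
              for (i = 0; i < count; i++) {
                      /* pagevec_add() returns the slots left; flush when full */
                      if (pagevec_add(&pvec, pages[i]) == 0)
                              pagevec_release(&pvec);
              }
              /* release whatever remains in the final partial batch */
              pagevec_release(&pvec);
      }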


          Activity

            [LU-9906] Allow Lustre page dropping to use pagevec_release

            Landing just the OSD-side patch to b2_10 is good here; it was required for some kernel compatibility changes (LU-10565), and is trivial.

            There is no need to land the other patch from this ticket, https://review.whamcloud.com/28667/ "LU-9906 clio: use pagevec_release for many pages". The two patches here are independent, and the clio one is non-trivial, so it is not a good candidate for a maintenance branch.

            pfarrell Patrick Farrell (Inactive) added a comment

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33988/
            Subject: LU-9906 osd: use pagevec for putting pages
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 76f01221aaf3c4a65a4f1b9af1363838921843a1

            gerrit Gerrit Updater added a comment

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33988
            Subject: LU-9906 osd: use pagevec for putting pages
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: e380923f87494519f8a9281ace0c53054f8aab5c

            gerrit Gerrit Updater added a comment
            pjones Peter Jones added a comment -

            Landed for 2.12


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/28667/
            Subject: LU-9906 clio: use pagevec_release for many pages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b4a959eb61bc7e6a64261c704f3f3f5e220c2f02

            gerrit Gerrit Updater added a comment

            This is great. It shows that the performance is nearly identical for buffered and unbuffered large reads.

            It would seem like the next big user is osc_lru_alloc(), but it may only look like it is taking a lot of time because there is an enforced wait when there are not enough pages. Given that we are very close to peak performance for the reads, it probably makes more sense to focus on improving the write side.

            adilger Andreas Dilger added a comment

            Huh!  Thank you for the detailed look.  I am surprised it's so large with the queued spinlocks, but I'm glad it's helping so much.  Nice find.

            paf Patrick Farrell (Inactive) added a comment

            I'm testing on 3.10.0-693.21.1.el7.x86_64.
            Please see the two attached flamegraphs for the IOR read:
            https://jira.whamcloud.com/secure/attachment/31475/master-read.svg (without patch)
            https://jira.whamcloud.com/secure/attachment/31474/master-patch28667-read.svg (with patch 28667)

            The cost at discard_pagevec() drops from 57.59% to 17.48% with the patch.

            sihara Shuichi Ihara added a comment
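
            The exact capture commands are not in this ticket, but for anyone wanting to reproduce flamegraphs like the attached ones, a typical recipe using perf and Brendan Gregg's FlameGraph scripts (https://github.com/brendangregg/FlameGraph) looks something like the following; the sampling rate, duration, and output name are arbitrary choices here.

            # sample all CPUs with call stacks while the IOR read phase is running
            perf record -F 99 -a -g -- sleep 60
            # fold the stacks and render the interactive SVG
            perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > read.svg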

            That's really impressive.

            What kernel version are you running there?  I'm curious specifically whether you have queued spinlocks.  I haven't looked at lru_reclaim specifically, but the other areas affected by this patch got much better with newer kernel versions (i.e. the patch is less important if you have queued spinlocks).

            paf Patrick Farrell (Inactive) added a comment

            Patch https://review.whamcloud.com/#/c/28667 makes a huge contribution to single-client performance.
            In fact, today there is a single-client bandwidth limit whenever the network bandwidth is higher than IB EDR bandwidth (e.g. 2 x IB EDR with MR on the client).
            This is not an LNet/MR problem; we confirmed it is caused by the overhead of LRU reclaim in CLIO.
            Using a pagevec for LRU reclaim in addition to the original patch 28667 shows 32% write and ~60% read performance gains.

            Here are the test results.
            I've tested with both 1MB buffered I/O and 16MB O_DIRECT to confirm there is no LNet/MR issue and that the network bandwidth can be saturated without going through the buffered I/O path.

            1 x client (2 x Intel Platinum 8160 CPU @ 2.10GHz, 192GB Memory)
            
            Parameters:
            lctl set_param osc.*.max_pages_per_rpc=16M osc.*.max_rpcs_in_flight=16 osc.*.max_dirty_mb=512 osc.*.checksums=0 llite.*.max_read_ahead_mb=2048
            
            IOR commands:
            mpirun -np 48 ior -w -r -t 16m -b 16g -F -e -vv -o /scratch0/file -i 1 -B (O_DIRECT)
            mpirun -np 48 ior -w -r -t 16m -b 16g -F -e -vv -o /scratch0/file -i 1 (buffered)
            
                               mode      write (GB/s)  read (GB/s)
            master             O_DIRECT      20.8          21.8
            master+patch28667  O_DIRECT      20.7          22.2
            master             Buffered      11.6          12.3
            master+patch28667  Buffered      15.3          19.6
            sihara Shuichi Ihara added a comment

            People

              Assignee: paf Patrick Farrell (Inactive)
              Reporter: paf Patrick Farrell (Inactive)
              Votes: 0
              Watchers: 10
