
DIO performance: cl_page struct removal for DIO path

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major

    Description

      When doing DIO at ~10 GiB/s (see LU-13798, LU-13799), about 70-75% of the time is still spent working with the cl_page struct.

      This means allocating it, setting it up, and then moving it around and managing it.  We use the cl_page to track the vm pages, and in doing so we put it on lists, move it from list to list, and update the state of the cl_page (literally, cl_page_state).

      It is possible to improve this by doing cl_page allocations in batches; this results in roughly a 30% drop in time spent in cl_page work and makes it possible to get close to 15 GiB/s.

      Fundamentally, none of this is necessary for DIO.  The cl_page struct is for tracking per-page information, but all of the pages in a DIO submission (at the ll_direct_rw_page level) are the same: they have the same owner, the same page flags, and they are part of the same stripe.  If we do unaligned DIO, the first and last pages can have a starting and ending offset, but that is it, and we can associate that with the DIO itself rather than with the individual pages.
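      As a rough sketch (the struct and field names below are hypothetical illustrations, not the actual patch series), the per-DIO state could collapse to a single descriptor, with the unaligned start/end offsets attached to the DIO rather than to any page:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

struct page;	/* opaque; stands in for the kernel's struct page */

/* Hypothetical per-DIO descriptor: one object for the whole submission
 * instead of one cl_page per vm page.  Unaligned start/end offsets
 * live here, not in per-page state. */
struct dio_pages {
	struct page	**dp_pages;	/* from ll_get_user_pages() */
	unsigned int	  dp_count;	/* number of pages */
	unsigned int	  dp_from;	/* byte offset into the first page */
	unsigned int	  dp_to;	/* bytes used in the last page */
};

/* The total transfer size follows from the per-DIO fields alone. */
static unsigned long dio_total_bytes(const struct dio_pages *dp)
{
	return (unsigned long)dp->dp_count * PAGE_SIZE -
	       dp->dp_from - (PAGE_SIZE - dp->dp_to);
}
```

      Everything that was previously recomputed or stored per cl_page is derivable from these four fields in O(1).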

      So the proposal is to stop using the cl_page struct to track pages in a DIO and instead use the array of pages which describes the user buffer (i.e., the kiocb and the results of ll_get_user_pages).

      The brw_page member of the cl_page struct seems like it will still be necessary, but this is not such a big deal: we can allocate those separately, at a fraction of the cost of setting up and managing the full cl_page abstraction.
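      A minimal sketch of that batch approach (simplified stand-in fields, not the real struct brw_page layout): one allocation and one initialization pass cover every page, with only the first and last entries needing offset handling.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096UL

/* Simplified stand-in for struct brw_page; field names are illustrative. */
struct brw_page_stub {
	unsigned long bp_off;	/* offset within the page */
	unsigned int  bp_count;	/* bytes to transfer from this page */
};

/* Allocate and fill all entries for a DIO in one pass: a single
 * calloc() instead of one slab object, list insertion, and state
 * transition per page. */
static struct brw_page_stub *brw_pages_batch(unsigned int npages,
					     unsigned int from,
					     unsigned int to)
{
	struct brw_page_stub *pgs = calloc(npages, sizeof(*pgs));
	unsigned int i;

	if (pgs == NULL)
		return NULL;
	for (i = 0; i < npages; i++) {
		pgs[i].bp_off = (i == 0) ? from : 0;
		pgs[i].bp_count = PAGE_SIZE - pgs[i].bp_off;
		if (i == npages - 1)
			pgs[i].bp_count -= PAGE_SIZE - to;
	}
	return pgs;
}
```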

      Back-of-the-envelope calculations suggest that this would save about 60-75% of the time spent submitting DIO in the current optimized path, which performs at 10 GiB/s.

       

      That calculation suggests we could reach single threaded DIO performance in the 25-40 GiB/s range.  Presumably some other issues will prevent hitting such high rates, but I think it is reasonable to think we could reach 20+ GiB/s, with sufficient network hardware.  (We will likely have to accept "idle CPU time in the submitting thread while waiting for the network" as a proxy indicator, since networks in the 30 GiB/s/node range are not readily available for testing.)
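      The arithmetic behind that range can be sketched as a simple speedup model (my assumption here, in the spirit of Amdahl's law, is that submission cost is the only bottleneck):

```c
#include <assert.h>

/* If a fraction 'saved' of per-I/O submission time is eliminated and
 * submission bounds throughput, the rate scales as base / (1 - saved).
 * From 10 GiB/s: 60% saved gives ~25 GiB/s, 75% saved gives 40 GiB/s. */
static double projected_gibps(double base_gibps, double saved)
{
	return base_gibps / (1.0 - saved);
}
```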

      This improvement would of course also apply to buffered I/O via this path (see LU-13805), with the fast buffering version seeing a smaller (but still large) benefit.

      This change would also likely make it easier (from a coding perspective) to move the buffer allocation & memcpy() into the ptlrpcd threads, which is a key part of improving the performance of the fast buffering.


          Activity


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52139/
            Subject: LU-13814 osc: clarify osc_transfer_pinned usage
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: fc9adb3e8f01cd3e880e2d0e18f50e44fa445a4f


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52136/
            Subject: LU-13814 osc: add osc_dio_submit
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 87d17d0a16ab9ad63f828004df7265c279215772


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52113/
            Subject: LU-13814 clio: convert lov submit to cl_dio_pages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b32321feef7b1bf4eebe8bb3ea0cc6a945e4a285


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52112/
            Subject: LU-13814 clio: rename 'cl_page_completion'
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f00863ff25b3be68b220dd2c6b6234fd6e3a0d8e


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52111/
            Subject: LU-13814 clio: add cl_dio_pages_complete
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4b6cbdf80bba74d1b226ca4db220ed04d0179534


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52100/
            Subject: LU-13814 clio: add cl_sync_io_note batch
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 304fb2b16d1a4d583f9749b1029d00ff51202adc


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52110/
            Subject: LU-13814 lov: add lov dio_pages_init
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7b0d40ababc08ad91f466015f7b6aa9372a5a37b


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/58530/
            Subject: LU-13814 clio: add coo_dio_pages_init
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2db2c9dcebc67459a31ac3becc0616b50ef6020c


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52109/
            Subject: LU-13814 clio: add cl_dio_pages_init
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 8c0d073c1746329b986afdf303dea7787a6fb42d


            "Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58765
            Subject: LU-13814 llite: pass cl_dio_pages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4ac2278d6d08d379ff7b7f82d778ad67db431dab


            stancheff , I think one of the recent landings might've caused this:
            https://testing.whamcloud.com/test_sets/1755e4e7-0cbf-4af1-89a7-16a3dd0761c0 / https://testing.whamcloud.com/test_logs/10dc91ae-7f31-4cbe-bd3e-63d3aa4cc7f7/show_text

            [33482.829916]  ? __die_body+0x1a/0x60
            [33482.830454]  ? page_fault_oops+0x131/0x540
            [33482.831014]  ? fixup_exception+0x22/0x310
            [33482.831553]  ? exc_page_fault+0x69/0x150
            [33482.832117]  ? asm_exc_page_fault+0x22/0x30
            [33482.832705]  ? ll_release_user_pages+0x15/0x100 [obdclass 5c9d0d0aacec9e40a066e5918d9b7ca5a10175b8]
            [33482.833843]  cl_sub_dio_end+0x221/0x490 [obdclass 5c9d0d0aacec9e40a066e5918d9b7ca5a10175b8]
            [33482.834923]  ? __pfx_cl_sub_dio_end+0x10/0x10 [obdclass 5c9d0d0aacec9e40a066e5918d9b7ca5a10175b8]
            [33482.836036]  cl_sync_io_note+0x158/0x2a0 [obdclass 5c9d0d0aacec9e40a066e5918d9b7ca5a10175b8]
            [33482.837114]  ll_direct_IO+0xa3a/0xdd0 [lustre daee26eae67f7564cd7c4c99151c3ecf00837ae4]
            [33482.838378]  ? atime_needs_update+0xa3/0x110
            [33482.838961]  ? touch_atime+0x34/0x150
            [33482.839472]  generic_file_read_iter+0x87/0x120
            [33482.840089]  vvp_io_read_start+0x6c2/0x8a0 [lustre daee26eae67f7564cd7c4c99151c3ecf00837ae4]
            [33482.841162]  cl_io_start+0x70/0x140 [obdclass 5c9d0d0aacec9e40a066e5918d9b7ca5a10175b8]
            [33482.842185]  cl_io_loop+0x9e/0x230 [obdclass 5c9d0d0aacec9e40a066e5918d9b7ca5a10175b8]
            [33482.843198]  ? ll_cl_add+0x95/0x100 [lustre daee26eae67f7564cd7c4c99151c3ecf00837ae4]
            [33482.844186]  ll_file_io_generic+0xa20/0x10a0 [lustre daee26eae67f7564cd7c4c99151c3ecf00837ae4]
            [33482.845262]  do_file_read_iter+0xd2c/0x1050 [lustre daee26eae67f7564cd7c4c99151c3ecf00837ae4]
            [33482.846319]  __kernel_read+0xf0/0x280
            [33482.846857]  pcc_attach_data_archive+0x432/0xb70 [lustre daee26eae67f7564cd7c4c99151c3ecf00837ae4]
            [33482.847975]  pcc_readonly_attach+0x4c0/0xd90 [lustre daee26eae67f7564cd7c4c99151c3ecf00837ae4]
            [33482.849052]  ? pcc_readonly_attach_sync+0x1d3/0x2c0 [lustre daee26eae67f7564cd7c4c99151c3ecf00837ae4]
            [33482.850201]  pcc_readonly_attach_sync+0x1d3/0x2c0 [lustre daee26eae67f7564cd7c4c99151c3ecf00837ae4]
            [33482.851313]  pcc_file_open+0x9c4/0x1040 [lustre daee26eae67f7564cd7c4c99151c3ecf00837ae4]
            [33482.852340]  ll_atomic_open+0x968/0x9c0 [lustre daee26eae67f7564cd7c4c99151c3ecf00837ae4]
            [33482.853371]  ? __d_lookup+0x72/0xb0
            [33482.853855]  path_openat+0x644/0x1050
            [33482.854374]  do_filp_open+0xc5/0x140
            [33482.854867]  ? kmem_cache_alloc+0x18a/0x340
            [33482.855445]  ? getname_flags+0x46/0x1e0
            [33482.855962]  ? do_sys_openat2+0x248/0x320
            [33482.856508]  do_sys_openat2+0x248/0x320
            [33482.857031]  do_sys_open+0x57/0x80
            [33482.857508]  do_syscall_64+0x5b/0x80
            [33482.858022]  ? handle_mm_fault+0x196/0x2f0
            [33482.858590]  ? do_user_addr_fault+0x267/0x890
            [33482.859171]  ? exc_page_fault+0x69/0x150
            [33482.859699]  entry_SYSCALL_64_after_hwframe+0x7c/0xe6 

            I know the recent patches are from me, but you know the user/kernel iovec handling much better than I do - could you take a look?


            People

              Assignee: paf0186 Patrick Farrell
              Reporter: paf0186 Patrick Farrell
              Votes: 0
              Watchers: 12