
obdfilter-survey w/echo-client does not gain from direct-io optimizations

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.13.0

    Description

      During testing on an SSD platform, the following was reported:

      CPU usage is in the 95-100% range during obdfilter-survey. The perf trace looks very much as if LU-11347 were not applied, although it is applied and the OSS caches are disabled.

      This is due to the lifetime of the pages versus the obdfilter/obdecho lifetime:

      dt_bufs_get / osd_bufs_get allocates pages, while
      dt_bufs_put / osd_bufs_put marks the pages up-to-date and recycles them.
      This is the typical pattern for longer-lived osd threads.

      For obdecho the threads are short-lived, so the pattern becomes
      init/get/put/fini, meaning the survey may not be a clear representation of actual osd performance.
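
The difference between the two lifetimes can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not Lustre code: `struct model_env`, `model_bufs_get()`, `model_bufs_put()`, `MAX_PAGES` and `PAGE_BYTES` are hypothetical names, `malloc()` stands in for the kernel page allocator, and "put" is simplified to leaving the pages attached to the env. It counts how often the allocator is hit when one env serves many requests versus when every request goes through init/get/put/fini.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PAGES  16    /* hypothetical stand-in for the per-env page limit */
#define PAGE_BYTES 4096  /* assumed page size for the model */

/* Hypothetical model of an env that owns an array of cached pages. */
struct model_env {
	void *pages[MAX_PAGES];  /* plays the role of a per-env page array */
	unsigned long allocs;    /* number of times the allocator was called */
};

static void model_env_init(struct model_env *env)
{
	for (int i = 0; i < MAX_PAGES; i++)
		env->pages[i] = NULL;
	env->allocs = 0;
}

/* Env teardown: every cached page goes back to the allocator. */
static void model_env_fini(struct model_env *env)
{
	for (int i = 0; i < MAX_PAGES; i++) {
		free(env->pages[i]);
		env->pages[i] = NULL;
	}
}

/* "get": allocate a page only for slots the env does not hold yet. */
static int model_bufs_get(struct model_env *env, int npages)
{
	assert(npages <= MAX_PAGES);
	for (int i = 0; i < npages; i++) {
		if (env->pages[i] == NULL) {
			env->pages[i] = malloc(PAGE_BYTES);
			if (env->pages[i] == NULL)
				return -1;
			env->allocs++;
		}
	}
	return 0;
}

/* "put": in this model the pages simply stay attached to the env. */
static void model_bufs_put(struct model_env *env, int npages)
{
	(void)env;
	(void)npages;
}

/* Long-lived pattern: init once, get/put per request, fini once. */
static unsigned long run_long_lived(int nreq, int npages)
{
	struct model_env env;
	unsigned long total;

	model_env_init(&env);
	for (int r = 0; r < nreq; r++) {
		if (model_bufs_get(&env, npages) != 0)
			break;
		model_bufs_put(&env, npages);
	}
	total = env.allocs;
	model_env_fini(&env);
	return total;
}

/* Short-lived pattern: init/get/put/fini for every single request. */
static unsigned long run_short_lived(int nreq, int npages)
{
	unsigned long total = 0;

	for (int r = 0; r < nreq; r++) {
		struct model_env env;

		model_env_init(&env);
		if (model_bufs_get(&env, npages) == 0)
			model_bufs_put(&env, npages);
		total += env.allocs;
		model_env_fini(&env);
	}
	return total;
}
```

In this model, `run_long_lived(nreq, npages)` allocates `npages` pages in total, while `run_short_lived(nreq, npages)` allocates `nreq * npages` pages and frees them again each time, which is the kind of allocator traffic the description attributes to the survey.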

       

          Activity

            [LU-12578] obdfilter-survey w/echo-client does not gain from direct-io optimizations
            pjones Peter Jones added a comment -

            Landed for 2.13


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35700/
            Subject: LU-12578 obdecho: reuse an cl env cache for obdecho survey
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 55c33b70c46fde1b62f9852dc361d382a7722009
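
The patch subject mentions reusing an env cache. The general idea of such a cache can be sketched in user-space C as follows. This is an illustration under assumptions, not the code from the merged patch: `struct env_cache`, `env_cache_get()`, `env_cache_put()` and `CACHE_SLOTS` are hypothetical names, and the sketch is single-threaded, with no locking. Idle envs are parked in a small array and handed out again instead of being torn down and rebuilt for every request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 4  /* arbitrary cache size chosen for the sketch */

/* Hypothetical env; a real one would carry per-thread state and pages. */
struct model_env {
	int generation;  /* how many times this env has been handed out */
};

struct env_cache {
	struct model_env *idle[CACHE_SLOTS];  /* parked envs ready for reuse */
	int nidle;                            /* number of parked envs */
	unsigned long created;                /* envs built from scratch */
	unsigned long reused;                 /* envs served from the cache */
};

static void env_cache_init(struct env_cache *c)
{
	c->nidle = 0;
	c->created = 0;
	c->reused = 0;
}

/* Hand out a parked env if one exists, otherwise build a new one. */
static struct model_env *env_cache_get(struct env_cache *c)
{
	struct model_env *env;

	if (c->nidle > 0) {
		env = c->idle[--c->nidle];
		c->reused++;
	} else {
		env = calloc(1, sizeof(*env));
		if (env == NULL)
			return NULL;
		c->created++;
	}
	env->generation++;
	return env;
}

/* Park the env for the next caller; free it only if the cache is full. */
static void env_cache_put(struct env_cache *c, struct model_env *env)
{
	if (c->nidle < CACHE_SLOTS)
		c->idle[c->nidle++] = env;
	else
		free(env);
}

/* Final teardown: release everything that is still parked. */
static void env_cache_fini(struct env_cache *c)
{
	while (c->nidle > 0)
		free(c->idle[--c->nidle]);
}
```

With one caller doing get/put in a loop, only the first request builds an env and every later request reuses it, so whatever the env holds (such as cached pages) survives between requests.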

            adilger Andreas Dilger added a comment -

            When you write "obdfilter" in this ticket, it seems you mean "obdfilter-survey" with echo_client? It is confusing because the code that is now "ofd" was formerly called "obdfilter".

            gerrit Gerrit Updater added a comment -

            Shaun Tancheff (stancheff@cray.com) uploaded a new patch: https://review.whamcloud.com/35596
            Subject: LU-12578 obdecho: Reduce CPU needed for obdfilter
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 6466a04bdb9b662afeeb98b33b6afbb09a8c6897

            stancheff Shaun Tancheff added a comment -

            A perf snapshot:

                99.84%  [kernel.kallsyms]
                        |
                        |--80.86%--native_queued_spin_lock_slowpath
                        |          |
                        |          |--75.52%--_raw_spin_lock
                        |          |          |
                        |          |          |--36.83%--get_page_from_freelist
                        |          |          |          __alloc_pages_nodemask
                        |          |          |          |
                        |          |          |           --36.73%--osd_bufs_get
                        |          |          |                     ofd_preprw_write.isra.34
                        |          |          |                     ofd_preprw
                        |          |          |                     echo_client_prep_commit.isra.52
                        |          |          |                     echo_client_iocontrol
                        |          |          |                     class_handle_ioctl
                        |          |          |                     obd_class_ioctl
                        |          |          |                     do_vfs_ioctl
                        |          |          |                     ksys_ioctl
                        |          |          |                     __x64_sys_ioctl
                        |          |          |                     do_syscall_64
                        |          |          |                     entry_SYSCALL_64_after_hwframe
                        |          |          |                     __GI___ioctl
                        |          |          |
                        |          |          |--36.40%--free_pcppages_bulk
                        |          |          |          |
                        |          |          |           --36.40%--free_unref_page
                        |          |          |                     |
                        |          |          |                      --36.40%--osd_key_fini
                        |          |          |                                key_fini
                        |          |          |                                keys_fini.part.48
                        |          |          |                                lu_env_fini
                        |          |          |                                echo_client_iocontrol
                        |          |          |                                class_handle_ioctl
                        |          |          |                                obd_class_ioctl
                        |          |          |                                do_vfs_ioctl
            stancheff Shaun Tancheff added a comment - edited

            For obdfilter, 'env' doesn't stay around long enough to hold onto the oti_dio_pages[] ... so it doesn't get the advantage of the page pooling there.

            bzzz Alex Zhuravlev added a comment -

            Well, it's a bit different:

            		if (unlikely(!oti->oti_dio_pages[cur])) {
            			LASSERT(cur < PTLRPC_MAX_BRW_PAGES);
            			page = alloc_page(gfp_mask);
            			if (!page)
            				return NULL;
            			oti->oti_dio_pages[cur] = page;

            oti_dio_pages stores allocated pages within @env.

            stancheff Shaun Tancheff added a comment -

            cray-2-12 / master

            osd_bufs_get()
                 for (i = 0; i < npages; i++, lnb++) {
                       lnb->lnb_page = osd_get_page(env, dt, lnb->lnb_file_offset, gfp_mask);
                 }

            osd_get_page()
                 if (osd_use_page_cache(d)) {
                       // page from page cache
                 } else {
                       page = alloc_page(gfp_mask);
                 }

            I'll push a patch in a little while ... it's just a starting point, feel free to tear it apart.

            bzzz Alex Zhuravlev added a comment -

            Why do you think that osd_bufs_get() allocates the pages? What exact version do you use?

            People

              stancheff Shaun Tancheff
              stancheff Shaun Tancheff