  1. Lustre
  2. LU-9782

High CPU usage with random IO test.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.7.0, Lustre 2.8.0, Lustre 2.9.0, Lustre 2.10.0
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.2
    • Environment:
      Any Lustre with commit 144b5a65c16, and likely earlier.
    • Severity:
      3

      Description

      osd-ldiskfs makes several osd_is_mapped calls, each of which scans the extent tree, to decide how errors should be handled in the rewrite case. During a random IO write test the extent tree becomes huge, so each search is CPU-expensive. Typical perf output:

      |--75.99%-- rb_next
                  |          |          
                  |          |--94.49%-- ldiskfs_es_find_delayed_extent_range
                  |          |          ldiskfs_fiemap
                  |          |          osd_is_mapped
                  |          |          osd_declare_write_commit
                  |          |          ofd_commitrw_write.isra.32
                  |          |          ofd_commitrw
                  |          |          obd_commitrw.constprop.39
                  |          |          tgt_brw_write
                  |          |          tgt_request_handle
                  |          |          ptlrpc_server_handle_request
                  |          |          ptlrpc_main
                  |          |          kthread
                  |          |          ret_from_fork
                  |          |          
                  |          |--5.49%-- ldiskfs_fiemap
                  |          |          osd_is_mapped
                  |          |          osd_declare_write_commit
                  |          |          ofd_commitrw_write.isra.32
                  |          |          ofd_commitrw
                  |          |          obd_commitrw.constprop.39
                  |          |          tgt_brw_write
                  |          |          tgt_request_handle
                  |          |          ptlrpc_server_handle_request
                  |          |          ptlrpc_main
                  |          |          kthread
                  |          |          ret_from_fork
                  |           --0.02%-- [...]
                  |          
                  |--21.80%-- ldiskfs_es_find_delayed_extent_range
                  |          |          
                  |          |--100.00%-- ldiskfs_fiemap
                  |          |          osd_is_mapped
                  |          |          osd_declare_write_commit
                  |          |          ofd_commitrw_write.isra.32
                  |          |          ofd_commitrw
                  |          |          obd_commitrw.constprop.39
                  |          |          tgt_brw_write
                  |          |          tgt_request_handle
                  |          |          ptlrpc_server_handle_request
                  |          |          ptlrpc_main
                  |          |          kthread
                  |          |          ret_from_fork
                  |           --0.00%-- [...]
      

      Avoiding the second search by caching the result in the high bits of lnb_flags, which are not used on the wire, increases performance dramatically:

      • without patch with default mkfsoptions: CPU usage 100% with 55-60K random 4K writes
      • with the patch: CPU usage 50-60% with 250K random 4K writes
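
      The caching idea described above can be sketched roughly as follows. This is a minimal standalone illustration, not the actual patch: the struct, the flag values, the wire mask and every demo_* name are hypothetical, and demo_slow_is_mapped only stands in for the expensive extent-tree scan done through osd_is_mapped. The first query stores its answer in flag bits that are assumed to stay local; later queries for the same buffer read those bits instead of searching again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low bits go on the wire, high bits stay local. */
#define DEMO_WIRE_MASK   0x0000ffffu
#define DEMO_MAP_CHECKED 0x40000000u /* lookup already performed */
#define DEMO_MAP_RESULT  0x80000000u /* cached answer: block is mapped */

/* The cache bits must never overlap the bits sent on the wire. */
static_assert(((DEMO_MAP_CHECKED | DEMO_MAP_RESULT) & DEMO_WIRE_MASK) == 0,
              "local cache bits overlap wire flags");

/* Hypothetical stand-in for a per-page buffer descriptor. */
struct demo_buf {
    uint64_t offset; /* byte offset of the page in the object */
    uint32_t flags;  /* wire flags plus local cache bits */
};

/* Counts how often the expensive path is taken. */
static unsigned demo_lookups;

/* Stand-in for the costly extent-tree search (assumption: the real
 * check walks the tree; here even-numbered 4K blocks are "mapped"). */
static bool demo_slow_is_mapped(uint64_t offset)
{
    demo_lookups++;
    return (offset / 4096) % 2 == 0;
}

/* Run the search at most once per buffer, then reuse the cached bits. */
static bool demo_is_mapped_cached(struct demo_buf *b)
{
    if (!(b->flags & DEMO_MAP_CHECKED)) {
        b->flags |= DEMO_MAP_CHECKED;
        if (demo_slow_is_mapped(b->offset))
            b->flags |= DEMO_MAP_RESULT;
    }
    return (b->flags & DEMO_MAP_RESULT) != 0;
}

/* Strip the local bits before the flags would leave the node. */
static uint32_t demo_wire_flags(const struct demo_buf *b)
{
    return b->flags & DEMO_WIRE_MASK;
}
```

      In this sketch a separate "checked" bit is kept alongside the result bit so that a cached negative answer can be told apart from "not looked up yet". Whether the real fix needs both bits is not stated in this ticket.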

            People

            • Assignee:
              wc-triage WC Triage
              Reporter:
              shadow Alexey Lyashkov