Lustre / LU-12345

backport - ext4: optimize ext4_find_delalloc_range() in nodelalloc mode


    Description

      From 8c48f7e88e293b9dd422bd8884842aea85d30b22
      ..
      We found a performance regression when using bigalloc with "nodelalloc"
      (1MB cluster size):
      1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
      2. mount -o nodelalloc /dev/sda /test/
      3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024
      The "dd" takes about 2 seconds to finish, but if we run mke2fs without
      "bigalloc", "dd" takes less than 1 second.
       
       

          Activity

            Peter Jones (pjones) added a comment:

            Landed for 2.13

            Gerrit Updater (gerrit) added a comment:

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34982/
            Subject: LU-12345 ldiskfs: optimize nodelalloc mode
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: af48ae8bff289b2bc083a888efeafa3c48df91e2

            Artem Blagodarenko (artem_blagodarenko, Inactive) added a comment:

            adilger, bigalloc testing is planned for the near future.

            Andreas Dilger (adilger) added a comment:

            Other than this (obviously significant) performance problem, have you seen other problems with bigalloc? I'm worried that there are hidden problems, because the osd-ldiskfs code deals with blocks instead of chunks in various places (chunk, C2B, and B2C do not appear anywhere in that code).
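            For readers less familiar with bigalloc: a cluster is 2^s_cluster_bits blocks, and ext4 converts between cluster and block numbers with shift macros such as EXT4_C2B(), EXT4_B2C() and EXT4_NUM_B2C(). The sketch below models those conversions in userspace. It assumes 4 KiB blocks (the description only gives the 1MB cluster size), which gives 256 blocks per cluster. struct sb_info_model and the helper names are hypothetical and are not the ldiskfs API. The point is only that, in this configuration, mixing up block and cluster units is off by a factor of 256.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for the cluster fields ext4 keeps in
 * struct ext4_sb_info (s_cluster_bits, s_cluster_ratio). Illustration only.
 */
struct sb_info_model {
	unsigned int cluster_bits;   /* log2(blocks per cluster) */
	unsigned int cluster_ratio;  /* blocks per cluster */
};

/* Build the model from block and cluster sizes in bytes (powers of two). */
static struct sb_info_model make_sbi(uint32_t block_size, uint32_t cluster_size)
{
	struct sb_info_model sbi = { 0, 1 };

	assert(block_size > 0 && cluster_size >= block_size);
	while ((block_size << sbi.cluster_bits) < cluster_size)
		sbi.cluster_bits++;
	/* The shift is only exact if the cluster is a power-of-two multiple. */
	assert((block_size << sbi.cluster_bits) == cluster_size);
	sbi.cluster_ratio = 1u << sbi.cluster_bits;
	return sbi;
}

/* Cluster number -> first block of that cluster (modeled on EXT4_C2B). */
static uint64_t c2b(const struct sb_info_model *sbi, uint64_t cluster)
{
	return cluster << sbi->cluster_bits;
}

/* Block number -> cluster containing it (modeled on EXT4_B2C). */
static uint64_t b2c(const struct sb_info_model *sbi, uint64_t blk)
{
	return blk >> sbi->cluster_bits;
}

/* Clusters needed for nblocks blocks, rounded up (modeled on EXT4_NUM_B2C). */
static uint64_t num_b2c(const struct sb_info_model *sbi, uint64_t nblocks)
{
	return (nblocks + sbi->cluster_ratio - 1) >> sbi->cluster_bits;
}
```

            With these helpers, the 1 GiB file from the dd reproducer (262144 blocks of 4 KiB) occupies 1024 clusters. A block count passed where a cluster count is expected would therefore be 256 times too large.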

            Gerrit Updater (gerrit) added a comment:

            Artem Blagodarenko (c17828@cray.com) uploaded a new patch: https://review.whamcloud.com/34982
            Subject: LU-12345 ldiskfs: optimize nodelalloc mode
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7e74f2751f8cfdce2915d9b92259cdfc78d3a8d4

            Artem Blagodarenko (artem_blagodarenko, Inactive) added a comment:

            adilger,

            The "ext4: optimize ext4_find_delalloc_range() in nodelalloc mode"
            patch prevents ldiskfs_es_find_delayed_extent_range() from being
            executed when delalloc is not enabled. The same optimization is also
            added to the ldiskfs_find_delayed_extent() function, which improves
            performance dramatically.

            Here are the results of testing on a two-node system.

            Without the patch:

                avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                           0.00    0.00   56.30    0.06    0.00   43.63

                Device:   rrqm/s  wrqm/s    r/s       w/s  rMB/s   wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
                sds         0.00    0.00   0.00   1174.00   0.00    4.59      8.00      0.84   0.71     0.00     0.71   0.01   1.20

            With the patch:

                08/29/2018 01:13:22 AM
                avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                           0.00    0.00    4.13   30.37    0.00   65.50

                Device:   rrqm/s  wrqm/s    r/s       w/s  rMB/s   wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
                sds         0.00    0.00   0.00  54117.82   0.00  211.43      8.00    152.59   2.82     0.00     2.82   0.02  99.01
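            The two iostat samples can be cross-checked with simple arithmetic. avgrq-sz is reported in 512-byte sectors, so 8.00 means 4 KiB requests. With r/s at 0.00 in both samples, wMB/s should be roughly w/s times 4 KiB. The helper name below is hypothetical and used only for this check.

```c
#include <assert.h>

/* iostat reports avgrq-sz in 512-byte sectors. */
#define SECTOR_BYTES 512.0
/* iostat's "MB" columns count units of 2^20 bytes. */
#define BYTES_PER_MB (1024.0 * 1024.0)

/*
 * Hypothetical helper: approximate write bandwidth in MB/s from the write
 * request rate (w/s) and the average request size (avgrq-sz, in sectors).
 * Assumes all requests in the interval are writes (r/s = 0.00 in both samples).
 */
static double approx_write_mb_per_s(double w_per_s, double avgrq_sectors)
{
	assert(w_per_s >= 0.0 && avgrq_sectors >= 0.0);
	return w_per_s * avgrq_sectors * SECTOR_BYTES / BYTES_PER_MB;
}
```

            The calculation gives about 4.59 MB/s for 1174 w/s and about 211.4 MB/s for 54117.82 w/s, roughly 46 times more write throughput with the patch. The small gap to the reported 211.43 is consistent with avgrq-sz being rounded to two decimals. Over the same change, %system falls from 56.30 to 4.13 and %util rises from 1.20 to 99.01. This suggests the OSS moved from being CPU-bound to being disk-bound.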

            Artem Blagodarenko (artem_blagodarenko, Inactive) added a comment:

            > I don't think osd-ldiskfs is using delalloc? This may affect local ext4 performance but I don't think it will affect Lustre.

            During a single-thread IOR 4K random rewrite of a 16GB file on one of our machines, the CPU usage on the OSS is 100% and the resulting IOPS are extremely low. Here is a snippet of the perf report:

                |-89.87%- ldiskfs_es_find_delayed_extent_range
                |          ldiskfs_fiemap
                |          osd_is_mapped
                |          osd_declare_write_commit
                |          ofd_commitrw_write
                |          ofd_commitrw
                |          tgt_brw_write
                |          tgt_request_handle
                |          ptlrpc_server_handle_request
                |          ptlrpc_main
                |          kthread
                |          ret_from_fork
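            In the trace above, ldiskfs_fiemap() reaches ldiskfs_es_find_delayed_extent_range() on the osd_is_mapped() path. The fix described in this ticket skips that lookup when delalloc is disabled, because with nodelalloc no extent can be in the delayed state. Below is a minimal userspace model of such a guard. All type, function and flag names are hypothetical stand-ins. The kernel keeps extent status in an rbtree and checks the DELALLOC mount option with test_opt(). The linear scan here exists only so the cost of the skipped lookup can be counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mount-option bit standing in for ext4's DELALLOC option. */
#define MODEL_OPT_DELALLOC 0x1u

/* Hypothetical cached extent [lblk, lblk + len) with a "delayed" flag. */
struct es_model {
	unsigned long lblk;
	unsigned long len;
	bool delayed;
};

/* Hypothetical inode: mount options plus a sorted extent-status cache. */
struct inode_model {
	unsigned int mount_opts;
	const struct es_model *es;  /* sorted by lblk */
	size_t nr_es;
	unsigned long lookups;      /* entries visited, to expose lookup cost */
};

/*
 * Stand-in for the extent-status lookup: return the first delayed extent
 * overlapping [start, end], or NULL. Linear only to make cost countable.
 */
static const struct es_model *es_find_delayed_range(struct inode_model *inode,
						     unsigned long start,
						     unsigned long end)
{
	for (size_t i = 0; i < inode->nr_es; i++) {
		const struct es_model *es = &inode->es[i];

		inode->lookups++;
		if (es->lblk > end)
			break;  /* sorted: no later extent can overlap */
		if (es->delayed && es->lblk + es->len > start)
			return es;
	}
	return NULL;
}

/*
 * Sketch of the guard: with nodelalloc nothing can be delayed, so return
 * "none found" without touching the extent-status cache at all.
 */
static const struct es_model *find_delayed_extent(struct inode_model *inode,
						  unsigned long start,
						  unsigned long end)
{
	if (!(inode->mount_opts & MODEL_OPT_DELALLOC))
		return NULL;
	return es_find_delayed_range(inode, start, end);
}
```

            The guard makes the answer in nodelalloc mode free. An unguarded lookup still returns NULL, but it first walks the cache, and that walk is what dominates the profile above.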

            Alex Zhuravlev (bzzz) added a comment:

            I remember that one reason not to use delalloc is the missing support for ordered writes, i.e. allocation is driven by the VM as part of memory cleanup/release, not by JBD, which needs to know when specific pages have been committed. This is different from ZFS, where data allocation is done as part of committing the TXG.

            Andreas Dilger (adilger) added a comment:

            I don't think osd-ldiskfs is using delalloc? This may affect local ext4 performance but I don't think it will affect Lustre.

            People

              Artem Blagodarenko (artem_blagodarenko, Inactive)
              Votes: 0
              Watchers: 8
