[LU-12345] backport - ext4: optimize ext4_find_delalloc_range() in nodelalloc mode Created: 28/May/19  Updated: 01/Nov/22  Resolved: 01/Jun/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.5

Type: Bug Priority: Major
Reporter: Artem Blagodarenko (Inactive) Assignee: Artem Blagodarenko (Inactive)
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-12103 Improve block allocation for large pa... Resolved
is related to LU-16286 nodelalloc optimization is missing fo... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

From 8c48f7e88e293b9dd422bd8884842aea85d30b22
..
We found a performance regression when using bigalloc with "nodelalloc"
(1MB cluster size):
1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
2. mount -o nodelalloc /dev/sda /test/
3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024
The "dd" takes about 2 seconds to finish, but if we run mke2fs without
"bigalloc", "dd" takes less than 1 second.
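
For anyone reproducing this without dd, step 3 can be approximated by a small C helper. This is a sketch only: the function name timed_zero_write() is hypothetical, it mimics dd without conv=fsync (so it times page-cache writes, where in nodelalloc mode blocks are expected to be allocated at write() time rather than at writeback), and it measures only the write loop, not open/close.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `count` buffers of `bs` zero bytes to `path`
 * (like `dd if=/dev/zero of=PATH bs=BS count=COUNT` without conv=fsync)
 * and return the time spent in the write loop in seconds, or -1.0 on error.
 */
static double timed_zero_write(const char *path, size_t bs, size_t count)
{
    assert(bs > 0);

    char *buf = calloc(1, bs);            /* zero-filled, like /dev/zero */
    if (buf == NULL)
        return -1.0;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    int failed = 0;
    for (size_t i = 0; i < count && !failed; i++) {
        size_t done = 0;
        while (done < bs) {               /* retry on short writes */
            ssize_t n = write(fd, buf + done, bs - done);
            if (n < 0) {
                failed = 1;
                break;
            }
            done += (size_t)n;
        }
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    if (close(fd) != 0)
        failed = 1;
    free(buf);

    if (failed)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

On a test box, calling timed_zero_write("/test/io", 1048576, 1024) once on a bigalloc file system and once on a non-bigalloc one, both mounted with -o nodelalloc, should show the same kind of difference as the dd timings.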

 Comments   
Comment by Andreas Dilger [ 28/May/19 ]

I don't think osd-ldiskfs is using delalloc? This may affect local ext4 performance but I don't think it will affect Lustre.

Comment by Alex Zhuravlev [ 28/May/19 ]

As I remember, one reason not to use delalloc is the missing support for ordered writes, i.e. allocation is driven by the VM as part of memory cleanup/release, not by JBD, which needs to know when specific pages have been committed. This is different from ZFS, where data allocation is done as part of committing a TXG.

Comment by Artem Blagodarenko (Inactive) [ 28/May/19 ]

> I don't think osd-ldiskfs is using delalloc? This may affect local ext4 performance but I don't think it will affect Lustre.

During a single-threaded IOR 4K random rewrite of a 16GB file on one of our machines, CPU usage on the OSS is 100% and the resulting IOPS are extremely low. Here is a snippet of the perf report:

    |-89.87%- ldiskfs_es_find_delayed_extent_range
    |         ldiskfs_fiemap
    |         osd_is_mapped
    |         osd_declare_write_commit
    |         ofd_commitrw_write
    |         ofd_commitrw
    |         tgt_brw_write
    |         tgt_request_handle
    |         ptlrpc_server_handle_request
    |         ptlrpc_main
    |         kthread
    |         ret_from_fork
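
The trace suggests that the write commit path goes through osd_is_mapped() into ldiskfs_fiemap(), and that the fiemap path spends nearly all of its time looking up delayed extents. As a userspace analogue only (not the osd-ldiskfs code), an "is this byte range mapped?" check can be built on the FS_IOC_FIEMAP ioctl; the helper name range_is_mapped() is hypothetical. Note that extents reported this way may also carry the FIEMAP_EXTENT_DELALLOC flag, which is why the file system has to consult its delayed-extent information at all.

```c
#include <assert.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: ask the file system, via the FS_IOC_FIEMAP ioctl,
 * whether any extent overlaps [start, start + len) of the open file `fd`.
 * Returns 1 if at least one extent is reported, 0 for a hole, -1 on error.
 */
static int range_is_mapped(int fd, uint64_t start, uint64_t len)
{
    assert(len > 0);

    /* Space for one extent suffices: we only need to know if one exists. */
    size_t sz = sizeof(struct fiemap) + sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, sz);
    if (fm == NULL)
        return -1;

    fm->fm_start = start;
    fm->fm_length = len;
    fm->fm_flags = 0;          /* no FIEMAP_FLAG_SYNC: don't force writeback */
    fm->fm_extent_count = 1;

    int rc = ioctl(fd, FS_IOC_FIEMAP, fm);
    int mapped = (rc < 0) ? -1 : (fm->fm_mapped_extents > 0);

    free(fm);
    return mapped;
}
```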

Comment by Artem Blagodarenko (Inactive) [ 28/May/19 ]

adilger,

There is an "ext4: optimize ext4_find_delalloc_range() in nodelalloc mode" patch that prevents executing ldiskfs_es_find_delayed_extent_range() if delalloc is not enabled. The same optimization was also added to the ldiskfs_find_delayed_extent() function, which improves performance dramatically.
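
To illustrate the idea of the patch, here is a self-contained userspace model, not the actual ext4/ldiskfs code: when the file system is mounted without delalloc there can be no delayed-allocation extents, so the lookup can return "not found" immediately instead of searching. The flag MNT_DELALLOC, the structs, and find_delalloc_range() are hypothetical stand-ins; ext4 keeps delayed extents in its extent status tree rather than an array, and the lookups counter exists only to make the skipped work visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mount option bit standing in for "delalloc enabled". */
#define MNT_DELALLOC 0x1u

/* Hypothetical delayed-allocation extent record. */
struct delayed_extent {
    unsigned long lblk;   /* first logical block */
    unsigned long len;    /* number of blocks */
};

/* Hypothetical per-inode state for the model. */
struct inode_model {
    unsigned int mount_opts;
    const struct delayed_extent *delayed;
    size_t ndelayed;
    size_t lookups;       /* number of (expensive) scans actually performed */
};

/* Return true if any block in [lblk_start, lblk_end] is delayed-allocated. */
static bool find_delalloc_range(struct inode_model *inode,
                                unsigned long lblk_start,
                                unsigned long lblk_end)
{
    assert(lblk_start <= lblk_end);

    /* The optimization: with nodelalloc there can be no delayed extents,
     * so skip the lookup entirely. */
    if (!(inode->mount_opts & MNT_DELALLOC))
        return false;

    inode->lookups++;
    for (size_t i = 0; i < inode->ndelayed; i++) {
        const struct delayed_extent *de = &inode->delayed[i];
        unsigned long de_end = de->lblk + de->len - 1;

        /* Overlap test for two closed intervals. */
        if (de->len && de->lblk <= lblk_end && lblk_start <= de_end)
            return true;
    }
    return false;
}
```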
 
Here are the results of testing on a two-node system.
    Without the patch:
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.00    0.00   56.30    0.06    0.00   43.63

    Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
    sds          0.00     0.00     0.00  1174.00     0.00     4.59     8.00     0.84     0.71     0.00     0.71     0.01     1.20

    With the patch:
    08/29/2018 01:13:22 AM
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.00    0.00    4.13   30.37    0.00   65.50

    Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
    sds          0.00     0.00     0.00 54117.82     0.00   211.43     8.00   152.59     2.82     0.00     2.82     0.02    99.01
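
As a sanity check on the iostat numbers, the wMB/s column should equal w/s times avgrq-sz (reported in 512-byte sectors), and an avgrq-sz of 8.00 sectors corresponds to the 4K IOR writes. The small helpers below (names hypothetical) assume iostat's MB means 2^20 bytes, which is what the reported values are consistent with.

```c
#include <assert.h>

/* iostat reports avgrq-sz in 512-byte sectors. */
#define SECTOR_BYTES 512.0
#define MIB          (1024.0 * 1024.0)

/* Write throughput in MiB/s implied by the w/s and avgrq-sz columns. */
static double implied_wmib_per_s(double writes_per_s, double avgrq_sectors)
{
    return writes_per_s * avgrq_sectors * SECTOR_BYTES / MIB;
}

/* How many times larger `after` is than `before`. */
static double speedup(double before, double after)
{
    assert(before > 0.0);
    return after / before;
}
```

With these, 1174.00 w/s gives about 4.59 MiB/s and 54117.82 w/s about 211.4 MiB/s, in line with the reported wMB/s values; write IOPS improve roughly 46x while %system drops from 56.30 to 4.13.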

Comment by Gerrit Updater [ 28/May/19 ]

Artem Blagodarenko (c17828@cray.com) uploaded a new patch: https://review.whamcloud.com/34982
Subject: LU-12345 ldiskfs: optimize nodelalloc mode
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7e74f2751f8cfdce2915d9b92259cdfc78d3a8d4

Comment by Andreas Dilger [ 29/May/19 ]

Other than this (obviously significant) performance problem, have you seen any other problems with bigalloc? I'm worried that there are hidden problems because the osd-ldiskfs code deals with blocks instead of chunks (clusters) in various places (chunk, C2B, and B2C do not appear anywhere in that code).
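
To make the block/cluster concern concrete: with the reproducer's mke2fs -C 1048576 and, assuming the default 4KB block size, each cluster holds 256 blocks, so block and cluster numbers differ by a shift of 8 bits. The sketch below mirrors the arithmetic of ext4's EXT4_C2B(), EXT4_B2C() and EXT4_NUM_B2C() macros in standalone form; struct sb_model, make_sb() and the lowercase function names are hypothetical. Code that passes a block count where a cluster count is expected would be off by a factor of 256 in this configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock field involved. */
struct sb_model {
    unsigned int cluster_bits;   /* log2(blocks per cluster) */
};

/* Build the model from block and cluster sizes (both powers of two). */
static struct sb_model make_sb(uint32_t block_bytes, uint32_t cluster_bytes)
{
    assert(block_bytes && (block_bytes & (block_bytes - 1)) == 0);
    assert(cluster_bytes >= block_bytes &&
           (cluster_bytes & (cluster_bytes - 1)) == 0);

    struct sb_model sb = {0};
    while (((uint64_t)block_bytes << sb.cluster_bits) < cluster_bytes)
        sb.cluster_bits++;
    return sb;
}

/* Like EXT4_C2B: first block of a cluster. */
static uint64_t c2b(const struct sb_model *sb, uint64_t cluster)
{
    return cluster << sb->cluster_bits;
}

/* Like EXT4_B2C: cluster containing a block. */
static uint64_t b2c(const struct sb_model *sb, uint64_t blk)
{
    return blk >> sb->cluster_bits;
}

/* Like EXT4_NUM_B2C: clusters needed to hold nblocks (rounds up). */
static uint64_t num_b2c(const struct sb_model *sb, uint64_t nblocks)
{
    return (nblocks + (UINT64_C(1) << sb->cluster_bits) - 1) >> sb->cluster_bits;
}
```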

Comment by Artem Blagodarenko (Inactive) [ 30/May/19 ]

adilger, I am going to do bigalloc testing in the near future.

Comment by Gerrit Updater [ 01/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34982/
Subject: LU-12345 ldiskfs: optimize nodelalloc mode
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: af48ae8bff289b2bc083a888efeafa3c48df91e2

Comment by Peter Jones [ 01/Jun/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 11/Feb/20 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37538
Subject: LU-12345 ldiskfs: optimize nodelalloc mode
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: bbf850c5ac55b631081848a67831d23ac4a241a8

Comment by Gerrit Updater [ 14/Apr/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37538/
Subject: LU-12345 ldiskfs: optimize nodelalloc mode
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: f91552d85cd086b37f78524113cadf799c045220

Generated at Sat Feb 10 02:51:45 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.