[LU-16433] single client performance regression in SSF workload Created: 24/Dec/22 Updated: 18/Feb/23 Resolved: 03/Jan/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.2 |
| Fix Version/s: | Lustre 2.16.0, Lustre 2.15.2 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Shuichi Ihara | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre-2.15.2, Rocky Linux 8.6 (4.18.0-372.32.1.el8_6.x86_64), OFED-5.4-3.6.8.1 |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
A client performance regression was found in 2.15.2-RC1 (commit:e21498bcaa).

# mpirun -np 16 ior -a POSIX -i 1 -d 10 -w -r -b 16g -t 1m -C -Q 17 -e -vv -o //exafs/d0/d1/d2/ost_stripe/file

lustre-2.15.1

access bw(MiB/s) IOPS Latency(s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---- ---------- ---------- --------- -------- -------- -------- -------- ----
write 2489.25 2489.28 0.006428 16777216 1024.00 0.000936 105.31 0.000238 105.31 0
read 4176 4176 0.003803 16777216 1024.00 0.001695 62.77 3.92 62.77 0
write 2423.58 2423.60 0.006452 16777216 1024.00 0.000586 108.16 2.45 108.16 1
read 4197 4197 0.003652 16777216 1024.00 0.001982 62.46 3.98 62.46 1
write 2502.32 2502.34 0.006375 16777216 1024.00 0.000404 104.76 0.305282 104.76 2
read 4211 4211 0.003683 16777216 1024.00 0.001679 62.25 3.99 62.25 2

Max Write: 2502.32 MiB/sec (2623.88 MB/sec)
Max Read: 4211.19 MiB/sec (4415.75 MB/sec)

lustre-2.15.2-RC1

access bw(MiB/s) IOPS Latency(s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---- ---------- ---------- --------- -------- -------- -------- -------- ----
write 2103.65 2103.68 0.007142 16777216 1024.00 0.001769 124.61 7.60 124.61 0
read 4204 4204 0.003159 16777216 1024.00 0.001461 62.35 10.59 62.35 0
write 2169.58 2169.69 0.006903 16777216 1024.00 0.000912 120.82 7.72 120.83 1
read 4282 4282 0.003722 16777216 1024.00 0.137671 61.22 2.78 61.22 1
write 2133.24 2133.25 0.007500 16777216 1024.00 0.000380 122.88 3.60 122.89 2
read 4088 4088 0.003689 16777216 1024.00 0.001053 64.13 3.68 64.13 2

Max Write: 2169.58 MiB/sec (2274.97 MB/sec)
Max Read: 4282.19 MiB/sec (4490.20 MB/sec)

This is a ~14% write performance regression in 2.15.2-RC1 compared to lustre-2.15.1. 
After investigation, 'git bisect' points to commit 6d4559f6b948a93aaf5e94c4eb47cd9ebcf7ba95. Here is another test result after reverting that patch:

lustre-2.15.2-RC1 + reverted commit:6d4559f6b9

access bw(MiB/s) IOPS Latency(s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---- ---------- ---------- --------- -------- -------- -------- -------- ----
write 2497.41 2497.44 0.006407 16777216 1024.00 0.001115 104.97 0.000791 104.97 0
read 4217 4217 0.003773 16777216 1024.00 0.001680 62.16 3.37 62.16 0
write 2471.13 2471.14 0.006475 16777216 1024.00 0.000375 106.08 0.000292 106.08 1
read 4083 4083 0.003765 16777216 1024.00 0.001659 64.20 3.23 64.20 1
write 2457.91 2457.92 0.006509 16777216 1024.00 0.000412 106.65 0.010367 106.65 2
read 4163 4163 0.003771 16777216 1024.00 0.001909 62.97 6.35 62.97 2

Max Write: 2497.41 MiB/sec (2618.72 MB/sec)
Max Read: 4217.39 MiB/sec (4422.25 MB/sec) |
| Comments |
| Comment by Peter Jones [ 24/Dec/22 ] |
|
Jian

Is this something that can be avoided in the

Peter |
| Comment by Jian Yu [ 24/Dec/22 ] |
|
In patch https://review.whamcloud.com/47924:

lustre/llite/vvp_internal.h

-#ifndef HAVE_ACCOUNT_PAGE_DIRTIED_EXPORT
+#if !defined(HAVE_ACCOUNT_PAGE_DIRTIED_EXPORT) || \
+    defined(HAVE_KALLSYMS_LOOKUP_NAME)
 extern unsigned int (*vvp_account_page_dirtied)(struct page *page,
                                                 struct address_space *mapping);
 #endif

lustre/llite/vvp_io.c

 /* kernels without HAVE_KALLSYMS_LOOKUP_NAME also don't have account_dirty_page
  * exported, and if we can't access that symbol, we can't do page dirtying in
  * batch (taking the xarray lock only once) so we just fall back to a looped
  * call to __set_page_dirty_nobuffers */
 #ifndef HAVE_KALLSYMS_LOOKUP_NAME
	for (i = 0; i < count; i++)
		__set_page_dirty_nobuffers(pvec->pages[i]);
 #else
+	/*
+	 * In kernel 5.14.21, kallsyms_lookup_name is defined but
+	 * account_page_dirtied is not exported.
+	 */
+	if (!vvp_account_page_dirtied) {
+		for (i = 0; i < count; i++)
+			__set_page_dirty_nobuffers(pvec->pages[i]);
+		goto end;
+	}
+

In the Rocky Linux 8.6 kernel 4.18.0-372.32.1.el8_6.x86_64, both account_page_dirtied and kallsyms_lookup_name are exported. So, I need to change the check of vvp_account_page_dirtied to HAVE_ACCOUNT_PAGE_DIRTIED_EXPORT. This resolves the client performance regression issue on Rocky Linux 8.6. |
| Comment by Gerrit Updater [ 25/Dec/22 ] |
|
"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49512 |
| Comment by Shuichi Ihara [ 27/Dec/22 ] |
|
Confirmed that patch https://review.whamcloud.com/c/fs/lustre-release/+/49512 solved the problem and performance was back.

access bw(MiB/s) IOPS Latency(s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---- ---------- ---------- --------- -------- -------- -------- -------- ----
write 2440.73 2440.76 0.006555 16777216 1024.00 0.001139 107.40 0.000249 107.40 0
read 4027 4027 0.003897 16777216 1024.00 0.001635 65.09 3.60 65.09 0
write 2427.14 2427.15 0.006584 16777216 1024.00 0.000384 108.00 0.126996 108.01 1
read 4132 4132 0.003715 16777216 1024.00 0.001663 63.44 5.11 63.44 1
write 2421.75 2421.76 0.006581 16777216 1024.00 0.000384 108.25 1.39 108.25 2
read 4082 4082 0.003875 16777216 1024.00 0.001668 64.22 3.72 64.22 2

Max Write: 2440.73 MiB/sec (2559.29 MB/sec)
Max Read: 4132.11 MiB/sec (4332.83 MB/sec) |
| Comment by Gerrit Updater [ 28/Dec/22 ] |
|
"Xing Huang <hxing@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49520 |
| Comment by Gerrit Updater [ 03/Jan/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49512/ |
| Comment by Gerrit Updater [ 03/Jan/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49520/ |
| Comment by Peter Jones [ 03/Jan/23 ] |
|
Landed for 2.16 |
| Comment by Andreas Dilger [ 16/Feb/23 ] |
|
Should this issue be re-opened to investigate/address the performance loss for newer kernels? I don't think it is only SLES15sp4 that is affected, but any kernel since Linux 5.2 where account_page_dirtied() is not exported, like Ubuntu 22.04, RHEL9.x. The patch landed here defers this problem while kallsyms_lookup_name() can work around that lack, but that is also removed in newer kernels. There should be some way that we can work with the new page cache more efficiently for large page ranges, since that is what xarray and folios are supposed to be for... |
| Comment by Patrick Farrell [ 16/Feb/23 ] |
|
We could re-open it, but as it stands, xarray is just a re-API of the radix tree, and non-single-page folios aren't supported in the page cache yet. Setting folios aside, last I checked, the operations we'd need in order to do much in batch aren't exported. At the very least, my focus is on the DIO stuff - I'm more interested in pushing buffered I/O through the DIO path once unaligned support is fully working. That would offer much larger gains. (Not that the buffered path isn't worth working on, but ...) So re-opening is probably a decent idea, but I wouldn't prioritize it. |
| Comment by Patrick Farrell [ 17/Feb/23 ] |
|
sihara, whether we re-open this or not, be aware this problem exists in Linux 5.2 and newer (and there is no obvious way to fix it). So, as Andreas said, Ubuntu 22.04+ and RHEL 9. |
| Comment by Shaun Tancheff [ 17/Feb/23 ] |
|
I would note that 2.15.2-RC1 does not have |
| Comment by Patrick Farrell [ 17/Feb/23 ] |
|
Shaun, I don't totally understand your question. The performance regression is about whether or not we have access to the necessary symbols to do things in batch. This patch fixes it for some 'intermediate' kernels, where we can still use kallsyms_lookup_name() to find non-exported symbols, but that's gone in newer kernels. So we know exactly why the regression is occurring and where it's occurring. If HPE is interested in avoiding the regression on intermediate kernels for 2.15, you could push the patch to b2_15 and I think we'd be happy to land it. But we have no solution for the latest kernels. |
| Comment by Andreas Dilger [ 18/Feb/23 ] |
|
Shaun, I see patch https://review.whamcloud.com/49520 " |
| Comment by Shaun Tancheff [ 18/Feb/23 ] |
|
Sorry, I didn't read through the collapsed comments. Patrick is correct. After the removal of kallsyms*, we do not have a way to acquire account_page_dirtied / folio_account_dirtied directly. On the plus side, it looks like we might be able to 'vectorize' folio_account_dirtied and provide a local vvp_account_dirtied_folios() for those kernels. There is now a vvp_set_folio_dirty_batched() under LU-16577 that may be useful. |