Lustre / LU-16433

single client performance regression in SSF workload

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0, Lustre 2.15.2
    • Affects Version/s: Lustre 2.15.2
    • Labels: None
    • Environment: Lustre-2.15.2, Rocky Linux 8.6 (4.18.0-372.32.1.el8_6.x86_64), OFED-5.4-3.6.8.1
    • Severity: 3

    Description

      A client performance regression was found in 2.15.2-RC1 (commit e21498bcaa).
      The tested workload is single-client SSF (single shared file) I/O from 16 processes.

      # mpirun -np 16 ior -a POSIX -i 1 -d 10 -w -r -b 16g -t 1m -C -Q 17 -e -vv -o //exafs/d0/d1/d2/ost_stripe/file 
      

      lustre-2.15.1

      access    bw(MiB/s)  IOPS       Latency(s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
      ------    ---------  ----       ----------  ---------- ---------  --------   --------   --------   --------   ----
      write     2489.25    2489.28    0.006428    16777216   1024.00    0.000936   105.31     0.000238   105.31     0   
      read      4176       4176       0.003803    16777216   1024.00    0.001695   62.77      3.92       62.77      0   
      write     2423.58    2423.60    0.006452    16777216   1024.00    0.000586   108.16     2.45       108.16     1   
      read      4197       4197       0.003652    16777216   1024.00    0.001982   62.46      3.98       62.46      1   
      write     2502.32    2502.34    0.006375    16777216   1024.00    0.000404   104.76     0.305282   104.76     2   
      read      4211       4211       0.003683    16777216   1024.00    0.001679   62.25      3.99       62.25      2   
      
      Max Write: 2502.32 MiB/sec (2623.88 MB/sec)
      Max Read:  4211.19 MiB/sec (4415.75 MB/sec)
      

      lustre-2.15.2-RC1

      access    bw(MiB/s)  IOPS       Latency(s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
      ------    ---------  ----       ----------  ---------- ---------  --------   --------   --------   --------   ----
      write     2103.65    2103.68    0.007142    16777216   1024.00    0.001769   124.61     7.60       124.61     0   
      read      4204       4204       0.003159    16777216   1024.00    0.001461   62.35      10.59      62.35      0   
      write     2169.58    2169.69    0.006903    16777216   1024.00    0.000912   120.82     7.72       120.83     1   
      read      4282       4282       0.003722    16777216   1024.00    0.137671   61.22      2.78       61.22      1   
      write     2133.24    2133.25    0.007500    16777216   1024.00    0.000380   122.88     3.60       122.89     2   
      read      4088       4088       0.003689    16777216   1024.00    0.001053   64.13      3.68       64.13      2  
      
      Max Write: 2169.58 MiB/sec (2274.97 MB/sec)
      Max Read:  4282.19 MiB/sec (4490.20 MB/sec)
      

      This is a ~14% write performance regression in 2.15.2-RC1 compared to lustre-2.15.1 (read performance is essentially unchanged).

      After investigation, 'git bisect' identified commit 6d4559f6b948a93aaf5e94c4eb47cd9ebcf7ba95 ("LU-15959 kernel: new kernel [SLES15 SP4 5.14.21-150400.24.18.1]") as the cause of this performance regression.

      Here is another test result after reverting the patch "LU-15959 kernel: new kernel [SLES15 SP4 5.14.21-150400.24.18.1]" from lustre-2.15.2-RC1; it confirms that performance returned to the same level as 2.15.1.

      lustre-2.15.2-RC1 + reverted commit:6d4559f6b9 (LU-15959 kernel: new kernel [SLES15 SP4 5.14.21-150400.24.18.1])

      access    bw(MiB/s)  IOPS       Latency(s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
      ------    ---------  ----       ----------  ---------- ---------  --------   --------   --------   --------   ----
      write     2497.41    2497.44    0.006407    16777216   1024.00    0.001115   104.97     0.000791   104.97     0   
      read      4217       4217       0.003773    16777216   1024.00    0.001680   62.16      3.37       62.16      0   
      write     2471.13    2471.14    0.006475    16777216   1024.00    0.000375   106.08     0.000292   106.08     1   
      read      4083       4083       0.003765    16777216   1024.00    0.001659   64.20      3.23       64.20      1   
      write     2457.91    2457.92    0.006509    16777216   1024.00    0.000412   106.65     0.010367   106.65     2   
      read      4163       4163       0.003771    16777216   1024.00    0.001909   62.97      6.35       62.97      2   
      
      Max Write: 2497.41 MiB/sec (2618.72 MB/sec)
      Max Read:  4217.39 MiB/sec (4422.25 MB/sec)
      

      Activity

            stancheff Shaun Tancheff added a comment -
            I would note that 2.15.2-RC1 does not have LU-16433. Could you check whether this fixes the performance regression?

            paf0186 Patrick Farrell added a comment -
            sihara, whether we re-open this or not, be aware this problem exists in Linux 5.2 and newer (and there is no obvious way to fix it). So, as Andreas said, Ubuntu 22.04+ and RHEL 9.

            paf0186 Patrick Farrell added a comment -
            We could re-open it, but as it stands, xarray is just a re-API of the radix tree, and non-single-page folios aren't supported in the page cache yet. Setting folios aside, last I checked, the operations we'd need to do much in batch aren't exported.

            At the very least, my focus is on the DIO stuff - I'm more interested in pushing buffered I/O through the DIO path once unaligned support is fully working. That would offer much larger gains. (Not that it's not worth working on the buffered path, but ...)

            So re-opening is probably a decent idea, but I wouldn't prioritize it.

            adilger Andreas Dilger added a comment -
            Should this issue be re-opened to investigate/address the performance loss for newer kernels?

            I don't think it is only SLES15 SP4 that is affected, but any kernel since Linux 5.2 where account_page_dirtied() is not exported, like Ubuntu 22.04 and RHEL 9.x. The patch landed here defers this problem where kallsyms_lookup_name() can work around that lack, but that is also removed in newer kernels.

            There should be some way that we can work with the new page cache more efficiently for large page ranges, since that is what xarray and folios are supposed to be for...
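The batching cost being discussed comes down to how often the page-cache (xarray) lock is taken when dirtying a pvec of pages. A userspace sketch of the two patterns, with a pthread mutex standing in for the xarray lock and all names hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in for the xarray lock */
static unsigned long nr_dirty;                                 /* stand-in for dirty-page accounting */

/* Fallback path: take and drop the lock once per page, as a looped
 * __set_page_dirty_nobuffers() effectively does. */
void mark_dirty_one_by_one(int *pages, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		pthread_mutex_lock(&cache_lock);
		pages[i] = 1;
		nr_dirty++;
		pthread_mutex_unlock(&cache_lock);
	}
}

/* Batched path: take the lock once for the whole batch, which is what
 * calling account_page_dirtied via vvp_account_page_dirtied enables. */
void mark_dirty_batched(int *pages, size_t count)
{
	pthread_mutex_lock(&cache_lock);
	for (size_t i = 0; i < count; i++) {
		pages[i] = 1;
		nr_dirty++;
	}
	pthread_mutex_unlock(&cache_lock);
}
```

Both variants end with the same pages dirtied; the batched one just pays the lock/unlock (and cacheline bouncing) cost once per batch instead of once per page.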
            pjones Peter Jones added a comment -
            Landed for 2.16.

            gerrit Gerrit Updater added a comment -
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49520/
            Subject: LU-16433 llite: check vvp_account_page_dirtied
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: 1c6e03a53cb374c10cf2d9e5a22fdb304f81e8bf

            gerrit Gerrit Updater added a comment -
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49512/
            Subject: LU-16433 llite: check vvp_account_page_dirtied
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 61c4c2b5e5d7d919149921d913322586ba5ebcab

            gerrit Gerrit Updater added a comment -
            "Xing Huang <hxing@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49520
            Subject: LU-16433 llite: check vvp_account_page_dirtied
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: b95cf135117cd24ef5403aa111ae82fd14215efb

            sihara Shuichi Ihara added a comment -
            Confirmed that patch https://review.whamcloud.com/c/fs/lustre-release/+/49512 solved the problem and performance was back.

            lustre-2.15.2-RC1 + patch https://review.whamcloud.com/c/fs/lustre-release/+/49512

            access    bw(MiB/s)  IOPS       Latency(s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
            ------    ---------  ----       ----------  ---------- ---------  --------   --------   --------   --------   ----
            write     2440.73    2440.76    0.006555    16777216   1024.00    0.001139   107.40     0.000249   107.40     0   
            read      4027       4027       0.003897    16777216   1024.00    0.001635   65.09      3.60       65.09      0   
            write     2427.14    2427.15    0.006584    16777216   1024.00    0.000384   108.00     0.126996   108.01     1   
            read      4132       4132       0.003715    16777216   1024.00    0.001663   63.44      5.11       63.44      1   
            write     2421.75    2421.76    0.006581    16777216   1024.00    0.000384   108.25     1.39       108.25     2   
            read      4082       4082       0.003875    16777216   1024.00    0.001668   64.22      3.72       64.22      2   
            
            Max Write: 2440.73 MiB/sec (2559.29 MB/sec)
            Max Read:  4132.11 MiB/sec (4332.83 MB/sec)

            gerrit Gerrit Updater added a comment -
            "Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49512
            Subject: LU-16433 llite: define and check vvp_account_page_dirtied
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 11b721714311ce9f11a596eaa13c368d27096d96
            yujian Jian Yu added a comment -

            In patch https://review.whamcloud.com/47924 ("LU-15959 kernel: new kernel [SLES15 SP4 5.14.21-150400.24.18.1]"), the following changes are related:

            lustre/llite/vvp_internal.h
            -#ifndef HAVE_ACCOUNT_PAGE_DIRTIED_EXPORT
            +#if !defined(HAVE_ACCOUNT_PAGE_DIRTIED_EXPORT) || \
            +defined(HAVE_KALLSYMS_LOOKUP_NAME)
             extern unsigned int (*vvp_account_page_dirtied)(struct page *page,
                                                            struct address_space *mapping);
             #endif
            
            lustre/llite/vvp_io.c
            /* kernels without HAVE_KALLSYMS_LOOKUP_NAME also don't have account_dirty_page
             * exported, and if we can't access that symbol, we can't do page dirtying in
             * batch (taking the xarray lock only once) so we just fall back to a looped
             * call to __set_page_dirty_nobuffers
             */
            #ifndef HAVE_KALLSYMS_LOOKUP_NAME
            	for (i = 0; i < count; i++)
            		__set_page_dirty_nobuffers(pvec->pages[i]);
            #else
            +       /*
            +        * In kernel 5.14.21, kallsyms_lookup_name is defined but
            +        * account_page_dirtied is not exported.
            +        */
            +       if (!vvp_account_page_dirtied) {
            +               for (i = 0; i < count; i++)
            +                       __set_page_dirty_nobuffers(pvec->pages[i]);
            +               goto end;
            +       }
            +
            

            In the Rocky Linux 8.6 kernel 4.18.0-372.32.1.el8_6.x86_64, both account_page_dirtied and kallsyms_lookup_name are exported, so I need to change the check of vvp_account_page_dirtied to HAVE_ACCOUNT_PAGE_DIRTIED_EXPORT. This resolves the client performance regression on Rocky Linux 8.6.
            However, for the SLES15 SP4 client, I'm not sure how to resolve the issue, since account_page_dirtied is not exported and we have to use __set_page_dirty_nobuffers.
            I'm working on a patch to fix the issue on Rocky Linux 8.6.
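The shape of the fix is a guarded function pointer: the batched helper is usable only when the symbol was actually resolved at module init; otherwise callers must fall back to the per-page loop. A minimal userspace sketch of that pattern (all names hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* In the real code this pointer is filled in at module init via
 * kallsyms_lookup_name("account_page_dirtied"); it stays NULL on kernels
 * such as SLES15 SP4 where the symbol cannot be resolved. */
static void (*batched_dirty_helper)(int *pages, size_t count);

/* Stand-in for a looped __set_page_dirty_nobuffers() fallback. */
static void dirty_one_by_one(int *pages, size_t count)
{
	for (size_t i = 0; i < count; i++)
		pages[i] = 1;
}

/* Dirty a batch of pages, preferring the fast helper when available. */
void dirty_pages(int *pages, size_t count)
{
	if (batched_dirty_helper)
		batched_dirty_helper(pages, count);
	else
		dirty_one_by_one(pages, count);
}
```

The LU-16433 patch effectively adds this kind of NULL check, which was missing when HAVE_KALLSYMS_LOOKUP_NAME was defined but the symbol could not be resolved.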

            People

              Assignee: yujian Jian Yu
              Reporter: sihara Shuichi Ihara
              Votes: 0
              Watchers: 10
