
Single client's performance degradation on 2.1

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.2.0, Lustre 2.3.0

    Description

      During performance testing on lustre-2.1, I saw a single-client performance degradation.
      Here are IOR results on a single client with 2.1 and, for comparison, with lustre-1.8.6.80.
      I ran IOR (IOR -t 1m -b 32g -w -r -vv -F -o /lustre/ior.out/file) on the single client with 1, 2, 4 and 8 processes.

      Write (MiB/sec)
      Processes   v1.8.6.80   v2.1
      1            446.25      411.43
      2            808.53      761.30
      4           1484.18     1151.41
      8           1967.42     1172.06

      Read (MiB/sec)
      Processes   v1.8.6.80   v2.1
      1            823.90      595.71
      2           1449.49     1071.76
      4           2502.49     1517.79
      8           3133.43     1746.30

      Both tests were run on the same infrastructure (hardware and network), and checksums were disabled on the client in both cases.
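
      For reference, below is a minimal sketch of how such a run could be launched. The mpirun invocation and the osc.*.checksums tunable are my assumptions about the setup, not details taken from the report itself:

        # Disable client-side data checksums (assumed equivalent of "checksums disabled")
        $ lctl set_param osc.*.checksums=0

        # Run IOR on the single client with 1, 2, 4 and 8 processes
        $ for np in 1 2 4 8; do
            mpirun -np $np IOR -t 1m -b 32g -w -r -vv -F -o /lustre/ior.out/file
          done

      With -F (file per process), each rank writes and reads its own 32g file under /lustre/ior.out/.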

      Attachments

        1. 2.4 Single Client 3May2013.xlsx
          34 kB
        2. 574.1.pdf
          169 kB
        3. ior-256gb.tar.gz
          32 kB
        4. ior-32gb.tar.gz
          24 kB
        5. lu744-20120909.tar.gz
          883 kB
        6. lu744-20120915.tar.gz
          874 kB
        7. lu744-20120915-02.tar.gz
          1.02 MB
        8. lu744-20121111.tar.gz
          849 kB
        9. lu744-20121113.tar.gz
          846 kB
        10. lu744-20121117.tar.gz
          2.45 MB
        11. lu744-20130104.tar.gz
          915 kB
        12. lu744-20130104-02.tar.gz
          26 kB
        13. lu744-dls-20121113.tar.gz
          10 kB
        14. orig-collectl.out
          81 kB
        15. orig-ior.out
          2 kB
        16. orig-opreport-l.out
          146 kB
        17. patched-collectl.out
          34 kB
        18. patched-ior.out
          2 kB
        19. patched-opreport-l.out
          137 kB
        20. single-client-performance.xlsx
          42 kB
        21. stats-1.8.zip
          14 kB
        22. stats-2.1.zip
          64 kB
        23. test2-various-version.zip
          264 kB
        24. test-patchset-2.zip
          147 kB

        Issue Links

          Activity

            [LU-744] Single client's performance degradation on 2.1

            Jinshan,

            I just tested http://review.whamcloud.com/4943

            Attached are all results and the oprofile output.
            It looks clearly better than the previous numbers, but I wonder whether we could get even better performance, since we sometimes reach 5.6GB/sec (see collectl.out); I would like to keep the results around those numbers.

            ihara Shuichi Ihara (Inactive) added a comment

            My next patch will remove the top cache of cl_page.

            jay Jinshan Xiong (Inactive) added a comment

            There is a new patch for performance tuning at http://review.whamcloud.com/4943. Please give it a try.

            jay Jinshan Xiong (Inactive) added a comment

            Hi Ihara, this is because the CPU is still under contention, so performance dropped when the housekeeping work started. Can you please run the benchmark one more time with patches 4519, 4472 and 4617? This should help a little bit.

            jay Jinshan Xiong (Inactive) added a comment
            prakash Prakash Surya (Inactive) added a comment - edited

            Jinshan, Frederik, when using the LU-2139 patches on the client but not on the server, it is normal to see the IO pause/stall you are describing. I'm not sure if this is what is happening here, but what can happen is:

            1. Client performs IO
            2. Client receives completion callback for bulk RPC
            3. Bulk pages now clean but "unstable" (uncommitted on OST)
            4. NR_UNSTABLE_NFS incremented for each unstable page (due to http://review.whamcloud.com/4245)
            5. NR_UNSTABLE_NFS grows larger than (background_thresh + dirty_thresh)/2
            6. Kernel stalls IO waiting for NR_UNSTABLE_NFS to decrease (via kernel function: balance_dirty_pages)
            7. Client receives a Lustre ping some time in the future (around 20 seconds later?), updating last_committed
            8. Bulk pages now "stable" on client and can be reclaimed, lowering NR_UNSTABLE_NFS
            9. Go back to step 1.

            Reading the above comments, it looks like the LU-2139 patches are working as intended (avoiding OOMs at the cost of performance). Although, I admit, the performance is terrible when you hit the NR_UNSTABLE_NFS limit and the kernel halts all IO (but it is better than OOM, IMO). To improve on this, http://review.whamcloud.com/4375 needs to be applied to both clients and servers. This will allow the server to proactively commit bulk pages as they come in, hopefully preventing the client from exhausting its memory with unstable pages and avoiding the "stall" in balance_dirty_pages. With it applied to the server, I'd expect NR_UNSTABLE_NFS to remain "low", and the 4GB file speeds to match the 1GB speeds.

            Please keep in mind, the LU-2139 patches are all experimental and subject to change.

            On the client, with the LU-2139 patches applied, you might find it interesting to watch lctl get_param llite.*.unstable_stats and cat /proc/meminfo | grep NFS_Unstable as the test is running.

            For example:

            $ watch -n0.1 'lctl get_param llite.*.unstable_stats'
            $ watch -n0.1 'cat /proc/meminfo | grep NFS_Unstable'
            

            Those will give you an idea of how many unstable pages the client has at a given time. If that value gets "high" (the exact value depends on your dirty limits, but probably around 1/4 of RAM), then what I detailed above is most likely the cause of the bad performance.
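
            If watch is not available, a simple logging loop gives the same view with timestamps, and the two vm.dirty_* sysctls let you compare NFS_Unstable against the (background_thresh + dirty_thresh)/2 limit from step 5. This is only a minimal sketch; the one-second interval and log file name are arbitrary:

            # Show the knobs the kernel derives background_thresh/dirty_thresh from
            $ sysctl vm.dirty_ratio vm.dirty_background_ratio

            # Log unstable/dirty page counters once per second while IOR runs
            $ while true; do
                date +%T
                lctl get_param llite.*.unstable_stats
                grep -E 'NFS_Unstable|Dirty|Writeback' /proc/meminfo
                sleep 1
              done | tee unstable-pages.log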


            Jinshan,

            Yes, I upgraded the MPI library a couple of weeks ago. I found a hardware problem and fixed it. Now mca_btl_sm_component_progress consumes less CPU; it's still high compared to the previous library, though...

            This attachment includes three test results:

            1. master without any patches
            2. master + 4519 (2nd patch) + 4472 (2nd patch)
            3. master + 4519 (2nd patch) + 4472 (2nd patch), with MPI run using pthreads instead of shared memory (see the sketch below)
            

            The patches reduce CPU consumption and improve the performance, but performance still drops when the client has no free memory.
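
            On the shared-memory point, one way to take the sm BTL (and mca_btl_sm_component_progress) out of the picture in Open MPI is to exclude it at run time. This is only a sketch; the exact MPI configuration used for test 3 is not recorded here:

            # Exclude the shared-memory BTL; ranks fall back to the remaining transports (e.g. TCP/IB)
            $ mpirun -np 8 --mca btl ^sm IOR -t 1m -b 32g -w -r -vv -F -o /lustre/ior.out/file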

            ihara Shuichi Ihara (Inactive) added a comment

            Hi Ihara, I saw significant CPU usage for the libraries mca_btl_sm.so (11.7%) and libopen-pal.so.0.0.0 (4.7%), but in the performance data from Sep 5 they consumed only 0.13% and 0.05%. They are Open MPI libraries. Did you upgrade these libraries?

            Anyway, I revised patch 4519 and restored 4472 to remove memory stalls; please apply them in your next benchmark. However, we have to figure out why the Open MPI libraries consumed so much CPU before we can see the performance improvement.
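
            For reference, a profile like the attached opreport-l.out files can be collected roughly as follows. This is a minimal sketch using the legacy opcontrol interface; the vmlinux path is an assumption and depends on the installed kernel debuginfo:

            $ opcontrol --vmlinux=/usr/lib/debug/lib/modules/$(uname -r)/vmlinux
            $ opcontrol --start
            $ mpirun -np 8 IOR -t 1m -b 32g -w -r -vv -F -o /lustre/ior.out/file
            $ opcontrol --dump
            $ opcontrol --stop
            # Per-symbol breakdown, including mca_btl_sm.so and libopen-pal.so
            $ opreport -l > opreport-l.out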

            jay Jinshan Xiong (Inactive) added a comment

            Frederik, sorry for the delayed response. From the test results, it looks like there may be some issues with the LU-2139 patches. You can see it in the collectl stats:

            46 45 72551 10187 2G 10M 410M 399M 188M 162M 0 0 379904 371
            47 47 74169 7109 2G 11M 786M 775M 269M 162M 0 0 385024 376
            25 25 40289 4190 2G 11M 982M 971M 312M 162M 0 0 200704 196
            6 6 8440 229 2G 11M 982M 971M 313M 162M 0 0 0 0
            7 7 10545 249 2G 11M 982M 971M 313M 164M 0 0 0 0

            .....

            {20 seconds later}

            7 7 10639 241 2G 11M 983M 973M 311M 163M 0 0 0 0
            9 8 12408 236 2G 11M 983M 973M 311M 163M 0 0 0 0
            34 34 53022 4218 1G 11M 1G 1G 357M 163M 0 0 258048 252
            50 50 77645 7414 1G 11M 1G 1G 447M 163M 0 0 422912 413

            There was IO activity for 2 or 3 seconds, then things stayed quiet for around 20 seconds, and then IO started again. It seems like the LRU budget was running out, so the OSC had to wait for the commit on the OST to finish.

            I will work on this. Thanks for testing.
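
            To see whether the client page LRU budget is the limit in a run like this, the relevant client tunables can be read while the test is running. A minimal sketch, assuming the standard client parameter names (these can differ slightly between Lustre versions):

            # Upper limit on the Lustre page cache on this client, shared by all OSCs
            $ lctl get_param llite.*.max_cached_mb

            # Per-OSC dirty limit and the amount of dirty data currently cached for each OST
            $ lctl get_param osc.*.max_dirty_mb osc.*.cur_dirty_bytes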

            jay Jinshan Xiong (Inactive) added a comment

            Jinshan, I tested master + patch 4519 on both the servers and the client, but the results still seem to be the same.

            ihara Shuichi Ihara (Inactive) added a comment

            So far all these tests have been done with 2.3.0 on the servers. I've not tried 2.3.54 on any of my test servers yet. I'll try to find some time over the next few days.

            ferner Frederik Ferner (Inactive) added a comment

            Frederik, I'm assuming for your test results that you are running the same version on both the client and the server? Would it also be possible for you to test 2.3.0 clients with 2.3.54 servers and vice versa? That would allow us to isolate whether the slowdown seen with 2.3.54 is due to changes in the client or the server.

            adilger Andreas Dilger added a comment

            People

              jay Jinshan Xiong (Inactive)
              ihara Shuichi Ihara (Inactive)
              Votes: 1
              Watchers: 35
