Single client's performance degradation on 2.1

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.2.0, Lustre 2.3.0
    • None
    • 3
    • 4018

    Description

      During performance testing on lustre-2.1, I saw a degradation in single-client performance.
      Here are IOR results on a single client with 2.1, and with lustre-1.8.6.80 for comparison.
      I ran IOR (IOR -t 1m -b 32g -w -r -vv -F -o /lustre/ior.out/file) on the single client with 1, 2, 4 and 8 processes.

      Write (MiB/sec)
      Processes   v1.8.6.80   v2.1
      1           446.25      411.43
      2           808.53      761.30
      4           1484.18     1151.41
      8           1967.42     1172.06

      Read (MiB/sec)
      Processes   v1.8.6.80   v2.1
      1           823.90      595.71
      2           1449.49     1071.76
      4           2502.49     1517.79
      8           3133.43     1746.30

      Tested on the same infrastructure (hardware and network). Checksums were turned off on the client for both tests.
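
As a rough way to read the two tables, the sketch below expresses the v2.1 throughput as a percentage of the v1.8.6.80 throughput. The figures are copied from the report; the mapping of the four rows to 1, 2, 4 and 8 processes is an assumption based on the order of the runs described above, and the helper name `percent_of_baseline` is only illustrative.

```python
# Throughput in MiB/s, copied from the Write and Read tables in the description.
# Assumption (not stated explicitly in the report): the four rows map, in order,
# to the 1, 2, 4 and 8 process runs.
PROCS = [1, 2, 4, 8]
WRITE = {
    "v1.8.6.80": [446.25, 808.53, 1484.18, 1967.42],
    "v2.1": [411.43, 761.30, 1151.41, 1172.06],
}
READ = {
    "v1.8.6.80": [823.90, 1449.49, 2502.49, 3133.43],
    "v2.1": [595.71, 1071.76, 1517.79, 1746.30],
}


def percent_of_baseline(baseline, candidate):
    """Return candidate throughput as a percentage of baseline, row by row."""
    return [round(100.0 * c / b, 1) for b, c in zip(baseline, candidate)]


write_pct = percent_of_baseline(WRITE["v1.8.6.80"], WRITE["v2.1"])
read_pct = percent_of_baseline(READ["v1.8.6.80"], READ["v2.1"])

for n, w, r in zip(PROCS, write_pct, read_pct):
    print(f"{n} procs: write {w}% of 1.8.6.80, read {r}% of 1.8.6.80")
```

Under that row assumption, v2.1 write throughput goes from about 92% of v1.8.6.80 at 1 process to about 60% at 8 processes, and read throughput from about 72% to about 56%.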

      Attachments

        1. 2.4 Single Client 3May2013.xlsx
          34 kB
        2. 574.1.pdf
          169 kB
        3. ior-256gb.tar.gz
          32 kB
        4. ior-32gb.tar.gz
          24 kB
        5. lu744-20120909.tar.gz
          883 kB
        6. lu744-20120915.tar.gz
          874 kB
        7. lu744-20120915-02.tar.gz
          1.02 MB
        8. lu744-20121111.tar.gz
          849 kB
        9. lu744-20121113.tar.gz
          846 kB
        10. lu744-20121117.tar.gz
          2.45 MB
        11. lu744-20130104.tar.gz
          915 kB
        12. lu744-20130104-02.tar.gz
          26 kB
        13. lu744-dls-20121113.tar.gz
          10 kB
        14. orig-collectl.out
          81 kB
        15. orig-ior.out
          2 kB
        16. orig-opreport-l.out
          146 kB
        17. patched-collectl.out
          34 kB
        18. patched-ior.out
          2 kB
        19. patched-opreport-l.out
          137 kB
        20. single-client-performance.xlsx
          42 kB
        21. stats-1.8.zip
          14 kB
        22. stats-2.1.zip
          64 kB
        23. test2-various-version.zip
          264 kB
        24. test-patchset-2.zip
          147 kB

          Activity

            [LU-744] Single client's performance degradation on 2.1

            Tested single-client performance against 2.3.64 servers. Versions tested: 1.8.8, 2.1.5, 2.3.0, 2.3.64.

            cliffw Cliff White (Inactive) added a comment

            All patches have been landed. More work is also needed.

            jay Jinshan Xiong (Inactive) added a comment

            Andreas,

            As far as I tested, 4943 helped improve performance, but even with that patch applied, performance is still lower than b1_8.

            ihara Shuichi Ihara (Inactive) added a comment

            Jinshan,
            with http://review.whamcloud.com/4943 landed to master, are there any patches left to land under this bug, or can it be closed?

            adilger Andreas Dilger added a comment

            Gregoire, that's interesting. I wouldn't immediately expect #2929 to make much of a performance impact. How many iterations did you run? I'm curious if those numbers are within the natural variance of the test, or if they're actually because of the changes in #2929. Jinshan, would you expect performance to increase because of that patch?

            prakash Prakash Surya (Inactive) added a comment

            Jinshan,

            What is the status of the patch http://review.whamcloud.com/#change,2929 that you posted several months ago for the b2_1 release?
            Why has it never been landed?

            I have made some measurements and results are significant: from 4% to 50% improvement depending on the platform I tested on.

            Here are the results.

            Hardware configuration:
            30 OSTs
            2 OSS : 4 sockets, 32 cores, 64GB memory, 2xIB, 4xFC8-2port
            ClientA : 4 sockets Nehalem-EX, 32 cores, 64GB memory, 1xIB
            ClientB : 2 sockets SandyBridge-EP, 16 cores, 64GB memory, 1xIB
            Interconnect is QDR Infiniband

            Software configuration:
            kernel 2.6.32-220
            lustre 2.1.3 + ORNL-22 + a few other patches

            IOR file per process, blockSize=4GiB, xfersize=1MiB, fsync=1.
            This gives an aggregate filesize of 120 GiB.

                      #tasks    write   read   configuration
            ClientA       30     1121   1079   lustre 2.1.3
            ClientA       30     1782   1413   lustre 2.1.3 + #2929
            
            ClientB       16     2482   2149   lustre 2.1.3
            ClientB       16     2616   2244   lustre 2.1.3 + #2929
            
            pichong Gregoire Pichon added a comment
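
The relative gain from #2929 and the aggregate file size in the comment above can be recomputed directly from the posted table. This is only a sketch over the numbers as posted: the units of the write and read columns are not stated in the comment, and the names `RESULTS` and `improvement_percent` are illustrative.

```python
# Figures copied from the table in the comment above (units not stated there).
# Each entry holds (write, read) for lustre 2.1.3 and for lustre 2.1.3 + #2929.
GIB_PER_TASK = 4  # blockSize=4GiB, one file per process
RESULTS = {
    "ClientA": {"tasks": 30, "base": (1121, 1079), "patched": (1782, 1413)},
    "ClientB": {"tasks": 16, "base": (2482, 2149), "patched": (2616, 2244)},
}


def improvement_percent(base, patched):
    """Relative gain of patched over base, in percent, rounded to 0.1."""
    return round(100.0 * (patched - base) / base, 1)


summary = {}
for client, r in RESULTS.items():
    summary[client] = {
        # Aggregate size follows from tasks x block size (file per process).
        "aggregate_gib": r["tasks"] * GIB_PER_TASK,
        "write_gain": improvement_percent(r["base"][0], r["patched"][0]),
        "read_gain": improvement_percent(r["base"][1], r["patched"][1]),
    }
    print(client, summary[client])
```

With these inputs the 30-task run gives the 120 GiB aggregate quoted in the comment; a 16-task run with the same block size would give 64 GiB. The computed gains are roughly 59% (write) and 31% (read) on ClientA, and roughly 5.4% (write) and 4.4% (read) on ClientB.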

            New test results include b1_8, master and master+patch.

            ihara Shuichi Ihara (Inactive) added a comment

            OK, I tested again on the client with b1_8, master and master+4943 patches, and for this test I ran multiple iterations of IOR.

            Configuration
            8 x OSS : 2 x E5-2670 (2.6GHz), 64GB memory, Centos6.3+master(2.3.58)/w FDR, total 32 OSTs
            1 x Client : 2 x E5-2680 (2.7GHz), 64GB memory, Centos6.3/w FDR (tested with b1_8, master and master+patch as patchless client)
            
            nproc=12
                               iteration=1   iteration=2   iteration=3
            master(2.3.58)     3547 MiB/s    2754 MiB/s    2633 MiB/s
            master+patch(4943) 3775 MiB/s    3407 MiB/s    2841 MiB/s  
            b1_8               4212 MiB/s    4012 MiB/s    3750 MiB/s
            
            nproc=16
                               iteration=1   iteration=2   iteration=3
            master(2.3.58)     3617 MiB/s    3286 MiB/s    3149 MiB/s
            master+patch(4943) 4077 MiB/s    3269 MiB/s    3511 MiB/s  
            b1_8               4851 MiB/s    4255 MiB/s    4277 MiB/s
            
            ihara Shuichi Ihara (Inactive) added a comment
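
Because the three iterations in the comment above vary noticeably, it can help to look at the mean and the spread per configuration, and at each mean relative to b1_8. The sketch below does that for the posted numbers only; the comment does not say whether these are write or read figures, and the names `RUNS` and `stats` are illustrative.

```python
from statistics import mean

# MiB/s per iteration, copied from the nproc=12 and nproc=16 tables above.
RUNS = {
    12: {
        "master(2.3.58)": [3547, 2754, 2633],
        "master+patch(4943)": [3775, 3407, 2841],
        "b1_8": [4212, 4012, 3750],
    },
    16: {
        "master(2.3.58)": [3617, 3286, 3149],
        "master+patch(4943)": [4077, 3269, 3511],
        "b1_8": [4851, 4255, 4277],
    },
}

stats = {}
for nproc, configs in RUNS.items():
    baseline = mean(configs["b1_8"])
    stats[nproc] = {}
    for name, values in configs.items():
        avg = mean(values)
        stats[nproc][name] = {
            "mean": avg,
            "spread": max(values) - min(values),  # max minus min over iterations
            "pct_of_b1_8": round(100.0 * avg / baseline, 1),
        }
        print(nproc, name, stats[nproc][name])
```

On these figures the patched master averages about 84% of b1_8 at nproc=12 and about 81% at nproc=16, against roughly 75% for unpatched master in both cases, while the spread between iterations reaches several hundred MiB/s.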

            Ihara, could you please extract out the performance numbers for this patch and the previous ones in a small table like was done for the previous tests?

            adilger Andreas Dilger added a comment

            Hi Ihara, what's the performance of b1_8 again on the same platform?

            jay Jinshan Xiong (Inactive) added a comment

            CPU is still a bottleneck. The write speed dropped after OSC LRU cache stepped in and immediately drove the CPU usage to 100%. Let me see if I can optimize it.

            jay Jinshan Xiong (Inactive) added a comment - edited

            People

              jay Jinshan Xiong (Inactive)
              ihara Shuichi Ihara (Inactive)
              Votes: 1
              Watchers: 35