
LU-2032: small random read i/o performance regression

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.3
    • Components: None
    • Environment: 2.1.3 server with 2.1.3 and 1.8.8 clients; CentOS 6, x86_64, QDR IB

    Description

      I've been doing some 2.1.3 pre-rollout testing, and there seems to be a client problem with small random reads. Performance is considerably worse on 2.1.3 clients than on 1.8.8 clients: about a 35x slowdown for 4k random read i/o.

      Tests use the same files on a RHEL6 x86_64 2.1.3 server (stock 2.1.3 RPMs), a QDR IB fabric, and a single-disk or md 8+2 LUN for an OST, with all client and server VFS caches dropped between trials.

      Checksums on or off and client max RPCs in flight of 8 or 32 make little difference. I've also tried unmounting the filesystem from the 1.8.8 client between tests to make sure there's no hidden caching, but that didn't change anything.
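      These knobs and the cache drops are typically driven with the standard client-side commands; a minimal sketch follows (the exact values are illustrative, not recorded in the original report):

      # toggle RPC checksums on all OSCs (0 = off, 1 = on)
      lctl set_param osc.*.checksums=0
      # cap concurrent RPCs per OSC (8 or 32 in these trials)
      lctl set_param osc.*.max_rpcs_in_flight=32
      # drop Linux VFS caches on clients and servers between trials
      sync && echo 3 > /proc/sys/vm/drop_caches
      # also flush client-side Lustre DLM locks and the pages they cover
      lctl set_param ldlm.namespaces.*.lru_size=clear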

      random read ->
      IOR -a POSIX -C -r -F -k -e -t $i -z -b 1024m -o /mnt/yo96/rjh/blah
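      For reference, the IOR flags in this invocation mean roughly the following (standard IOR options; check IOR -h for a given build):

      # -a POSIX   use the POSIX I/O backend
      # -C         reorder tasks for readback, so a task reads data another task wrote
      # -r         read test (the files already exist from a prior write run)
      # -F         one file per process
      # -k         keep the test files on completion
      # -e         fsync after write close (no effect on a pure read pass)
      # -t $i      transfer size per I/O call (4k or 1M in these trials)
      # -z         random, rather than sequential, offsets within each file
      # -b 1024m   bytes of data per task
      # -o ...     test file path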

      i/o size   client version   single disk lun   md8+2 lun
      i=4k       1.8.8            20.70 MB/s        22.0 MB/s
      i=4k       2.1.3            0.55 MB/s         0.6 MB/s
      i=1M       1.8.8            87 MB/s           137 MB/s
      i=1M       2.1.3            63 MB/s           83 MB/s

      Although these numbers are for a single process, the same trend applies when the IOR is scaled up to 8 processes per node and to multiple nodes.
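      A multi-node run of the same test would look something like the following sketch (the mpirun syntax and the ./clients hostfile are illustrative assumptions, not from the original report):

      # 8 processes/node across 8 nodes; hostfile name is hypothetical
      mpirun -np 64 --hostfile ./clients \
          IOR -a POSIX -C -r -F -k -e -t 4k -z -b 1024m -o /mnt/yo96/rjh/blah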


          Activity


            Robin Humble (Inactive) added a comment:

            Sure. This is +reada +vfstrace for a 1.8.8 client and a 2.1.3 client.

            Jinshan Xiong (Inactive) added a comment:

            Can you please run your test case again with D_READA enabled on the client side? This seems to be a readahead problem for sure.
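            For reference, the readahead debug mask can be enabled and the log collected with a standard lctl sequence like the following (the output file name is illustrative):

            # add readahead and VFS tracing to the client debug mask
            lctl set_param debug=+reada
            lctl set_param debug=+vfstrace
            # clear the ring buffer, run the read test, then dump the log
            lctl clear
            # ... run the IOR read pass ...
            lctl dk > /tmp/lustre-debug.log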

            Robin Humble (Inactive) added a comment:

            RPCs, especially pages and bytes per RPC, look quite different between 1.8.8 and 2.1.3, e.g. after running:
            IOR -a POSIX -C -r -F -k -e -t $i -z -b 100m -o /mnt/yo*/rjh/blah

            osc/*/rpc_stats ->

            1.8.8

                                    read                    write
            pages per rpc         rpcs   % cum % |       rpcs   % cum %
            1:                      68  12  12   |          0   0   0
            2:                      34   6  19   |          0   0   0
            4:                      57  10  30   |          0   0   0
            8:                      52   9  39   |          0   0   0
            16:                     63  11  51   |          0   0   0
            32:                     50   9  61   |          0   0   0
            64:                     66  12  73   |          0   0   0
            128:                    67  12  86   |          0   0   0
            256:                    72  13 100   |          0   0   0
            
                                    read                    write
            rpcs in flight        rpcs   % cum % |       rpcs   % cum %
            0:                     483  91  91   |          0   0   0
            1:                      39   7  98   |          0   0   0
            2:                       5   0  99   |          0   0   0
            3:                       2   0 100   |          0   0   0
            
                                    read                    write
            offset                rpcs   % cum % |       rpcs   % cum %
            0:                     101  19  19   |          0   0   0
            1:                      26   4  24   |          0   0   0
            2:                      49   9  33   |          0   0   0
            4:                      48   9  42   |          0   0   0
            8:                      61  11  53   |          0   0   0
            16:                     64  12  65   |          0   0   0
            32:                     56  10  76   |          0   0   0
            64:                     55  10  86   |          0   0   0
            128:                    69  13 100   |          0   0   0
            

            2.1.3

                                    read                    write
            pages per rpc         rpcs   % cum % |       rpcs   % cum %
            1:                   25568  99  99   |          0   0   0
            2:                       1   0  99   |          0   0   0
            4:                       0   0  99   |          0   0   0
            8:                       5   0 100   |          0   0   0
            
                                    read                    write
            rpcs in flight        rpcs   % cum % |       rpcs   % cum %
            0:                   25574 100 100   |          0   0   0
                                  
                                    read                    write
            offset                rpcs   % cum % |       rpcs   % cum %
            0:                     100   0   0   |          0   0   0
            1:                     100   0   0   |          0   0   0
            2:                     200   0   1   |          0   0   0
            4:                     400   1   3   |          0   0   0
            8:                     800   3   6   |          0   0   0
            16:                   1600   6  12   |          0   0   0
            32:                   3200  12  25   |          0   0   0
            64:                   6383  24  49   |          0   0   0
            128:                 12791  50 100   |          0   0   0
            

            osc/*/import ->

            1.8.8

            ...
                read_data_averages:
                   bytes_per_rpc: 200200
                   usec_per_rpc: 7673
                   MB_per_sec: 26.09
            

            2.1.3

                read_data_averages:
                   bytes_per_rpc: 4100
                   usec_per_rpc: 6108
                   MB_per_sec: 0.67
            

            Also, the readahead stats hit/miss ratios are pretty much reversed between 1.8.8 and 2.1.3, but that might just be an accounting difference?
            llite/*/read_ahead_stats ->

            1.8.8

            snapshot_time             1348738815.173086 secs.usecs
            hits                      25120 samples [pages]
            misses                    480 samples [pages]
            readpage not consecutive  25583 samples [pages]
            failed grab_cache_page    25583 samples [pages]
            read but discarded        256 samples [pages]
            

            2.1.3

            snapshot_time             1348738815.198485 secs.usecs
            hits                      26 samples [pages]
            misses                    25574 samples [pages]
            readpage not consecutive  25589 samples [pages]
            miss inside window        3 samples [pages]
            failed grab_cache_page    25599 samples [pages]
            zero size window          1 samples [pages]
            
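            For reference, these counters can be reset and re-read per trial; writing to a Lustre stats file clears it, so a minimal sketch looks like:

            # reset per-OSC RPC stats and llite readahead stats
            lctl set_param osc.*.rpc_stats=0
            lctl set_param llite.*.read_ahead_stats=0
            # ... run the IOR read pass ...
            # dump the counters afterwards
            lctl get_param osc.*.rpc_stats
            lctl get_param osc.*.import
            lctl get_param llite.*.read_ahead_stats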

            Robin Humble (Inactive) added a comment:

            It seems current master has the same problem.
            /proc/fs/lustre/version is 2.3.50-g1d2bf12-CHANGED-2.6.32-279.9.1.el6-2.3.50-ish.a

            i/o size   client version   single disk lun
            i=4k       master           0.54 MB/s

            People

              Assignee: WC Triage
              Reporter: Robin Humble (Inactive)
              Votes: 0
              Watchers: 16
