
LU-2032: small random read i/o performance regression

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.3
    • Components: None
    • Environment: 2.1.3 server with 2.1.3 and 1.8.8 clients; CentOS 6, x86_64, QDR IB

    Description

      I've been doing some 2.1.3 pre-rollout testing, and there seems to be a client problem with small random reads. Performance is considerably worse on 2.1.3 clients than on 1.8.8 clients: about a 35x slowdown for 4k random read i/o.

      Tests use the same files on a RHEL6 x86_64 2.1.3 server (stock 2.1.3 RPMs), a QDR IB fabric, and a single-disk or md 8+2 LUN for an OST, with all client and server VFS caches dropped between trials.

      Checksums on or off and client max RPCs in flight of 8 or 32 make little difference. I've also tried unmounting the filesystem from the 1.8.8 client between tests to make sure there's no hidden caching, but that didn't change anything.
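      These knobs and the cache drops are typically driven with the standard client-side commands; a minimal sketch follows (the exact values are illustrative, not recorded in the original report):

      # toggle RPC checksums on all OSCs (0 = off, 1 = on)
      lctl set_param osc.*.checksums=0
      # cap concurrent RPCs per OSC (8 or 32 in these trials)
      lctl set_param osc.*.max_rpcs_in_flight=32
      # drop Linux VFS caches on clients and servers between trials
      sync && echo 3 > /proc/sys/vm/drop_caches
      # also flush client-side Lustre DLM locks and the pages they cover
      lctl set_param ldlm.namespaces.*.lru_size=clear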

      random read ->
      IOR -a POSIX -C -r -F -k -e -t $i -z -b 1024m -o /mnt/yo96/rjh/blah
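      For reference, the IOR flags in this invocation mean roughly the following (standard IOR options; check IOR -h for a given build):

      # -a POSIX   use the POSIX I/O backend
      # -C         reorder tasks for readback, so a task reads data another task wrote
      # -r         read test (the files already exist from a prior write run)
      # -F         one file per process
      # -k         keep the test files on completion
      # -e         fsync after write close (no effect on a pure read pass)
      # -t $i      transfer size per I/O call (4k or 1M in these trials)
      # -z         random, rather than sequential, offsets within each file
      # -b 1024m   bytes of data per task
      # -o ...     test file path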

      i/o size   client version   single disk lun   md8+2 lun
      i=4k       1.8.8            20.70 MB/s        22.0 MB/s
      i=4k       2.1.3            0.55 MB/s         0.6 MB/s
      i=1M       1.8.8            87 MB/s           137 MB/s
      i=1M       2.1.3            63 MB/s           83 MB/s

      Although these numbers are for a single process, the same trend applies when the IOR is scaled up to 8 processes per node and to multiple nodes.
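      A multi-node run of the same test would look something like the following sketch (the mpirun syntax and the ./clients hostfile are illustrative assumptions, not from the original report):

      # 8 processes/node across 8 nodes; hostfile name is hypothetical
      mpirun -np 64 --hostfile ./clients \
          IOR -a POSIX -C -r -F -k -e -t 4k -z -b 1024m -o /mnt/yo96/rjh/blah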


          Activity


            Robin Humble (Inactive) added a comment:

            Sure. This is +reada +vfstrace for a 1.8.8 client and a 2.1.3 client.

            Jinshan Xiong (Inactive) added a comment:

            Can you please run your test case again with D_READA enabled on the client side? This seems to be a readahead problem for sure.
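            For reference, the readahead debug mask can be enabled and the log collected with a standard lctl sequence like the following (the output file name is illustrative):

            # add readahead and VFS tracing to the client debug mask
            lctl set_param debug=+reada
            lctl set_param debug=+vfstrace
            # clear the ring buffer, run the read test, then dump the log
            lctl clear
            # ... run the IOR read pass ...
            lctl dk > /tmp/lustre-debug.log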

            Robin Humble (Inactive) added a comment:

            RPCs, especially pages and bytes per RPC, look quite different between 1.8.8 and 2.1.3, e.g. after running:
            IOR -a POSIX -C -r -F -k -e -t $i -z -b 100m -o /mnt/yo*/rjh/blah

            osc/*/rpc_stats ->

            1.8.8

                                    read                    write
            pages per rpc         rpcs   % cum % |       rpcs   % cum %
            1:                      68  12  12   |          0   0   0
            2:                      34   6  19   |          0   0   0
            4:                      57  10  30   |          0   0   0
            8:                      52   9  39   |          0   0   0
            16:                     63  11  51   |          0   0   0
            32:                     50   9  61   |          0   0   0
            64:                     66  12  73   |          0   0   0
            128:                    67  12  86   |          0   0   0
            256:                    72  13 100   |          0   0   0
            
                                    read                    write
            rpcs in flight        rpcs   % cum % |       rpcs   % cum %
            0:                     483  91  91   |          0   0   0
            1:                      39   7  98   |          0   0   0
            2:                       5   0  99   |          0   0   0
            3:                       2   0 100   |          0   0   0
            
                                    read                    write
            offset                rpcs   % cum % |       rpcs   % cum %
            0:                     101  19  19   |          0   0   0
            1:                      26   4  24   |          0   0   0
            2:                      49   9  33   |          0   0   0
            4:                      48   9  42   |          0   0   0
            8:                      61  11  53   |          0   0   0
            16:                     64  12  65   |          0   0   0
            32:                     56  10  76   |          0   0   0
            64:                     55  10  86   |          0   0   0
            128:                    69  13 100   |          0   0   0
            

            2.1.3

                                    read                    write
            pages per rpc         rpcs   % cum % |       rpcs   % cum %
            1:                   25568  99  99   |          0   0   0
            2:                       1   0  99   |          0   0   0
            4:                       0   0  99   |          0   0   0
            8:                       5   0 100   |          0   0   0
            
                                    read                    write
            rpcs in flight        rpcs   % cum % |       rpcs   % cum %
            0:                   25574 100 100   |          0   0   0
                                  
                                    read                    write
            offset                rpcs   % cum % |       rpcs   % cum %
            0:                     100   0   0   |          0   0   0
            1:                     100   0   0   |          0   0   0
            2:                     200   0   1   |          0   0   0
            4:                     400   1   3   |          0   0   0
            8:                     800   3   6   |          0   0   0
            16:                   1600   6  12   |          0   0   0
            32:                   3200  12  25   |          0   0   0
            64:                   6383  24  49   |          0   0   0
            128:                 12791  50 100   |          0   0   0
            

            osc/*/import ->

            1.8.8

            ...
                read_data_averages:
                   bytes_per_rpc: 200200
                   usec_per_rpc: 7673
                   MB_per_sec: 26.09
            

            2.1.3

                read_data_averages:
                   bytes_per_rpc: 4100
                   usec_per_rpc: 6108
                   MB_per_sec: 0.67
            

            Also, the readahead stats hit/miss ratios are pretty much reversed between 1.8.8 and 2.1.3, but that might just be an accounting difference?
            llite/*/read_ahead_stats ->

            1.8.8

            snapshot_time             1348738815.173086 secs.usecs
            hits                      25120 samples [pages]
            misses                    480 samples [pages]
            readpage not consecutive  25583 samples [pages]
            failed grab_cache_page    25583 samples [pages]
            read but discarded        256 samples [pages]
            

            2.1.3

            snapshot_time             1348738815.198485 secs.usecs
            hits                      26 samples [pages]
            misses                    25574 samples [pages]
            readpage not consecutive  25589 samples [pages]
            miss inside window        3 samples [pages]
            failed grab_cache_page    25599 samples [pages]
            zero size window          1 samples [pages]
            
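            For reference, these counters can be reset and re-read per trial; writing to a Lustre stats file clears it, so a minimal sketch looks like:

            # reset per-OSC RPC stats and llite readahead stats
            lctl set_param osc.*.rpc_stats=0
            lctl set_param llite.*.read_ahead_stats=0
            # ... run the IOR read pass ...
            # dump the counters afterwards
            lctl get_param osc.*.rpc_stats
            lctl get_param osc.*.import
            lctl get_param llite.*.read_ahead_stats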

            Robin Humble (Inactive) added a comment:

            It seems current master has the same problem.
            /proc/fs/lustre/version is 2.3.50-g1d2bf12-CHANGED-2.6.32-279.9.1.el6-2.3.50-ish.a

            i/o size   client version   single disk lun
            i=4k       master           0.54 MB/s

            People

              Assignee: WC Triage
              Reporter: Robin Humble (Inactive)
              Votes: 0
              Watchers: 16
