LU-8964: use parallel I/O to improve performance on machines with slow single thread performance

Details

    • Type: New Feature
    • Resolution: Duplicate
    • Priority: Major

    Description

      On machines with slow single-thread performance, such as KNL, the I/O performance bottleneck has moved into the code that simply copies memory from one buffer to another (from user space to the kernel or vice versa). In the current Lustre implementation all I/O is performed in a single thread, and this has become an issue on KNL. Performance could be improved significantly by a solution that performs the memory transfer of large buffers in parallel.
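
      To make the idea concrete, here is a minimal user-space sketch (not Lustre code) that splits one large buffer copy across several POSIX threads. The thread count and chunking are arbitrary, and in the real client the work to parallelize would be the copy between user pages and kernel buffers rather than a plain memcpy():

      /* Illustrative sketch only: split one large copy across N threads. */
      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      #define NTHREADS 4   /* arbitrary; a real implementation would size this from the CPU topology */

      struct copy_chunk {
              char *dst;
              const char *src;
              size_t len;
      };

      static void *copy_worker(void *arg)
      {
              struct copy_chunk *c = arg;

              memcpy(c->dst, c->src, c->len);   /* each thread copies its own slice */
              return NULL;
      }

      static void parallel_copy(char *dst, const char *src, size_t len)
      {
              pthread_t tid[NTHREADS];
              struct copy_chunk chunk[NTHREADS];
              size_t per = len / NTHREADS;
              int i;

              for (i = 0; i < NTHREADS; i++) {
                      size_t off = (size_t)i * per;

                      chunk[i].dst = dst + off;
                      chunk[i].src = src + off;
                      chunk[i].len = (i == NTHREADS - 1) ? len - off : per;  /* last chunk takes the remainder */
                      pthread_create(&tid[i], NULL, copy_worker, &chunk[i]);
              }
              for (i = 0; i < NTHREADS; i++)
                      pthread_join(tid[i], NULL);
      }

      int main(void)
      {
              size_t len = 256UL << 20;   /* 256 MiB */
              char *src = malloc(len), *dst = malloc(len);

              if (!src || !dst)
                      return 1;
              memset(src, 0xab, len);
              parallel_copy(dst, src, len);
              printf("copied %zu bytes, dst[0]=0x%02x\n", len, (unsigned char)dst[0]);
              free(src);
              free(dst);
              return 0;
      }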

       


          Activity

            spitzcor Cory Spitz added a comment -

            simmonsja, for the record, you mean LU-12043. LU-12403 is "add e2fsprog support for RHEL-8".


            simmonsja James A Simmons added a comment -

            LU-12403 will do this work correctly.

            simmonsja James A Simmons added a comment -

            Thanks Patrick for the heads up on ktask. I will be watching it closely and give it a spin under this ticket.

            dmiter Dmitry Eremin (Inactive) added a comment -

            Thanks for the slides. I will look at them carefully. But for now I disagree that the padata API has a big overhead; it is mostly negligible compared with the other overhead of passing work to a different thread. Having many threads, on the other hand, leads to scheduler delays when switching under heavy load. So I think padata will behave more stably and predictably in this case.
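
            For context, this is roughly the padata submission pattern under discussion, written against the 4.x-era kernel API (padata_alloc_possible()/padata_do_parallel() taking the instance and a callback CPU directly; the API changed in later kernels). The pio_chunk/pio_* names are made up for illustration and are not the actual patch code:

            #include <linux/kernel.h>
            #include <linux/padata.h>
            #include <linux/slab.h>
            #include <linux/string.h>
            #include <linux/workqueue.h>

            /* hypothetical per-chunk work item, embedding the padata descriptor */
            struct pio_chunk {
                    struct padata_priv padata;
                    void *dst;
                    const void *src;
                    size_t len;
            };

            static void pio_chunk_parallel(struct padata_priv *padata)
            {
                    struct pio_chunk *c = container_of(padata, struct pio_chunk, padata);

                    memcpy(c->dst, c->src, c->len);   /* stand-in for the real per-chunk copy */
                    padata_do_serial(padata);         /* hand back for in-order completion */
            }

            static void pio_chunk_serial(struct padata_priv *padata)
            {
                    /* runs in submission order on the callback CPU */
                    kfree(container_of(padata, struct pio_chunk, padata));
            }

            static int pio_submit(struct padata_instance *pinst, struct pio_chunk *c,
                                  int cb_cpu)
            {
                    c->padata.parallel = pio_chunk_parallel;
                    c->padata.serial   = pio_chunk_serial;
                    return padata_do_parallel(pinst, &c->padata, cb_cpu);
            }

            /*
             * One-time setup, roughly:
             *     wq    = alloc_workqueue("pio", WQ_MEM_RECLAIM | WQ_CPU_INTENSIVE, 1);
             *     pinst = padata_alloc_possible(wq);
             *     padata_start(pinst);
             */

            The overhead question above is mostly about the worker wakeups and ordering this pattern implies under load, not about the API calls themselves.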

            paf Patrick Farrell (Inactive) added a comment -

            Also, apologies for not posting these last year.

            paf Patrick Farrell (Inactive) added a comment -

            https://www.eofs.eu/_media/events/devsummit17/patrick_farrell_laddevsummit_pio.pdf

            This is old and out of date, but I wanted to make sure these slides were seen. I think the performance of the readahead code would probably be helped a lot by changes to the parallelization framework (as would the performance of PIO itself).

            Slides 8, 9, and 10 would probably be of particular interest here. There are significant performance improvements available for PIO just by going from padata to something simpler. Also, the CPU binding behavior of padata is pretty bad: binding explicitly to one CPU is problematic, and padata seems to assume the whole machine is dedicated, which is not a friendly assumption. (I discovered its CPU binding behavior because I saw performance problems: a particular CPU would be busy and the work assigned to it would be delayed, which delays completion of the whole I/O. At that time other CPUs were idle, and not binding to a specific CPU would have allowed one of them to be used.)
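
            As a sketch of the "going from padata to something simpler" direction (again with hypothetical names, not the actual replacement code): chunks submitted to an unbound workqueue are not pinned to a particular CPU, so a busy core does not hold up one chunk while other cores sit idle:

            #include <linux/completion.h>
            #include <linux/errno.h>
            #include <linux/kernel.h>
            #include <linux/string.h>
            #include <linux/workqueue.h>

            /* hypothetical chunk descriptor for illustration */
            struct pio_work {
                    struct work_struct work;
                    struct completion *done;
                    void *dst;
                    const void *src;
                    size_t len;
            };

            static struct workqueue_struct *pio_wq;

            static void pio_work_fn(struct work_struct *work)
            {
                    struct pio_work *w = container_of(work, struct pio_work, work);

                    memcpy(w->dst, w->src, w->len);   /* stand-in for the per-chunk copy */
                    complete(w->done);
            }

            static int pio_wq_init(void)
            {
                    /* WQ_UNBOUND: items may run on any allowed CPU instead of a fixed one */
                    pio_wq = alloc_workqueue("pio", WQ_UNBOUND | WQ_SYSFS, 0);
                    return pio_wq ? 0 : -ENOMEM;
            }

            static void pio_queue_chunk(struct pio_work *w)
            {
                    INIT_WORK(&w->work, pio_work_fn);
                    queue_work(pio_wq, &w->work);     /* any idle worker may pick this up */
            }

            Ordering and completion tracking then fall to the submitter (for example, waiting on the completions), which is the main service padata's serial callback was providing.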

            dmiter Dmitry Eremin (Inactive) added a comment -

            The latest version of the patch does not have an issue with RPC splitting. For reads on my VM I see the following:

            with PIO disabled:

                                    read                    write
            pages per rpc         rpcs   % cum % |       rpcs   % cum %
            1:                       3   4   4   |          0   0   0
            2:                       0   0   4   |          0   0   0
            4:                       0   0   4   |          0   0   0
            8:                       0   0   4   |          0   0   0
            16:                      0   0   4   |          0   0   0
            32:                      0   0   4   |          0   0   0
            64:                      0   0   4   |          0   0   0
            128:                     0   0   4   |          0   0   0
            256:                     0   0   4   |          0   0   0
            512:                     1   1   6   |          0   0   0
            1024:                   62  93 100   |          0   0   0
            

            with PIO enabled:

                                    read                    write
            pages per rpc         rpcs   % cum % |       rpcs   % cum %
            1:                       2   2   2   |          0   0   0
            2:                       0   0   2   |          0   0   0
            4:                       0   0   2   |          0   0   0
            8:                       0   0   2   |          0   0   0
            16:                      0   0   2   |          0   0   0
            32:                      0   0   2   |          0   0   0
            64:                      0   0   2   |          0   0   0
            128:                     0   0   2   |          0   0   0
            256:                     1   1   4   |          0   0   0
            512:                     4   5  10   |          0   0   0
            1024:                   61  89 100   |          0   0   0
            
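
            (For anyone reproducing this: the histograms above are the standard per-OSC client RPC statistics, typically read with "lctl get_param osc.*.rpc_stats" after clearing them with "lctl set_param osc.*.rpc_stats=clear" before the test run.)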

            People

              simmonsja James A Simmons
              dmiter Dmitry Eremin (Inactive)
              Votes: 0
              Watchers: 26
