
[LU-5561] Lustre random reads: 80% performance loss from 1.8 to 2.6

Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.6.0
    • Labels: None
    • Severity: 3
    • Rank: 15511

    Description

      In a random read benchmark with a moderate amount of data, reading from disk (i.e., cache cold), we see an 80-90% performance loss going from 1.8 to 2.6. We have not tested 2.4/2.5. (We've tried both Cray's 1.8.6 client on SLES and Intel's 1.8.9 client on CentOS, with similar results.)

      This is the IOR command line to read the file - the file is single-striped:
      IOR -E -F -e -g -b 2964m -t 19k -k -C -Q 17 -r -z -v

      The IOR command is a single task reading 2.89 GB in 19KB chunks, entirely randomly.
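
      To make the access pattern concrete, it boils down to something like the sketch below. This is illustrative only, not IOR's actual implementation: the file name is made up, and IOR's -z option visits each offset exactly once in shuffled order, whereas this simplified loop samples offsets with replacement.

      /* Sketch of the pattern: one task issuing 19KB pread()s at random
       * transfer-aligned offsets across a ~2.9 GB file. */
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>

      #define XFER  (19 * 1024)              /* -t 19k  */
      #define BLOCK (2964ULL * 1024 * 1024)  /* -b 2964m */

      int main(void)
      {
          int fd = open("testfile", O_RDONLY);  /* file from the -w run */
          if (fd < 0) { perror("open"); return 1; }

          char *buf = malloc(XFER);
          if (!buf) return 1;
          long long nxfers = BLOCK / XFER;

          for (long long i = 0; i < nxfers; i++) {
              /* pick a random transfer-aligned offset within the block */
              off_t off = (off_t)(random() % nxfers) * XFER;
              if (pread(fd, buf, XFER, off) < 0) { perror("pread"); return 1; }
          }

          free(buf);
          close(fd);
          return 0;
      }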

      After writing the file out (same command with -w) and dropping server and client caches (echo 3 > /proc/sys/vm/drop_caches), I saw the following read rates on 1.8.9 (numbers are from a virtual cluster; numbers on real hardware were similar in terms of % change):

      33.8 MB/s
      22.5 MB/s
      20.3 MB/s
      22.9 MB/s

      And these read rates on 2.6:
      3.48 MB/s
      3.57 MB/s

      Server is running 2.6; we saw very similar numbers with a 2.1 server, so it doesn't seem to be related to the server version.

      I'm attaching brw_stats for the OST where the file was located, for one run each of 1.8 and 2.6. The main thing I noted was that the 1.8 client seemed to do many more large reads, and many, many fewer reads overall.

      The 1.8 client read a total of ~4040 MB (approximate, estimated from brw_stats), but did it in ~11k total RPCs/disk I/O ops.

      The 2.6 client read a total of ~4060 MB (again, estimated from brw_stats). It used roughly 140k total RPCs/disk I/O ops - more than ten times as many IO requests as the 1.8 client. The distribution of IO sizes, unsurprisingly, skews much more towards small IOs.
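
      (Back-of-the-envelope from those estimates: ~4040 MB over ~11k ops averages roughly 370 KB per I/O on 1.8, while ~4060 MB over ~140k ops averages roughly 30 KB per I/O on 2.6 - about a 12x difference in average request size, matching the RPC-count ratio.)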

      Thoughts? I know random IO is generally not a good use of Lustre, but some codes do it, and this change in performance from 1.8 to 2.6 is kind of staggering.

      Attachments

        Issue Links

          Activity


            adilger Andreas Dilger added a comment -

            Actually closing. The llapi_ladvise() functionality landed for 2.9.0, and fadvise() has been in Linux for a long time. Getting these hints from the IO libraries could improve IO performance under some workloads significantly.
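
            Using the llapi_ladvise() interface mentioned above looks roughly like the sketch below. This is a sketch against the lustreapi interface that landed for 2.9.0; the struct field and constant names should be verified against your lustre/lustreapi.h.

            /* Sketch: ask Lustre to prefetch a whole file into the client
             * page cache before random reads start. Build with -llustreapi.
             * Names per the 2.9-era lustreapi.h; verify against your headers. */
            #include <fcntl.h>
            #include <stdio.h>
            #include <string.h>
            #include <sys/stat.h>
            #include <unistd.h>
            #include <lustre/lustreapi.h>

            int prefetch_file(const char *path)
            {
                int fd = open(path, O_RDONLY);
                if (fd < 0) { perror("open"); return -1; }

                struct stat st;
                if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return -1; }

                struct llapi_lu_ladvise adv;
                memset(&adv, 0, sizeof(adv));
                adv.lla_advice = LU_LADVISE_WILLNEED; /* prefetch hint */
                adv.lla_start  = 0;
                adv.lla_end    = st.st_size;          /* whole file */

                /* flags 0 = synchronous; one advice record */
                int rc = llapi_ladvise(fd, 0, 1, &adv);
                if (rc < 0)
                    perror("llapi_ladvise");

                close(fd);
                return rc;
            }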

            adilger Andreas Dilger added a comment -

            Closing this for now.

            paf Patrick Farrell (Inactive) added a comment -

            Thanks, Andreas.

            Just to back up your thought that 2.6's approach is superior for files that significantly exceed memory size, I ran such a test on the same setup (dropped the RAM on the VMs to 512 MB and read in a ~6 GB file). In that test, both were very slow, but 2.6 was consistently ~10% faster than 1.8 across multiple trials (2.40 vs 2.20 MB/s).

            I'm not sure this ticket is the place for further discussion of heuristics, etc., so feel free to close it. That's a project unto itself. Cray is considering putting some work into this area, so if it's something we do work on, we'll be in touch.

            adilger Andreas Dilger added a comment -

            For better or worse, fadvise() is implemented entirely in the VM layer and doesn't call into the filesystem (except to actually read pages into memory for FADV_WILLNEED), so the behaviour is the same for all filesystems. From my quick reading of the code, FADV_WILLNEED will read ahead the data of the requested range of the file in 2MB chunks, up to a maximum of (free_pages + inactive_pages) / 2. FADV_DONTNEED will mark pages for eviction from the cache if they are no longer needed, though I don't recall if they are flushed immediately or just put at the end of the LRU list.

            As for heuristics on reading extra data under random read workloads, I'm still open to discussing this. I agree that in common use cases such files will often end up having a large fraction of the file accessed by the application, so prefetching is a win as long as the file has a reasonable chance of fitting into RAM and the cost of reading e.g. 1MB of data into the client cache is not significantly more expensive than fetching 8KB or 16KB.
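
            Since fadvise() behaves the same on all filesystems, trying the hint from the application side is straightforward. A minimal sketch using the standard posix_fadvise(2) call follows; the kernel readahead behaviour described above is internal, so this shows only the userspace side.

            #include <fcntl.h>
            #include <stdio.h>
            #include <string.h>
            #include <sys/stat.h>
            #include <unistd.h>

            int main(int argc, char **argv)
            {
                if (argc < 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }

                int fd = open(argv[1], O_RDONLY);
                if (fd < 0) { perror("open"); return 1; }

                struct stat st;
                if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

                /* Hint: prefetch the whole file before random reads begin.
                 * posix_fadvise() returns an errno value, not -1 with errno. */
                int rc = posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED);
                if (rc != 0)
                    fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

                /* ... random 19KB reads would happen here ... */

                /* Hint: cached pages for this file may be dropped now. */
                posix_fadvise(fd, 0, st.st_size, POSIX_FADV_DONTNEED);

                close(fd);
                return 0;
            }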

            paf Patrick Farrell (Inactive) added a comment -

            I see your point (and thank you very much for the pointer towards the fadvise info & LU-4931, both are very interesting), but it brings to mind a question or two for me:

            Given that a significant portion of the time, the data to be read by an individual client will fit in RAM, is it unreasonable to pull more of it in as you go? I suppose in a multi-threaded situation, you're likely to evict data that someone else may want, and are better off reading in what's needed and no more.

            I suppose in the end, the best solution is, as you said, to formalize this behavior in some way. The multi-thread multi-file case, where one thread evicts data that might be desired by another, seems to make that optimization much harder.

            An fadvise(FADV_WILLNEED) seems like a very worthwhile thing to try here. Is there any information available about what fadvise requests/modes Lustre supports, and how it actually handles them? I.e., with FADV_WILLNEED, what cache is holding the file data in the client RAM?

            adilger Andreas Dilger added a comment -

            What is interesting here is that this actually implies that the random read behaviour is broken on 1.8 and not on 2.6. If the 19KB random reads at the client are generating 1MB reads at the server, then if the file is larger than RAM there will be IO multiplication of over 50x at the server. So by all rights, 2.6 is doing the right thing by only reading the requested data under random IO workloads instead of data that may never be accessed by this client. This test case is hitting the sweet spot where the file can fit entirely into client RAM, so reading in 1MB chunks helps the aggregate performance instead of hurting it.

            That said, I'm not against "formalizing" this behaviour so that random reads on files that have a reasonable expectation of fitting into RAM result in reading the file in full RPC chunks (i.e. prefetching the neighbouring blocks on the expectation they may be used). This is similar in behaviour to the max_read_ahead_whole_mb tunable, which will read all of a small file into the client cache on the second read instead of waiting for readahead to kick in.

            In addition to filesystem-level heuristics that try to do the right thing based on incomplete information, you may also consider adding hints to the application to tell the filesystem what it is doing, such as fadvise(FADV_WILLNEED) to prefetch the whole file into the client RAM before the random IO starts, and/or comment on LU-4931, which would allow passing such hints to the backend storage (e.g. if the file is randomly accessed by many clients and is larger than client RAM, but is striped widely enough that the OST RAM could hold it all).
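
            For reference, the max_read_ahead_whole_mb tunable mentioned above can be read with lctl get_param llite.*.max_read_ahead_whole_mb, or programmatically as in the sketch below. The /proc path is an assumption for clients of this era; check where your client version exposes the llite parameters.

            #include <glob.h>
            #include <stdio.h>

            int main(void)
            {
                /* Assumed location of llite tunables on a 2.x client. */
                glob_t g;
                if (glob("/proc/fs/lustre/llite/*/max_read_ahead_whole_mb",
                         0, NULL, &g) != 0) {
                    fprintf(stderr, "no llite mounts found\n");
                    return 1;
                }

                for (size_t i = 0; i < g.gl_pathc; i++) {
                    FILE *f = fopen(g.gl_pathv[i], "r");
                    char buf[64];
                    if (f && fgets(buf, sizeof(buf), f))
                        printf("%s: %s", g.gl_pathv[i], buf);
                    if (f)
                        fclose(f);
                }

                globfree(&g);
                return 0;
            }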

            paf Patrick Farrell (Inactive) added a comment -

            Forgot to include this info. IOR's reported operations per second:
            1.8.9: 1163.68
            2.6: 176.85
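
            (Sanity check: those rates are consistent with the bandwidth numbers reported earlier: 1163.68 ops/s x 19 KB is about 21.6 MB/s, and 176.85 ops/s x 19 KB is about 3.3 MB/s.)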

            paf Patrick Farrell (Inactive) added a comment -

            brw_stats from a 2.6 OSS for the IOR run described above, with a 1.8 and a 2.6 client.

            People

              Assignee: wc-triage WC Triage
              Reporter: paf Patrick Farrell (Inactive)
              Votes: 0
              Watchers: 7
