[LU-8280] Bad Lustre Read Performance with Master Build 3371 Created: 15/Jun/16  Updated: 23/Jul/16  Resolved: 23/Jul/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Cong Xu (Inactive) Assignee: Cliff White (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None

Attachments: 2.8.0 vs 2.8.54 Jun 28 2016.xlsx, Spirit LU-8280 Jun 27 2016.xlsx, lu-8280-2.8.0-alldata.tar.gz, lu-8280-2.8.53-alldata.tar.gz
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

*Important note: please do not rebuild Lustre on wolf-[33-45], thanks!

The read performance of Lustre master build 3371 is poor. The following are the detailed configurations for my evaluation of the Lustre file system using the IOR benchmark:

[Lustre Configuration]

MDS/MDT: wolf-37
OST/OSS: wolf-[33-36] (Each OST: MD RAID 0 striped [Chunk = 1M] over 5 SATA Drives)
Clients: wolf-[38-45]

Stripe Size: 4194304
Stripe Count: 4
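
The stripe settings above would typically be applied with lfs setstripe before the run; a minimal sketch, assuming the test directory /mnt/lustre/cong from the IOR command below:

  # Sketch only: 4 MiB stripe size and stripe count 4 on the (assumed) test
  # directory, then verify the resulting layout.
  lfs setstripe -S 4M -c 4 /mnt/lustre/cong
  lfs getstripe /mnt/lustre/cong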

Lustre Build Version: 3371
Lustre Build Command:

  1. loadjenkinsbuild -b 3371 -a x86_64 -j lustre-master --jenkinsuri https://build.hpdd.intel.com -p test-el6-x86_64 -t server -d el6.7 -n wolf-45 -r -v

[IOR Benchmark]

IOR Command:

  1. mpirun -np 4 -iface ib0 -f /home/congxu/host-ib /home/congxu/Software/ior-master/src/ior -a POSIX -N 4 -d 5 -i 1 -s 32768 -b 4MiB -t 4MiB -w -r -o /mnt/lustre/cong/testfile
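
For clarity, the 512 GiB aggregate file size reported in the results below follows directly from these flags; a quick arithmetic check:

  # segments (-s) x block size (-b) x tasks (mpirun -np):
  # 32768 x 4 MiB x 4 = 524288 MiB = 512 GiB
  echo "$(( 32768 * 4 * 4 )) MiB"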

IOR Results:

Shared-file case
IOR-3.0.1: MPI Coordinated Test of Parallel I/O

Began: Tue Jun 14 20:16:18 2016
Command line used: /home/congxu/Software/ior-master/src/ior -a POSIX -N 4 -d 5 -i 1 -s 32768 -b 4MiB -t 4MiB -w -r -o /mnt/lustre/cong/testfile
Machine: Linux wolf-38.wolf.hpdd.intel.com

Test 0 started: Tue Jun 14 20:16:18 2016
Summary:
api = POSIX
test filename = /mnt/lustre/cong/testfile
access = single-shared-file
ordering in a file = sequential offsets
ordering inter file= no tasks offsets
clients = 4 (1 per node)
repetitions = 1
xfersize = 4 MiB
blocksize = 4 MiB
aggregate filesize = 512 GiB

access bw(MiB/s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---------- --------- -------- -------- -------- -------- ----
write 2546.35 4096 4096 0.001613 205.90 10.62 205.90 0
read 1076.82 4096 4096 0.000565 486.89 74.16 486.89 0
remove - - - - - - 0.001506 0

Max Write: 2546.35 MiB/sec (2670.05 MB/sec)
Max Read: 1076.82 MiB/sec (1129.12 MB/sec)

Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 2546.35 2546.35 2546.35 0.00 205.89756 0 4 1 1 0 0 1 0 0 32768 4194304 4194304 549755813888 POSIX 0
read 1076.82 1076.82 1076.82 0.00 486.88718 0 4 1 1 0 0 1 0 0 32768 4194304 4194304 549755813888 POSIX 0

Finished: Tue Jun 14 20:28:01 2016

***************************** In contrast, "lustre-b2_8 -b 12" performs well *****************************

[Lustre Configuration]

MDS/MDT: wolf-37
OST/OSS: wolf-[33-36] (Each OST: MD RAID 0 striped [Chunk = 1M] over 5 SATA Drives)
Clients: wolf-[38-45]

Stripe Size: 4194304
Stripe Count: 4

Lustre Build Version: 12
Lustre Build Command:

  1. loadjenkinsbuild -b 12 -a x86_64 -j lustre-b2_8 --jenkinsuri https://build.hpdd.intel.com -p test-el6-x86_64 -t server -d el6.7 -n wolf-45 -r -v

[IOR Benchmark]

IOR Command:

  1. mpirun -np 4 -iface ib0 -f /home/congxu/host-ib /home/congxu/Software/ior-master/src/ior -a POSIX -N 4 -d 5 -i 1 -s 32768 -b 4MiB -t 4MiB -w -r -o /mnt/lustre/cong/testfile

IOR Results:

Shared-file case
IOR-3.0.1: MPI Coordinated Test of Parallel I/O

Began: Wed Jun 15 01:17:36 2016
Command line used: /home/congxu/Software/ior-master/src/ior -a POSIX -N 4 -d 5 -i 1 -s 32768 -b 4MiB -t 4MiB -w -r -o /mnt/lustre/cong/testfile
Machine: Linux wolf-38.wolf.hpdd.intel.com

Test 0 started: Wed Jun 15 01:17:36 2016
Summary:
api = POSIX
test filename = /mnt/lustre/cong/testfile
access = single-shared-file
ordering in a file = sequential offsets
ordering inter file= no tasks offsets
clients = 4 (1 per node)
repetitions = 1
xfersize = 4 MiB
blocksize = 4 MiB
aggregate filesize = 512 GiB

access bw(MiB/s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---------- --------- -------- -------- -------- -------- ----
write 2559.64 4096 4096 0.001187 204.83 5.84 204.83 0
read 2840.79 4096 4096 0.000808 184.56 24.11 184.56 0
remove - - - - - - 0.001111 0

Max Write: 2559.64 MiB/sec (2683.98 MB/sec)
Max Read: 2840.79 MiB/sec (2978.78 MB/sec)

Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 2559.64 2559.64 2559.64 0.00 204.82859 0 4 1 1 0 0 1 0 0 32768 4194304 4194304 549755813888 POSIX 0
read 2840.79 2840.79 2840.79 0.00 184.55737 0 4 1 1 0 0 1 0 0 32768 4194304 4194304 549755813888 POSIX 0

Finished: Wed Jun 15 01:24:15 2016



 Comments   
Comment by Oleg Drokin [ 15/Jun/16 ]

I moved this into the Lustre project since it seems that newer Lustre versions are slower while 2.8 is still good, so it is unlikely to have anything to do with the infrastructure.

Comment by Cliff White (Inactive) [ 27/Jun/16 ]

Note: we normally run IOR with at least -i 5; a single iteration does not produce a consistent result. Also, our normal performance tests use -b 4G, not -b 4m.
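
For illustration, a run along those lines might look like the following (a sketch only, reusing the reporter's launcher and paths; the segment count is dropped so the aggregate size stays manageable with the larger block):

  mpirun -np 4 -iface ib0 -f /home/congxu/host-ib \
    /home/congxu/Software/ior-master/src/ior \
    -a POSIX -N 4 -d 5 -i 5 -b 4GiB -t 4MiB -w -r \
    -o /mnt/lustre/cong/testfile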

Comment by Cong Xu (Inactive) [ 27/Jun/16 ]

In this evaluation, we configure the Lustre stripe size and the IOR transfer and block sizes to be 4 MB, and the number of clients equals the number of OSTs. Thus, everything is perfectly matched and we expect to deliver the maximum bandwidth of the Lustre file system.
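
To make the alignment explicit, here is a rough sketch (assuming Lustre's round-robin RAID-0 striping and the sequential shared-file offsets reported by IOR above) of which OST index each task's 4 MiB transfer lands on:

  # Sketch only: stripe index = (file offset / stripe size) % stripe count.
  # With 4 tasks, 4 MiB transfers, a 4 MiB stripe size and a stripe count of 4,
  # each task repeatedly hits the same OST index, i.e. the workload is aligned.
  ss=$((4*1024*1024)); sc=4; nt=4; blk=$((4*1024*1024))
  for task in 0 1 2 3; do
    for seg in 0 1 2; do
      off=$(( seg * nt * blk + task * blk ))
      echo "task $task, segment $seg -> OST index $(( (off / ss) % sc ))"
    done
  done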

Comment by Cliff White (Inactive) [ 28/Jun/16 ]

I ran a comparison of 2.8.0 and 2.8.53.38 on Spirit.
Backend storage was a DDN 12K with 8 LUNs. Each LUN was 7.0 TB; total filesystem size was 142 TB.
The OSS uses dual-port FDR InfiniBand; the clients have a single-port FDR IB connection.
I ran a matrix, first with 8 IOR threads per client, second with 16 IOR threads per client.
Tests were run with 1, 4, 8, and 16 clients.
IOR File-per-process and IOR single-shared-file tests were run.
For the case of a single client with 8 or 16 threads, IOR file-per-process read performance for 2.8.53.38 is well below 2.8.0.
For all other cases, performance on 2.8.53.38 is equal to or better than 2.8.0.
The single-client case is interesting and may be worth further examination.
All test runs, vmstat, and collectl data are attached to the bug.

Comment by Cliff White (Inactive) [ 28/Jun/16 ]

In my experience, a single client cannot generate enough I/O requests to saturate a single OSS; more than one client per OST is normally required. For this reason we run performance tests across a range of client counts and use multiple threads per client.

Comment by Cliff White (Inactive) [ 28/Jun/16 ]

This spreadsheet contains the relevant runs from Hyperion. 2.8.55 is below 2.8.0 performance at 16 clients for both reads and writes; writes are also below standard at larger client counts.

Comment by Peter Jones [ 23/Jul/16 ]

IIRC, Cliff had reported that this was an issue with the way the tests were being run.
