[LU-4257] parallel dds are slower than serial dds

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.5.0
    • None
    • 3
    • 11618

    Description

      Sanger has an interesting test in which they read the same file from 20 processes. They first run the reads in parallel and then run them serially (after flushing the cache). Their expected result is that the serial and parallel runs should take about the same amount of time. What they see, however, is that the parallel reads are about 50% slower than the serial reads:

      client1# cat readfile.sh
      #!/bin/sh
      
      dd if=/lustre/scratch110/sanger/jb23/test/delete bs=4M of=/dev/null
      
      client1# for i in `seq -w 1 20 `
      do
        (time $LOC/readfile.sh )  > $LOC/results/${i}_out 2>&1 &
      done
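
      # (Not part of the original report: the serial comparison run is not shown,
      # but presumably it looks something like the sketch below -- flush the page
      # cache once, then run the same script one process at a time, writing the
      # *_serial result files referenced further down.)
      client1# echo 3 > /proc/sys/vm/drop_caches
      client1# for i in `seq -w 1 20`
      do
        (time $LOC/readfile.sh ) > $LOC/results/${i}_out_serial 2>&1
      done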
      

      In parallel

      01_out:real 3m36.228s
      02_out:real 3m36.227s
      03_out:real 3m36.226s
      04_out:real 3m36.224s
      05_out:real 3m36.224s
      06_out:real 3m36.224s
      07_out:real 3m36.222s
      08_out:real 3m36.221s
      09_out:real 3m36.228s
      10_out:real 3m36.222s
      11_out:real 3m36.220s
      12_out:real 3m36.220s
      13_out:real 3m36.228s
      14_out:real 3m36.219s
      15_out:real 3m36.217s
      16_out:real 3m36.218s
      17_out:real 3m36.214s
      18_out:real 3m36.214s
      19_out:real 3m36.211s
      20_out:real 3m36.212s

      A serial read (I expect all of the time to be in the first read):

      grep -i real *_serial
      01_out_serial:real 2m31.372s
      02_out_serial:real 0m1.190s
      03_out_serial:real 0m0.654s
      04_out_serial:real 0m0.562s
      05_out_serial:real 0m0.574s
      06_out_serial:real 0m0.570s
      07_out_serial:real 0m0.574s
      08_out_serial:real 0m0.461s
      09_out_serial:real 0m0.456s
      10_out_serial:real 0m0.462s
      11_out_serial:real 0m0.475s
      12_out_serial:real 0m0.473s
      13_out_serial:real 0m0.582s
      14_out_serial:real 0m0.580s
      15_out_serial:real 0m0.569s
      16_out_serial:real 0m0.679s
      17_out_serial:real 0m0.565s
      18_out_serial:real 0m0.573s
      19_out_serial:real 0m0.579s
      20_out_serial:real 0m0.472s

      And trying the same experiment with NFS:

      Serial access:

      root@farm3-head4:~/tmp/test/results# grep -i real *
      results/01_out_serial:real 0m19.923s
      results/02_out_serial:real 0m1.373s
      results/03_out_serial:real 0m1.237s
      results/04_out_serial:real 0m1.276s
      results/05_out_serial:real 0m1.289s
      results/06_out_serial:real 0m1.297s
      results/07_out_serial:real 0m1.265s
      results/08_out_serial:real 0m1.278s
      results/09_out_serial:real 0m1.224s
      results/10_out_serial:real 0m1.225s
      results/11_out_serial:real 0m1.221s
      ...

      So the question is:
      Why is access slower when we read the file in parallel and it is not in the cache?

      Is there some lock contention going on with multiple readers? Or is the Lustre client sending multiple RPCs for the same data, even though there is already an outstanding request? They have tried this on 1.8.x clients as well as 2.5.0.
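
      One way to narrow this down on the client (not part of the original report; a sketch that assumes the standard per-OSC and llite statistics exposed through lctl) is to clear the RPC, readahead and lock counters, run the parallel dds, and then check whether extra read RPCs are being sent for the same data or the DLM lock count is unusually high:

      # clear client-side counters before the run
      lctl set_param osc.*.rpc_stats=clear
      lctl set_param llite.*.read_ahead_stats=clear

      # ... run the 20 parallel dds here ...

      # histogram of read/write RPC sizes and counts per OSC
      lctl get_param osc.*.rpc_stats
      # readahead hit/miss counters in the llite layer
      lctl get_param llite.*.read_ahead_stats
      # number of DLM locks held per namespace (one per OST/MDT)
      lctl get_param ldlm.namespaces.*.lock_count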

      Thanks.

      Attachments

        1. debug_file.out.gz
          0.2 kB
        2. io.png
          75 kB
        3. lu-4257.tar.gz
          0.2 kB
        4. lustre_1.8.9
          850 kB
        5. lustre_2.5
          798 kB
        6. readfile.sh
          0.4 kB
        7. test.sh
          2 kB

        Issue Links

          Activity


            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/20647/
            Subject: LU-4257 test: Correct error_ignore message
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5f03bf91e68e925149f2331a44d1e4ad858b8006

            adilger Andreas Dilger added a comment -

            This patch introduced an intermittent test failure LU-8248 in sanity.sh test_248.

            pjones Peter Jones added a comment -

            Landed for 2.9


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20255/
            Subject: LU-4257 llite: fast read implementation
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 172048eaefa834e310e6a0fa37e506579f4079df

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20256/
            Subject: LU-4257 llite: fix up iov_iter implementation
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1101120d3258509fa74f952cd8664bfdc17bd97d

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20254/
            Subject: LU-4257 obdclass: Get rid of cl_env hash table
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 45332712783a4756bf5930d6bd5f697bbc27acdb

            gerrit Gerrit Updater added a comment -

            Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/20574
            Subject: LU-4257 clio: replace semaphore with mutex
            Project: fs/lustre-release
            Branch: b2_4
            Current Patch Set: 1
            Commit: 3e1cbe0b81eaee6e509c825455669c89df157915

            jay Jinshan Xiong (Inactive) added a comment (edited) -

            I came up with a solution for this issue, and the initial test results are exciting.

            The test case I used is based on the one shared by Sanger; please see below:

            #!/bin/bash

            nr_cpus=$(grep -c ^processor /proc/cpuinfo)
            mkdir -p results

            for CACHE in no yes; do
            	for BS in 4k 1M; do
            		echo "===== cache: $CACHE, block size: $BS ====="
            		# warm the page cache with a full read, or drop it, depending on the mode
            		[ "$CACHE" = "yes" ] && { dd if=/mnt/lustre/testfile bs=1M of=/dev/null > /dev/null 2>&1; }
            		[ "$CACHE" = "no" ] && { echo 3 > /proc/sys/vm/drop_caches; }

            		echo -n "      single read: "
            		dd if=/mnt/lustre/testfile bs=$BS of=/dev/null 2>&1 | grep copied | awk -F, '{print $3}'

            		[ "$CACHE" = "no" ] && { echo 3 > /proc/sys/vm/drop_caches; }

            		echo -n "      parallel read: "
            		# one dd per CPU, all reading the same file
            		for i in `seq -w 1 ${nr_cpus}`; do
            			dd if=/mnt/lustre/testfile bs=$BS of=/dev/null > results/${i}_out 2>&1 &
            		done
            		wait
            		# report the first reader's throughput (seq -w zero-pads the name when nr_cpus >= 10)
            		grep copied results/$(seq -w 1 ${nr_cpus} | head -n1)_out | awk -F, '{print $3}'
            	done
            done

            The test file is 2G in size so that it fits in memory for the cache-enabled testing. I applied the patches and compared the test results w/ and w/o the patches.
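
            (The 2 GB test file itself is presumably created ahead of time with something along these lines; the path matches the script above:)

            # create a 2 GB file on the Lustre mount used by the test script
            dd if=/dev/urandom of=/mnt/lustre/testfile bs=1M count=2048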

            The result w/ my patches:

            ===== cache: no, block size: 4k =====
                  single read:  1.2 GB/s
                  parallel read:  576 MB/s
            ===== cache: no, block size: 1M =====
                  single read:  1.4 GB/s
                  parallel read:  566 MB/s
            ===== cache: yes, block size: 4k =====
                  single read:  3.8 GB/s
                  parallel read:  1.8 GB/s
            ===== cache: yes, block size: 1M =====
                  single read:  6.4 GB/s
                  parallel read:  1.3 GB/s
            

            The test w/o my patches:

            ===== cache: no, block size: 4k =====
                  single read:  257 MB/s
                  parallel read:  148 MB/s
            ===== cache: no, block size: 1M =====
                  single read:  1.1 GB/s
                  parallel read:  420 MB/s
            ===== cache: yes, block size: 4k =====
                  single read:  361 MB/s
                  parallel read:  147 MB/s
            ===== cache: yes, block size: 1M =====
                  single read:  5.8 GB/s
                  parallel read:  1.3 GB/s
            

            The small-IO performance improved significantly. I'm still doing some fine tuning of the patches, and I will release them as soon as I can so that you can do some evaluation.
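
            For anyone evaluating the patches once they land: the fast read path should be controllable per client through an llite tunable. A minimal sketch, assuming the parameter is exposed as llite.*.fast_read (as in the fast read patch above):

            # check whether fast read is available/enabled on this client
            lctl get_param llite.*.fast_read

            # disable it temporarily to compare against the old read path
            lctl set_param llite.*.fast_read=0

            # re-enable it
            lctl set_param llite.*.fast_read=1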


            manish Manish Patel (Inactive) added a comment -

            Hi Jinshan,

            Any updates on the request made by James? Based on the above comment, we are looking to try out the new patch for the performance issues.

            Thank you,
            Manish

            james beal James Beal added a comment -

            While we wait, I found the following interesting.

            http://lwn.net/Articles/590243/

            Performance-oriented patches should, of course, always be accompanied by benchmark results. In this case, Waiman included a set of AIM7 benchmark results with his patch set (which did not include the pending-bit optimization). Some workloads regressed a little, but others shows improvements of 1-2% — a good result for a low-level locking improvement. The disk benchmark runs, however, improved by as much as 116%; that benchmark suffers from especially strong contention for locks in the virtual filesystem layer and ext4 filesystem code.


            People

              Assignee: jay Jinshan Xiong (Inactive)
              Reporter: ihara Shuichi Ihara (Inactive)
              Votes: 0
              Watchers: 29
