Details
Bug
Resolution: Fixed
Major
Lustre 2.4.0
RHEL6.3 and current master
2
8114
Description
There is a performance regression on the current master (c864582b5d4541c7830d628457e55cd859aee005) when multiple IOR threads run per client. As far as I can tell from testing, LU-2576 may be the cause of this regression. Here are quick test results for each commit.
client : commit ac37e7b4d101761bbff401ed12fcf671d6b68f9c
# mpirun -np 8 /lustre/IOR -w -b 8g -t 1m -e -C -F -vv -o /lustre/ior.out/file
IOR-2.10.3: MPI Coordinated Test of Parallel I/O
Run began: Sun May 5 12:24:09 2013
Command line used: /lustre/IOR -w -b 8g -t 1m -e -C -F -vv -o /lustre/ior.out/file
Machine: Linux s08 2.6.32-279.19.1.el6_lustre.x86_64 #1 SMP Sat Feb 9 21:55:32 PST 2013 x86_64
Using synchronized MPI timer
Start time skew across all tasks: 0.00 sec
Path: /lustre/ior.out
FS: 683.5 TiB Used FS: 0.0% Inodes: 5.0 Mi Used Inodes: 0.0%
Participating tasks: 8
Using reorderTasks '-C' (expecting block, not cyclic, task assignment)
task 0 on s08
task 1 on s08
task 2 on s08
task 3 on s08
task 4 on s08
task 5 on s08
task 6 on s08
task 7 on s08
Summary:
api = POSIX
test filename = /lustre/ior.out/file
access = file-per-process
pattern = segmented (1 segment)
ordering in a file = sequential offsets
ordering inter file=constant task offsets = 1
clients = 8 (8 per node)
repetitions = 1
xfersize = 1 MiB
blocksize = 8 GiB
aggregate filesize = 64 GiB
Using Time Stamp 1367781849 (0x5186b1d9) for Data Signature
Commencing write performance test.
Sun May 5 12:24:09 2013
access bw(MiB/s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---------- --------- -------- -------- -------- -------- ----
write 3228.38 8388608 1024.00 0.001871 20.30 1.34 20.30 0 XXCEL
Operation Max (MiB) Min (MiB) Mean (MiB) Std Dev Max (OPs) Min (OPs) Mean (OPs) Std Dev Mean (s) Op grep #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize
--------- --------- --------- ---------- ------- --------- --------- ---------- ------- --------
write 3228.38 3228.38 3228.38 0.00 3228.38 3228.38 3228.38 0.00 20.29996 8 8 1 1 1 1 0 0 1 8589934592 1048576 68719476736 -1 POSIX EXCEL
Max Write: 3228.38 MiB/sec (3385.20 MB/sec)
Run finished: Sun May 5 12:24:30 2013
client : commit 5661651b2cc6414686e7da581589c2ea0e1f1969
# mpirun -np 8 /lustre/IOR -w -b 8g -t 1m -e -C -F -vv -o /lustre/ior.out/file
IOR-2.10.3: MPI Coordinated Test of Parallel I/O
Run began: Sun May 5 12:16:35 2013
Command line used: /lustre/IOR -w -b 8g -t 1m -e -C -F -vv -o /lustre/ior.out/file
Machine: Linux s08 2.6.32-279.19.1.el6_lustre.x86_64 #1 SMP Sat Feb 9 21:55:32 PST 2013 x86_64
Using synchronized MPI timer
Start time skew across all tasks: 0.00 sec
Path: /lustre/ior.out
FS: 683.5 TiB Used FS: 0.0% Inodes: 5.0 Mi Used Inodes: 0.0%
Participating tasks: 8
Using reorderTasks '-C' (expecting block, not cyclic, task assignment)
task 0 on s08
task 1 on s08
task 2 on s08
task 3 on s08
task 4 on s08
task 5 on s08
task 6 on s08
task 7 on s08
Summary:
api = POSIX
test filename = /lustre/ior.out/file
access = file-per-process
pattern = segmented (1 segment)
ordering in a file = sequential offsets
ordering inter file=constant task offsets = 1
clients = 8 (8 per node)
repetitions = 1
xfersize = 1 MiB
blocksize = 8 GiB
aggregate filesize = 64 GiB
Using Time Stamp 1367781395 (0x5186b013) for Data Signature
Commencing write performance test.
Sun May 5 12:16:35 2013
access bw(MiB/s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---------- --------- -------- -------- -------- -------- ----
write 550.28 8388608 1024.00 0.001730 119.10 2.76 119.10 0 XXCEL
Operation Max (MiB) Min (MiB) Mean (MiB) Std Dev Max (OPs) Min (OPs) Mean (OPs) Std Dev Mean (s) Op grep #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize
--------- --------- --------- ---------- ------- --------- --------- ---------- ------- --------
write 550.28 550.28 550.28 0.00 550.28 550.28 550.28 0.00 119.09623 8 8 1 1 1 1 0 0 1 8589934592 1048576 68719476736 -1 POSIX EXCEL
Max Write: 550.28 MiB/sec (577.01 MB/sec)
Run finished: Sun May 5 12:18:34 2013
In both tests, the servers were running current master (c864582b5d4541c7830d628457e55cd859aee005).
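For a rough sense of the size of the regression, the two "Max Write" summary lines above can be compared directly, and the reported bandwidth can be cross-checked against aggregate size divided by mean time. The following is only a sketch: the helper names (max_write_mib, bandwidth_mib_s) are made up for illustration and are not part of IOR or Lustre, and the input strings and numbers are copied by hand from the two runs.

```python
import re

# Summary lines copied from the two IOR runs above.
# FAST: client at commit ac37e7b4..., SLOW: client at commit 5661651b...
FAST = "Max Write: 3228.38 MiB/sec (3385.20 MB/sec)"
SLOW = "Max Write: 550.28 MiB/sec (577.01 MB/sec)"

# aggsize column (bytes) and Mean (s) column from the per-operation rows.
AGG_BYTES = 68719476736
FAST_SECONDS = 20.29996
SLOW_SECONDS = 119.09623


def max_write_mib(line):
    """Hypothetical helper: pull the MiB/sec figure out of a 'Max Write:' line."""
    match = re.search(r"Max Write:\s+([\d.]+)\s+MiB/sec", line)
    if match is None:
        raise ValueError("not an IOR 'Max Write' summary line: %r" % line)
    return float(match.group(1))


def bandwidth_mib_s(total_bytes, seconds):
    """Hypothetical helper: bandwidth in MiB/s from bytes written and elapsed time."""
    return total_bytes / 2**20 / seconds


fast = max_write_mib(FAST)
slow = max_write_mib(SLOW)
slowdown = fast / slow                # how many times slower the second run is
drop_pct = (1 - slow / fast) * 100    # throughput lost, in percent

print("slowdown factor: %.2fx" % slowdown)
print("throughput drop: %.1f%%" % drop_pct)
print("recomputed fast bw: %.2f MiB/s" % bandwidth_mib_s(AGG_BYTES, FAST_SECONDS))
print("recomputed slow bw: %.2f MiB/s" % bandwidth_mib_s(AGG_BYTES, SLOW_SECONDS))
```

The recomputed values should land close to the bandwidths IOR reports, which suggests the difference between the two runs is in the write phase itself rather than in how the summary was calculated.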
Since I've been assigned to this, I'm marking it resolved. The "bad patch" was reverted, and there have been no reports of this since, which leads me to believe it is no longer an issue. Feel free to reopen if there is a compelling case to do so.
General discussion of the LU-2139 issue and pending patch stack is better suited to the LU-2139 ticket.