[LU-5754] Lustre 2.6 client performance running on 2.5 production system Created: 16/Oct/14 Updated: 08/Feb/18 Resolved: 08/Feb/18 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question/Request | Priority: | Minor |
| Reporter: | Dave Bond (Inactive) | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Rank (Obsolete): | 16150 |
| Description |
|
After LAD this year, the suggestion for our single-client, single-stream performance issues was to try out the 2.6 client. We are aiming for 900MB/s single-stream, single-client performance. I am seeing on average a 66% increase in performance from the 2.5 to the 2.6 client, giving us approximately 650MiB/s from a single client running IOR (run as shown below). This agrees with dd, used as a crude way to measure single-stream performance, and with real-world testing. iozone achieves much higher figures, approximately 900MB/s, but the relationship is still an approximate 60% improvement:
iozone.x86_64 -i 0 -r 4M -s 10G -t 1
I believe the IOR figures are closer to what we would see with one of our detectors, and I do not yet fully understand why iozone achieves so much better results. I would like to know whether you feel we can get IOR to run at 900MB/s. The testing I have done so far with real-world tests and benchmarking consistently shows around 600MB/s. Andreas was of the opinion that I should be able to achieve 900MB/s with the 2.6 client. Is there any tuning that you can think of that might benefit us? We can already see the file striped across all OSTs. |
| Comments |
| Comment by Peter Jones [ 16/Oct/14 ] |
|
Dave, yes, this configuration would certainly be supported. Jinshan, is there any advice that you can provide to Dave? Thanks, Peter |
| Comment by Jinshan Xiong (Inactive) [ 16/Oct/14 ] |
|
Hi Dave, In my latest test, single-client, single-thread write speed was able to reach 1.3GB/s. Do you know how fast a single OST is in your configuration? If possible, I'd like to start with a single-striped file. Please collect some stats while IOR and iozone are running. Also make sure debug is turned off; you can also try turning off checksums and see how much that helps. I will do further analysis once I have this information. Jinshan |
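A minimal sketch of the client-side commands this suggests, assuming a single Lustre client with the file system mounted at /mnt/lustre03 (the test directory path is only illustrative):
# Turn off debug logging and data checksums on the client; both can cost CPU on a single stream
sudo lctl set_param debug=0
sudo lctl get_param osc.*.checksums        # note the current values so they can be restored later
sudo lctl set_param osc.*.checksums=0
# Create a single-striped test directory before the run
sudo lfs setstripe -c 1 /mnt/lustre03/testdir/dave/
# Clear the RPC stats just before the benchmark and dump them immediately afterwards
sudo lctl set_param osc.*.rpc_stats=clear
# ... run IOR / iozone here ...
sudo lctl get_param osc.*.rpc_stats > /tmp/rpc_stats.after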
| Comment by Andreas Dilger [ 16/Oct/14 ] |
|
Also, what is the CPU/RAM on the client? Some operations like copying data from userspace to the kernel are CPU bound, so having a faster GHz CPU should improve performance of the single-threaded case. We haven't done much testing on this yet. |
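A quick way to check whether the single writer thread really is CPU-bound is to watch per-core and per-thread utilisation while the benchmark runs; a sketch using the standard sysstat tools (the process name "ior" is an assumption, substitute iozone or dd as appropriate):
# Per-core CPU usage, sampled every second (look for one core pegged near 100%)
mpstat -P ALL 1
# Per-thread CPU usage of the benchmark process
pidstat -t -p $(pgrep -n ior) 1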
| Comment by Dave Bond (Inactive) [ 20/Oct/14 ] |
|
This is the performance from a stripe count of 1:
[joe59240@cs04r-sc-serv-68 dave]$ sudo lfs setstripe -c 1 /mnt/lustre03/testdir/dave/
[joe59240@cs04r-sc-serv-68 dave]$ lfs getstripe /mnt/lustre03/testdir/dave/
/mnt/lustre03/testdir/dave/
stripe_count: 1 stripe_size: 1048576 stripe_offset: -1
/mnt/lustre03/testdir/dave//dd-test
lmm_stripe_count: 30
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 4
obdidx objid objid group
4 52196121 0x31c7319 0
26 53264792 0x32cc198 0
9 52897292 0x327260c 0
17 52485050 0x320dbba 0
5 52788704 0x3257de0 0
29 52833246 0x3262bde 0
14 52785759 0x325725f 0
16 52658853 0x32382a5 0
0 53020814 0x329088e 0
12 52817200 0x325ed30 0
18 52858751 0x3268f7f 0
24 53058169 0x3299a79 0
1 53232395 0x32c430b 0
13 52599697 0x3229b91 0
21 51907308 0x3180aec 0
23 52559058 0x321fcd2 0
2 52421528 0x31fe398 0
8 52819310 0x325f56e 0
20 53108899 0x32a60a3 0
28 53012365 0x328e78d 0
27 53149873 0x32b00b1 0
11 52740508 0x324c19c 0
15 53099667 0x32a3c93 0
3 53045067 0x329674b 0
10 52926727 0x3279907 0
22 52342894 0x31eb06e 0
6 51948916 0x318ad74 0
25 52317516 0x31e4d4c 0
7 52712325 0x3245385 0
19 52586950 0x32269c6 0
/mnt/lustre03/testdir/dave//ior_dat
lmm_stripe_count: 30
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 29
obdidx objid objid group
29 52921862 0x3278606 0
14 52874360 0x326cc78 0
16 52747467 0x324dccb 0
0 53109419 0x32a62ab 0
12 52905806 0x327474e 0
18 52947353 0x327e999 0
24 53146774 0x32af496 0
1 53321004 0x32d9d2c 0
13 52688306 0x323f5b2 0
21 51995910 0x3196506 0
23 52647663 0x32356ef 0
2 52510135 0x3213db7 0
8 52907923 0x3274f93 0
20 53197508 0x32bbac4 0
28 53100974 0x32a41ae 0
27 53238481 0x32c5ad1 0
11 52829113 0x3261bb9 0
15 53188281 0x32b96b9 0
3 53133679 0x32ac16f 0
10 53015333 0x328f325 0
22 52431496 0x3200a88 0
6 52037517 0x31a078d 0
25 52406118 0x31fa766 0
7 52800937 0x325ada9 0
19 52675557 0x323c3e5 0
4 52284727 0x31dcd37 0
26 53353395 0x32e1bb3 0
9 52985895 0x3288027 0
17 52573656 0x32235d8 0
5 52877308 0x326d7fc 0
IOR test output:
[joe59240@cs04r-sc-serv-68 dave]$ /dls_sw/apps/openmpi/1.4.3/64/bin/mpirun -mca btl self,tcp,sm -np 1 /home/bnh65367/code/ior/src/ior -o /mnt/lustre03/testdir/dave/ior_dat -w -r -k -t1m -S -b 10G -i 1 -e -a POSIX
ior WARNING: strided datatype only available in MPIIO. Using value of 0.
IOR-3.0.0: MPI Coordinated Test of Parallel I/O
Began: Mon Oct 20 11:34:33 2014
Command line used: /home/bnh65367/code/ior/src/ior -o /mnt/lustre03/testdir/dave/ior_dat -w -r -k -t1m -S -b 10G -i 1 -e -a POSIX
Machine: Linux cs04r-sc-serv-68.diamond.ac.uk
Test 0 started: Mon Oct 20 11:34:33 2014
Summary:
api = POSIX
test filename = /mnt/lustre03/testdir/dave/ior_dat
access = single-shared-file
ordering in a file = sequential offsets
ordering inter file= no tasks offsets
clients = 1 (1 per node)
repetitions = 1
xfersize = 1 MiB
blocksize = 10 GiB
aggregate filesize = 10 GiB
access bw(MiB/s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---------- --------- -------- -------- -------- -------- ----
write 281.06 10485760 1024.00 0.000305 36.43 0.000271 36.43 0
read 878.49 10485760 1024.00 0.000173 11.66 0.000015 11.66 0
Max Write: 281.06 MiB/sec (294.71 MB/sec)
Max Read: 878.49 MiB/sec (921.17 MB/sec)
Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 281.06 281.06 281.06 0.00 36.43340 0 1 1 1 0 0 1 0 0 1 10737418240 1048576 10737418240 POSIX 0
read 878.49 878.49 878.49 0.00 11.65634 0 1 1 1 0 0 1 0 0 1 10737418240 1048576 10737418240 POSIX 0
Finished: Mon Oct 20 11:35:21 2014
[joe59240@cs04r-sc-serv-68 dave]$
The same with iozone:
[joe59240@cs04r-sc-serv-68 dave]$ sudo /mnt/lustre03/testdir/iozone.x86_64 -i 0 -r 4M -s 10G -t 1
Iozone: Performance Test of File I/O
Version $Revision: 3.283 $
Compiled for 64 bit mode.
Build: linux
Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million,
Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
Erik Habbinga, Kris Strecker, Walter Wong.
Run began: Mon Oct 20 11:37:40 2014
Record Size 4096 KB
File size set to 10485760 KB
Command line used: /mnt/lustre03/testdir/iozone.x86_64 -i 0 -r 4M -s 10G -t 1
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Throughput test with 1 process
Each process writes a 10485760 Kbyte file in 4096 Kbyte records
Children see throughput for 1 initial writers = 262971.44 KB/sec
Parent sees throughput for 1 initial writers = 261413.53 KB/sec
Min throughput per process = 262971.44 KB/sec
Max throughput per process = 262971.44 KB/sec
Avg throughput per process = 262971.44 KB/sec
Min xfer = 10485760.00 KB
Children see throughput for 1 rewriters = 278931.97 KB/sec
Parent sees throughput for 1 rewriters = 275981.64 KB/sec
Min throughput per process = 278931.97 KB/sec
Max throughput per process = 278931.97 KB/sec
Avg throughput per process = 278931.97 KB/sec
Min xfer = 10485760.00 KB
iozone test complete.
OSC stats: lctl set_param osc.*.rpc_stats=clear
sudo less /proc/fs/lustre/osc/*/rpc_stats
snapshot_time: 1413802038.44872 (secs.usecs)
read RPCs in flight: 0
write RPCs in flight: 0
pending write pages: 0
pending read pages: 0
read write
pages per rpc rpcs % cum % | rpcs % cum %
1: 0 0 0 | 0 0 0
read write
rpcs in flight rpcs % cum % | rpcs % cum %
0: 0 0 0 | 0 0 0
read write
offset rpcs % cum % | rpcs % cum %
0: 0 0 0 | 0 0 0
This does not look right, is this what you expected? Collectl output during the IOR run detailed above [joe59240@ws250 ~]$ ssh cs04r-sc-serv-68 Last login: Tue Oct 14 13:12:09 2014 from ws250.diamond.ac.uk [joe59240@cs04r-sc-serv-68 ~]$ [joe59240@cs04r-sc-serv-68 ~]$ [joe59240@cs04r-sc-serv-68 ~]$ [joe59240@cs04r-sc-serv-68 ~]$ collectl waiting for 1 second sample... #<----CPU[HYPER]-----><----------Disks-----------><----------Network----------> #cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut 0 0 587 841 0 0 0 0 9 50 11 57 0 0 703 896 0 0 0 0 1 9 1 9 3 3 19403 7052 0 0 44 3 0 3 1 3 4 4 21818 7536 0 0 0 0 720 8211 208771 25457 6 6 32816 11121 0 0 0 0 693 8223 219066 26867 7 7 93271 138243 0 0 0 0 1340 15870 425733 52238 13 13 307K 548008 0 0 0 0 1486 17641 471197 57778 14 14 272K 470931 0 0 0 0 1127 13389 350671 42719 5 5 27538 8991 0 0 64 2 2297 27343 716670 86940 7 7 41288 12778 0 0 144 14 1247 14809 391712 48057 7 6 32532 12251 0 0 19664 1633 1606 18808 503272 60941 5 3 19356 8981 0 0 768 108 1754 17422 486652 58989 11 9 47614 15398 0 0 220 20 478 3899 113687 13848 3 3 17766 5328 0 0 308 65 1649 16127 469160 57100 3 3 14610 5470 0 0 0 0 1370 15896 479407 57970 4 4 18595 6688 0 0 0 0 115 1324 40295 4864 3 3 13522 5252 0 0 0 0 1240 14202 449454 54485 7 6 30853 11003 0 0 36 3 168 1934 60959 7372 6 6 29337 10556 0 0 12 2 1320 15217 467725 56887 10 10 51858 14404 0 0 0 0 1284 14771 456003 55435 1 1 6909 2779 0 0 0 0 1836 21275 637351 77108 6 5 25279 7865 0 0 0 0 620 7220 216076 26096 #<----CPU[HYPER]-----><----------Disks-----------><----------Network----------> #cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut 5 5 23565 7832 0 0 0 0 1160 13788 364409 45931 5 5 26097 8397 0 0 12 3 791 9396 247501 29920 7 7 34057 11116 0 0 0 0 984 11696 310894 37928 2 2 11622 4540 0 0 0 0 1974 23490 614568 74517 7 7 32042 10125 0 0 0 0 528 6261 167598 20500 4 4 19701 6609 0 0 0 0 927 11016 292406 35490 5 5 27505 8451 0 0 12 3 1161 13807 362674 44136 5 5 20214 6601 0 0 0 0 1212 14407 379212 46230 3 3 18082 6196 0 0 12 2 1124 13354 352720 42823 9 9 57738 55049 0 0 0 0 584 6904 184015 22283 11 11 68865 63861 0 0 0 0 471477 61694 143164 36098 11 11 73120 64607 0 0 16 4 864545 104359 2977 35145 11 11 74268 65426 0 0 0 0 887446 107283 3057 36073 11 11 79167 65613 0 0 0 0 889100 107515 3075 36156 11 11 81678 66742 0 0 0 0 907457 109708 3125 36879 11 11 79362 64898 0 0 0 0 909729 109926 3133 36973 11 11 70470 63635 0 0 280 5 903671 109274 3112 36729 11 11 59786 61302 0 0 0 0 887830 107275 3058 36083 11 11 61463 63516 0 0 0 0 849486 102583 2925 34523 11 11 60779 62515 0 0 20 2 855909 102715 2949 34798 11 10 59548 61130 0 0 76 7 866513 103751 2985 35231 2 2 11908 12079 0 0 0 0 837783 100208 2886 34058 #<----CPU[HYPER]-----><----------Disks-----------><----------Network----------> #cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut 0 0 577 858 0 0 0 0 448334 53644 1545 18230 Ouch! [joe59240@cs04r-sc-serv-68 ~]$ IOZONE as detailed above waiting for 1 second sample... 
#<----CPU[HYPER]-----><----------Disks-----------><----------Network----------> #cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut 0 0 604 849 0 0 0 0 2 15 2 15 2 1 14781 11683 0 0 0 0 0 6 0 3 3 2 18669 6212 0 0 0 0 104 464 8300 1266 2 1 15306 5297 0 0 0 0 889 10178 287334 35605 2 1 12186 4372 0 0 0 0 311 3428 70424 9940 7 6 39800 11164 0 0 0 0 519 5806 137612 18353 9 8 45085 13230 0 0 16 3 687 7710 202694 26120 8 8 40120 12531 0 0 0 0 2071 23899 664566 82360 8 7 38068 11933 0 0 0 0 1559 17932 496415 61911 4 3 21977 7444 0 0 0 0 1464 16867 457643 57327 10 9 48323 13548 0 0 0 0 725 8232 203054 26327 11 9 54483 11768 0 0 0 0 1544 17777 481140 60172 3 2 17697 6603 0 0 88 10 1936 22289 630640 77987 8 7 38108 12013 0 0 0 0 573 6414 154170 20554 6 6 31520 8285 0 0 0 0 1185 13405 354637 45058 5 5 23790 7108 0 0 0 0 1379 16123 464958 56450 11 11 52229 14343 0 0 0 0 1049 12194 358767 43469 8 8 37343 12483 0 0 56 8 1929 22733 618107 74940 8 8 39677 12878 0 0 0 0 1785 20927 601368 73346 8 8 33446 10767 0 0 0 0 1722 20159 574514 70234 1 1 6889 2622 0 0 12 2 1517 17917 488105 59922 7 7 32176 10498 0 0 0 0 801 9522 254860 31226 #<----CPU[HYPER]-----><----------Disks-----------><----------Network----------> #cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut 10 10 47869 13378 0 0 8 2 762 9037 241801 29691 2 2 10315 3874 0 0 0 0 2244 26580 700204 85859 6 6 30827 10660 0 0 0 0 1038 12425 324463 39769 10 10 44931 13088 0 0 0 0 925 10994 289343 35644 5 5 19976 6395 0 0 0 0 2016 23911 631048 77593 5 5 24151 7682 0 0 12 3 883 10559 277271 34161 1 1 7841 2976 0 0 316 26 1263 15052 392666 48183 0 0 596 861 0 0 0 0 705 8315 218015 26460 2 2 10681 3824 0 0 0 0 1 9 1 7 2 2 14454 4887 0 0 0 0 159 1838 56364 6718 2 2 10568 4015 0 0 0 0 716 8390 245314 29486 1 1 5077 2339 0 0 12 2 387 4493 133290 16215 1 1 5410 2372 0 0 0 0 267 3095 91959 11176 5 5 25443 7448 0 0 0 0 274 3181 94027 11447 With a stripe across all OST's [joe59240@cs04r-sc-serv-68 dave]$ lfs getstripe /mnt/lustre03/testdir/dave//mnt/lustre03/testdir/dave/
stripe_count: -1 stripe_size: 1048576 stripe_offset: -1
/mnt/lustre03/testdir/dave//dd-test
lmm_stripe_count: 30
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 4
obdidx objid objid group
4 52196121 0x31c7319 0
26 53264792 0x32cc198 0
9 52897292 0x327260c 0
17 52485050 0x320dbba 0
5 52788704 0x3257de0 0
29 52833246 0x3262bde 0
14 52785759 0x325725f 0
16 52658853 0x32382a5 0
0 53020814 0x329088e 0
12 52817200 0x325ed30 0
18 52858751 0x3268f7f 0
24 53058169 0x3299a79 0
1 53232395 0x32c430b 0
13 52599697 0x3229b91 0
21 51907308 0x3180aec 0
23 52559058 0x321fcd2 0
2 52421528 0x31fe398 0
8 52819310 0x325f56e 0
20 53108899 0x32a60a3 0
28 53012365 0x328e78d 0
27 53149873 0x32b00b1 0
11 52740508 0x324c19c 0
15 53099667 0x32a3c93 0
3 53045067 0x329674b 0
10 52926727 0x3279907 0
22 52342894 0x31eb06e 0
6 51948916 0x318ad74 0
25 52317516 0x31e4d4c 0
7 52712325 0x3245385 0
19 52586950 0x32269c6 0
/mnt/lustre03/testdir/dave//ior_dat
lmm_stripe_count: 30
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 29
obdidx objid objid group
29 53301435 0x32d50bb 0
1 53749983 0x33428df 0
13 53083696 0x329fe30 0
25 52799678 0x325a8be 0
26 53740721 0x33404b1 0
5 53271989 0x32cddb5 0
8 53317095 0x32d8de7 0
21 52409690 0x31fb55a 0
0 53502583 0x3306277 0
18 53357367 0x32e2b37 0
9 53380702 0x32e865e 0
22 52839422 0x32643fe 0
19 53082855 0x329fae7 0
14 53271785 0x32cdce9 0
4 52675556 0x323c3e4 0
24 53559173 0x3313f85 0
3 53541604 0x330fae4 0
2 52917827 0x3277643 0
23 53061600 0x329a7e0 0
20 53580372 0x3319254 0
17 52972922 0x3284d7a 0
28 53499676 0x330571c 0
12 53298632 0x32d45c8 0
10 53413001 0x32f0489 0
15 53601715 0x331e5b3 0
6 52454067 0x32062b3 0
7 53188708 0x32b9864 0
16 53136154 0x32acb1a 0
11 53217905 0x32c0a71 0
27 53634590 0x332661e 0
[joe59240@cs04r-sc-serv-68 dave]$
IOR [joe59240@cs04r-sc-serv-68 ~]$ collectl waiting for 1 second sample... #<----CPU[HYPER]-----><----------Disks-----------><----------Network----------> #cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut 0 0 562 836 0 0 0 0 28 260 40 334 3 3 21068 7563 0 0 0 0 0 3 0 3 8 8 44555 13754 0 0 0 0 294 2986 70214 8874 8 8 44343 13682 0 0 0 0 2007 23763 639672 77159 8 8 45285 13995 0 0 0 0 2039 24127 648836 78164 8 8 45720 14187 0 0 12 2 2035 24058 652145 78585 8 8 45833 14234 0 0 0 0 2073 24508 663974 80200 10 10 46715 13982 0 0 76 2 2079 24595 667395 80519 11 11 48859 13950 0 0 0 0 2080 24630 664216 80456 11 11 49677 13997 0 0 0 0 2065 24445 659500 80140 11 11 49067 14193 0 0 0 0 2067 24559 663404 80618 11 11 50743 14452 0 0 0 0 2081 24507 664379 80749 10 10 48587 14081 0 0 0 0 2074 24529 667448 80881 11 11 48505 14009 0 0 0 0 2088 24710 669080 81231 11 11 48426 13948 0 0 280 7 2044 24163 654650 79560 10 10 44231 12839 0 0 0 0 2065 24471 659625 80075 10 10 44696 12704 0 0 0 0 1936 22903 617964 76287 9 9 38918 11698 0 0 80 7 1872 22160 595290 74359 4 4 1719 869 0 0 0 0 1888 22392 600464 75115 4 4 1781 838 0 0 84 19 414 4653 123005 15382 0 0 943 895 0 0 0 0 0 4 0 1 Ouch! [joe59240@cs04r-sc-serv-68 ~]$ IOZONE [joe59240@cs04r-sc-serv-68 ~]$ collectl waiting for 1 second sample... #<----CPU[HYPER]-----><----------Disks-----------><----------Network----------> #cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut 1 0 14184 26608 0 0 0 0 29 304 113 296 10 8 43543 18637 0 0 0 0 17 177 73 170 19 18 81349 21661 0 0 0 0 408 3841 83298 11154 20 18 80810 21536 0 0 32 3 3189 37475 948788 119026 19 18 81103 21786 0 0 0 0 3195 37515 951536 118957 20 18 81443 21744 0 0 0 0 3206 37670 953980 119522 20 18 81622 21652 0 0 0 0 3232 37957 955683 118725 20 18 80380 21795 0 0 0 0 3209 37702 955052 119159 19 18 75353 20554 0 0 0 0 3199 37532 959613 120015 18 18 71471 18827 0 0 12 2 3071 35958 920918 114773 19 19 74618 19167 0 0 0 0 2942 34714 934244 114394 18 18 73930 19441 0 0 0 0 3008 35638 957373 117057 13 13 52541 13865 0 0 1712 54 3046 36284 969794 118585 0 0 594 858 0 0 0 0 3124 36967 979256 119706 2 2 17548 5773 0 0 0 0 71 610 14240 1783 12 12 79256 23327 0 0 0 0 0 5 0 3 11 11 81920 24226 0 0 0 0 3207 38045 998K 124450 12 11 82623 24198 0 0 12 2 3773 44736 1169K 145615 11 11 82375 24033 0 0 0 0 3859 45852 1185K 147588 12 11 82640 23988 0 0 0 0 3872 46075 1181K 147056 12 12 85661 25074 0 0 76 7 3870 45983 1186K 147492 12 12 86445 24981 0 0 0 0 3939 46864 1202K 149635 #<----CPU[HYPER]-----><----------Disks-----------><----------Network----------> #cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut 11 11 80767 23403 0 0 0 0 4071 48526 1231K 153274 5 5 46652 12278 0 0 164 22 4018 48240 1217K 151549 0 0 558 797 0 0 0 0 3330 39112 992456 248660 0 0 594 866 0 0 0 0 1 9 0 5 0 0 598 853 0 0 0 0 0 3 0 2 0 0 780 1077 0 0 24 5 0 3 0 1 0 0 609 814 0 0 0 0 4 12 3 11 0 0 743 975 0 0 40 8 0 7 1 6 54 54 192K 547874 0 0 0 0 1 7 1 6 97 97 381K 1096K 0 0 0 0 13 26 10 27 25 25 222K 506925 0 0 0 0 0 3 0 2 Ouch! 
[joe59240@cs04r-sc-serv-68 ~]$ Mem and CPU info [joe59240@cs04r-sc-serv-68 ~]$ cat /proc/meminfo MemTotal: 65890040 kB MemFree: 30684456 kB Buffers: 367320 kB Cached: 29115520 kB SwapCached: 0 kB Active: 10533044 kB Inactive: 19325756 kB Active(anon): 375532 kB Inactive(anon): 708 kB Active(file): 10157512 kB Inactive(file): 19325048 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 2097144 kB SwapFree: 2097144 kB Dirty: 84 kB Writeback: 0 kB AnonPages: 376112 kB Mapped: 42724 kB Shmem: 208 kB Slab: 4025664 kB SReclaimable: 550248 kB SUnreclaim: 3475416 kB KernelStack: 7128 kB PageTables: 10824 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 35042164 kB Committed_AS: 767780 kB VmallocTotal: 34359738367 kB VmallocUsed: 956128 kB VmallocChunk: 34324855240 kB HardwareCorrupted: 0 kB AnonHugePages: 266240 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 4992 kB DirectMap2M: 2013184 kB DirectMap1G: 65011712 kB CPU: [joe59240@cs04r-sc-serv-68 ~]$ cat /proc/cpuinfo ... processor : 23 vendor_id : GenuineIntel cpu family : 6 model : 45 model name : Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz stepping : 7 cpu MHz : 2299.974 cache size : 15360 KB physical id : 1 siblings : 12 core id : 5 cpu cores : 6 apicid : 43 initial apicid : 43 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid bogomips : 4599.34 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: Check debug is off [joe59240@cs04r-sc-serv-68 dave]$ lctl get_param debug I will have a go with checksums but this is not how I would want to run with real data. |
| Comment by Dave Bond (Inactive) [ 23/Oct/14 ] |
|
Hello, could I have an update on this? We want to update some production machines to 2.6, and before we do that I would love to prove that it can go as fast as you say. |
| Comment by Jinshan Xiong (Inactive) [ 23/Oct/14 ] |
|
Hi Dave, I edited your comment so that it's easier to see. Please let me know if I happened to delete some important information. |
| Comment by Jinshan Xiong (Inactive) [ 23/Oct/14 ] |
|
Hi Dave, Thanks for the results. Unfortunately I didn't get much useful information from them. Can you please perform the test again for me with the following steps? Before the test starts:
6. Run the iozone command: sudo /mnt/lustre03/testdir/iozone.x86_64 -i 0 -r 4M -s 10G -t 1 -w -f /mnt/lustre03/testdir/dave/iozone
7. Once the above command has finished, please attach the stats files and show me the output of: lfs getstripe /mnt/lustre03/testdir/dave/iozone
Thanks again. I'd also like to know some detailed information about the OSS/OST and network configuration. Jinshan |
| Comment by Dave Bond (Inactive) [ 24/Oct/14 ] |
[joe59240@cs04r-sc-serv-68 ~]$ sudo lctl set_param debug=0 debug=0 New directory rather than empty the existing one [joe59240@cs04r-sc-serv-68 ~]$ sudo mkdir /mnt/lustre03/testdir/dave1 [joe59240@cs04r-sc-serv-68 ~]$ sudo lfs setstripe -c 1 /mnt/lustre03/testdir/dave/1 [joe59240@cs04r-sc-serv-68 ~]$ sudo lctl set_param osc.*.rpc_stats=clear osc.lustre03-OST0000-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0001-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0002-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0003-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0004-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0005-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0006-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0007-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0008-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0009-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST000a-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST000b-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST000c-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST000d-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST000e-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST000f-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0010-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0011-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0012-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0013-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0014-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0015-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0016-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0017-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0018-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST0019-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST001a-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST001b-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST001c-osc-ffff880828d65400.rpc_stats=clear osc.lustre03-OST001d-osc-ffff880828d65400.rpc_stats=clear osc.play01-OST0000-osc-ffff8807825c1800.rpc_stats=clear osc.play01-OST0001-osc-ffff8807825c1800.rpc_stats=clear osc.play01-OST0002-osc-ffff8807825c1800.rpc_stats=clear osc.play01-OST0003-osc-ffff8807825c1800.rpc_stats=clear osc.play01-OST0004-osc-ffff8807825c1800.rpc_stats=clear osc.play01-OST0005-osc-ffff8807825c1800.rpc_stats=clear [joe59240@cs04r-sc-serv-68 ~]$ /mnt/lustre03/testdir/iozone.x86_64 -i 0 -r 4M -s 10G -t 1 -w -F /mnt/lustre03/testdir/dave1/iozone Iozone: Performance Test of File I/O Version $Revision: 3.283 $ Compiled for 64 bit mode. Build: linux Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins Al Slater, Scott Rhine, Mike Wisner, Ken Goss Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR, Randy Dunlap, Mark Montague, Dan Million, Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Erik Habbinga, Kris Strecker, Walter Wong. Run began: Fri Oct 24 09:05:21 2014 Record Size 4096 KB File size set to 10485760 KB Setting no_unlink Command line used: /mnt/lustre03/testdir/iozone.x86_64 -i 0 -r 4M -s 10G -t 1 -w -F /mnt/lustre03/testdir/dave1/iozone Output is in Kbytes/sec Time Resolution = 0.000001 seconds. Processor cache size set to 1024 Kbytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. 
Throughput test with 1 process Each process writes a 10485760 Kbyte file in 4096 Kbyte records Children see throughput for 1 initial writers = 337159.12 KB/sec Parent sees throughput for 1 initial writers = 335492.23 KB/sec Min throughput per process = 337159.12 KB/sec Max throughput per process = 337159.12 KB/sec Avg throughput per process = 337159.12 KB/sec Min xfer = 10485760.00 KB Children see throughput for 1 rewriters = 338427.53 KB/sec Parent sees throughput for 1 rewriters = 337059.64 KB/sec Min throughput per process = 338427.53 KB/sec Max throughput per process = 338427.53 KB/sec Avg throughput per process = 338427.53 KB/sec Min xfer = 10485760.00 KB iozone test complete. [joe59240@cs04r-sc-serv-68 ~]$ |
| Comment by Dave Bond (Inactive) [ 24/Oct/14 ] |
|
Stats from the iozone test are attached. |
| Comment by Dave Bond (Inactive) [ 24/Oct/14 ] |
[joe59240@cs04r-sc-serv-68 ~]$ sudo lfs getstripe /mnt/lustre03/testdir/dave1/iozone
/mnt/lustre03/testdir/dave1/iozone
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 24
obdidx objid objid group
24 53921714 0x336c7b2 0
[joe59240@cs04r-sc-serv-68 ~]$ |
| Comment by Dave Bond (Inactive) [ 24/Oct/14 ] |
|
For the last two questions I will get back to you; I will be talking to Frederik about that. Regards |
| Comment by Dave Bond (Inactive) [ 24/Oct/14 ] |
Fri Jun 3 09:17:39 BST 2011
Obdfilter-survey for case=disk from cs04r-sc-oss03-01.diamond.ac.uk
ost 30 sz 503316480K rsz 1024K obj 30 thr 30 write 1927.85 [ 37.96, 70.94] rewrite 1913.93 [ 23.97, 75.93] read 4890.09 [ 132.87, 186.82]
ost 30 sz 503316480K rsz 1024K obj 30 thr 60 write 3563.12 [ 65.94, 141.86] rewrite 3688.63 [ 46.95, 136.86] read 8823.83 [ 264.74, 406.60]
ost 30 sz 503316480K rsz 1024K obj 30 thr 120 write 6530.20 [ 126.75, 268.74] rewrite 6868.42 [ 104.91, 262.74] read 11557.70 [ 366.64, 523.96]
ost 30 sz 503316480K rsz 1024K obj 30 thr 240 write 8533.28 [ 132.86, 363.64] rewrite 8665.87 [ 139.86, 398.61] read 11666.15 [ 377.25, 536.94]
ost 30 sz 503316480K rsz 1024K obj 30 thr 480 write 8724.26 [ 110.90, 381.62] rewrite 8651.21 [ 73.93, 497.51] read 11661.70 [ 369.63, 542.51]
ost 30 sz 503316480K rsz 1024K obj 60 thr 60 write 3447.86 [ 56.94, 139.87] rewrite 3343.69 [ 67.93, 132.88] read 9385.09 [ 281.72, 362.28]
ost 30 sz 503316480K rsz 1024K obj 60 thr 120 write 5872.07 [ 84.92, 238.75] rewrite 5625.11 [ 122.88, 239.76] read 11584.47 [ 349.32, 534.47]
ost 30 sz 503316480K rsz 1024K obj 60 thr 240 write 8098.56 [ 97.91, 381.63] rewrite 7987.45 [ 80.92, 356.64] read 11582.54 [ 306.70, 536.62]
ost 30 sz 503316480K rsz 1024K obj 60 thr 480 write 8625.35 [ 136.88, 429.19] rewrite 8750.36 [ 106.90, 448.56] read 11316.42 [ 309.73, 642.37]
ost 30 sz 503316480K rsz 1024K obj 120 thr 120 write 3600.37 [ 59.94, 186.82] rewrite 3522.46 [ 72.86, 153.72] read 4946.79 [ 146.72, 249.76]
ost 30 sz 503316480K rsz 1024K obj 120 thr 240 write 6653.74 [ 65.94, 321.70] rewrite 6498.17 [ 128.74, 295.73] read 5862.56 [ 128.87, 261.74]
ost 30 sz 503316480K rsz 1024K obj 120 thr 480 write 8690.26 [ 116.89, 410.58] rewrite 8368.97 [ 129.74, 386.62] read 9619.68 [ 201.80, 435.70]
ost 30 sz 503316480K rsz 1024K obj 240 thr 240 write 3582.52 [ 72.93, 170.82] rewrite 3579.80 [ 61.95, 164.68] read 4713.35 [ 114.89, 188.82]
ost 30 sz 503316480K rsz 1024K obj 240 thr 480 write 6536.34 [ 111.89, 297.70] rewrite 6391.06 [ 92.91, 272.73] read 5064.96 [ 145.86, 199.82]
ost 30 sz 503316480K rsz 1024K obj 480 thr 480 write 3624.81 [ 91.91, 200.80] rewrite 3604.54 [ 70.99, 190.81] read 4713.24 [ 97.90, 190.81] |
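For context, output in this format comes from the lustre-iokit obdfilter-survey script run on the OSS. A hedged sketch of an invocation that would produce a comparable case=disk sweep (the object/thread ranges and per-OST size here are illustrative, not the parameters used for the 2011 run above; depending on the iokit version, targets= may also need to list the local OSTs explicitly):
# Run as root on the OSS; case=disk measures the OSTs below the network layer
nobjlo=1 nobjhi=16 thrlo=1 thrhi=16 size=8192 case=disk obdfilter-survey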
| Comment by Frederik Ferner (Inactive) [ 24/Oct/14 ] |
|
Note, the obdfilter-survey output Dave posted earlier had been taken a few years ago when we first commissioned the hardware, since then we have upgraded the server hardware and also changed from Lustre 1.8/RHEL5 to Lustre 2.5/RHEL6. Unfortunately I don't think we have recorded single OST obdfilter-survey output. We have also recently added IB to our file system. The tests so far had been using dual 10GigE bonded links (LACP) on the client and the same on the OSSes, MTU on the network is 8982. lnet selftest results for this network are below. We have repeated the tests with lnet over IB and the iozone performance hasn't changed, again lnet selftest output below. lnet selftest between all 4 OSSes in the file system as server and the client over ethernet/tcp: [bnh65367@cs04r-sc-serv-68 ~]$ sudo /tmp/lnet-selftest-wc.sh -s "172.23.144.31@tcp 172.23.144.32@tcp 172.23.144.33@tcp 172.23.144.34@tcp" -c 172.23.134.68@tcp -k 1 -r start CONCURRENCY=32 SESSION: hh FEATURES: 0 TIMEOUT: 100000 FORCE: No 172.23.134.68@tcp are added to session 172.23.144.31@tcp are added to session 172.23.144.32@tcp are added to session 172.23.144.33@tcp are added to session 172.23.144.34@tcp are added to session Test was added successfully Test was added successfully b is running now Batch: b Tests: 2 State: 177 ACTIVE BUSY DOWN UNKNOWN TOTAL client 1 0 0 0 1 server 4 0 0 0 4 Test 1(brw) (loop: 1800000, concurrency: 32) ACTIVE BUSY DOWN UNKNOWN TOTAL client 1 0 0 0 1 server 4 0 0 0 4 Test 2(brw) (loop: 1800000, concurrency: 32) ACTIVE BUSY DOWN UNKNOWN TOTAL client 1 0 0 0 1 server 4 0 0 0 4 [LNet Rates of c] [R] Avg: 9156 RPC/s Min: 9156 RPC/s Max: 9156 RPC/s [W] Avg: 6837 RPC/s Min: 6837 RPC/s Max: 6837 RPC/s [LNet Bandwidth of c] [R] Avg: 2320.97 MB/s Min: 2320.97 MB/s Max: 2320.97 MB/s [W] Avg: 2257.24 MB/s Min: 2257.24 MB/s Max: 2257.24 MB/s [LNet Rates of s] [R] Avg: 1976 RPC/s Min: 1409 RPC/s Max: 2487 RPC/s [W] Avg: 2556 RPC/s Min: 1752 RPC/s Max: 3293 RPC/s [LNet Bandwidth of s] [R] Avg: 568.28 MB/s Min: 408.74 MB/s Max: 708.79 MB/s [W] Avg: 604.67 MB/s Min: 365.93 MB/s Max: 827.75 MB/s [LNet Rates of c] [R] Avg: 9195 RPC/s Min: 9195 RPC/s Max: 9195 RPC/s [W] Avg: 6876 RPC/s Min: 6876 RPC/s Max: 6876 RPC/s [LNet Bandwidth of c] [R] Avg: 2321.21 MB/s Min: 2321.21 MB/s Max: 2321.21 MB/s [W] Avg: 2276.78 MB/s Min: 2276.78 MB/s Max: 2276.78 MB/s [LNet Rates of s] [R] Avg: 2019 RPC/s Min: 1391 RPC/s Max: 2628 RPC/s [W] Avg: 2600 RPC/s Min: 1715 RPC/s Max: 3471 RPC/s [LNet Bandwidth of s] [R] Avg: 570.68 MB/s Min: 397.51 MB/s Max: 755.95 MB/s [W] Avg: 629.91 MB/s Min: 366.12 MB/s Max: 889.95 MB/s lnet selftest for the same servers but now using IB/o2ib: [bnh65367@cs04r-sc-serv-68 ~]$ sudo /tmp/lnet-selftest-wc.sh -s "10.144.144.31@o2ib 10.144.144.32@o2ib 10.144.144.33@o2ib 10.144.144.34@o2ib" -c 10.144.134.68@o2ib -k 1 -r start CONCURRENCY=32 SESSION: hh FEATURES: 0 TIMEOUT: 100000 FORCE: No 10.144.134.68@o2ib are added to session 10.144.144.31@o2ib are added to session 10.144.144.32@o2ib are added to session 10.144.144.33@o2ib are added to session 10.144.144.34@o2ib are added to session Test was added successfully Test was added successfully b is running now Batch: b Tests: 2 State: 177 ACTIVE BUSY DOWN UNKNOWN TOTAL client 1 0 0 0 1 server 4 0 0 0 4 Test 1(brw) (loop: 1800000, concurrency: 32) ACTIVE BUSY DOWN UNKNOWN TOTAL client 1 0 0 0 1 server 4 0 0 0 4 Test 2(brw) (loop: 1800000, concurrency: 32) ACTIVE BUSY DOWN UNKNOWN TOTAL client 1 0 0 0 1 server 4 0 0 0 4 [LNet Rates of c] [R] Avg: 19354 RPC/s Min: 19354 
RPC/s Max: 19354 RPC/s [W] Avg: 9678 RPC/s Min: 9678 RPC/s Max: 9678 RPC/s [LNet Bandwidth of c] [R] Avg: 4776.00 MB/s Min: 4776.00 MB/s Max: 4776.00 MB/s [W] Avg: 4902.20 MB/s Min: 4902.20 MB/s Max: 4902.20 MB/s [LNet Rates of s] [R] Avg: 4430 RPC/s Min: 4335 RPC/s Max: 4546 RPC/s [W] Avg: 5624 RPC/s Min: 5524 RPC/s Max: 5755 RPC/s [LNet Bandwidth of s] [R] Avg: 1225.72 MB/s Min: 1214.89 MB/s Max: 1239.21 MB/s [W] Avg: 1525.16 MB/s Min: 1480.58 MB/s Max: 1563.76 MB/s [LNet Rates of c] [R] Avg: 19354 RPC/s Min: 19354 RPC/s Max: 19354 RPC/s [W] Avg: 9677 RPC/s Min: 9677 RPC/s Max: 9677 RPC/s [LNet Bandwidth of c] [R] Avg: 4773.75 MB/s Min: 4773.75 MB/s Max: 4773.75 MB/s [W] Avg: 4906.15 MB/s Min: 4906.15 MB/s Max: 4906.15 MB/s [LNet Rates of s] [R] Avg: 4479 RPC/s Min: 4350 RPC/s Max: 4640 RPC/s [W] Avg: 5672 RPC/s Min: 5532 RPC/s Max: 5852 RPC/s [LNet Bandwidth of s] [R] Avg: 1226.72 MB/s Min: 1219.92 MB/s Max: 1241.57 MB/s [W] Avg: 1520.53 MB/s Min: 1474.58 MB/s Max: 1571.82 MB/s |
| Comment by Jinshan Xiong (Inactive) [ 24/Oct/14 ] |
|
Hi Dave, Thanks for the testing. From the obdfilter-survey result, the OSTs reach maximum performance at 16 write threads; it may become even better with more threads. From the iozone result, single-stripe write performance was 337MB/s, which roughly matches the obdfilter-survey performance at 8 write threads. So I guess OSC max_rpcs_in_flight is set to 8; try increasing it to 16 (lctl set_param osc.*.max_rpcs_in_flight=16) and see how it goes. Monitoring rpc_stats will clarify the picture, so please do that as I said in step 5. As a next step, please run lnet_selftest to verify that the network performance meets your expectations; then increase the stripe count to 2 and see how it goes. |
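A sketch of the two client-side changes suggested here, using the same mount point as above (16 is a starting value to be checked against rpc_stats, not final tuning):
# Allow more concurrent RPCs per OSC (the default is 8)
sudo lctl set_param osc.*.max_rpcs_in_flight=16
# After re-running the benchmark, check how many RPCs were actually kept in flight
sudo lctl get_param osc.*.rpc_stats
# Try a stripe count of 2 on a fresh test directory so newly created files pick it up
sudo lfs setstripe -c 2 /mnt/lustre03/testdir/dave/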
| Comment by Jinshan Xiong (Inactive) [ 24/Oct/14 ] |
|
Hi Frederik Ferner, the network seems good. The lower per-server numbers are because there are 4 servers and only 1 client. Have you ever tried the case of 1 client and 1 server node? |
| Comment by Andreas Dilger [ 24/Oct/14 ] |
|
Note that setting only max_rpcs_in_flight doesn't necessarily help if peer_credits isn't also increased. See LU-3184 for more details. |
| Comment by Dave Bond (Inactive) [ 27/Oct/14 ] |
|
As expected, max_rpcs_in_flight on its own did not give any great improvement. Following up on the thought about peer credits, I see I currently have 8:
[joe59240@cs04r-sc-serv-68 ~]$ cat /proc/sys/lnet/peers
nid refs state last max rtr min tx min queue
172.23.144.1@tcp 1 NA -1 8 8 8 8 7 0
10.144.144.1@o2ib 1 NA -1 8 8 8 8 5 0
172.23.144.14@tcp 1 NA -1 8 8 8 8 5 0
172.23.144.6@tcp 1 NA -1 8 8 8 8 6 0
172.23.144.32@tcp 1 NA -1 8 8 8 8 -56 0
10.144.144.32@o2ib 1 NA -1 8 8 8 8 -56 0
10.144.134.68@o2ib 1 NA -1 8 8 8 8 6 0
172.23.134.68@tcp 1 NA -1 8 8 8 8 6 0
10.144.144.34@o2ib 1 NA -1 8 8 8 8 -42 0
172.23.144.34@tcp 1 NA -1 8 8 8 8 -57 0
172.23.144.5@tcp 1 NA -1 8 8 8 8 7 0
172.23.144.18@tcp 1 NA -1 8 8 8 8 5 0
10.144.144.31@o2ib 1 NA -1 8 8 8 8 -48 0
172.23.144.31@tcp 1 NA -1 8 8 8 8 -56 0
10.144.144.33@o2ib 1 NA -1 8 8 8 8 -56 0
172.23.144.33@tcp 1 NA -1 8 8 8 8 -56 0
Should this also be set to 16 to match the max_rpcs_in_flight setting? Would this also adjust the max peers, or do all fields have to match? |
| Comment by Dave Bond (Inactive) [ 27/Oct/14 ] |
|
I would also appreciate a man page or an example of how this is set. For example, does it take effect immediately, or do I need to perform any other steps? I cannot find this information in the 2.x manual. |
| Comment by Dave Bond (Inactive) [ 30/Oct/14 ] |
|
*NUDGE* Could you please give a little more information on the peer credits before I proceed? |
| Comment by Jinshan Xiong (Inactive) [ 31/Oct/14 ] |
|
Hi Dave, I have invited our LNET expert, Isaac, to take a look at this issue and I think he will provide some information so that we can proceed. |
| Comment by Jinshan Xiong (Inactive) [ 31/Oct/14 ] |
|
From what I have seen so far, another thing we can do is to increase the stripe count to 2 and see if we can get any performance gains. |
| Comment by Isaac Huang (Inactive) [ 03/Nov/14 ] |
|
I suppose the question was about a client and servers connected directly by TCP (i.e. no LNET routers). The /proc/sys/lnet/peers output showed that the queues for the servers grew quite deep at one point, which may or may not have been caused by a lack of peer_credits (it could, for example, have been transient network congestion). Try increasing peer_credits to match max_rpcs_in_flight. If the Dynamic LNet Config project hasn't yet enabled dynamic peer credits tuning, then it's a ksocklnd module option. |
| Comment by Jinshan Xiong (Inactive) [ 03/Nov/14 ] |
|
Thanks Isaac. Hi Dave, To actually control peer_credits, please apply 'options ksocklnd peer_credits=16' for ksocklnd to match the value of max_rpcs_in_flight. |
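To answer the earlier question about how this is applied: LND module parameters such as peer_credits go into modprobe configuration and only take effect when the module is reloaded, i.e. after unmounting Lustre and reloading the LNET modules (or rebooting the client). A minimal sketch; the file name is a site convention, not a requirement:
# /etc/modprobe.d/lustre.conf
options ksocklnd peer_credits=16
# After the client is unmounted and the modules reloaded, confirm the value took effect
cat /sys/module/ksocklnd/parameters/peer_credits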
| Comment by Dave Bond (Inactive) [ 11/Nov/14 ] |
|
After testing this on our test file system we felt ready to put this into production, so that we could provide some good performance metrics. We had to roll back the change, as the majority of our cluster nodes failed to mount the file system:
Nov 10 13:06:43 cs04r-sc-com06-40 kernel: LNetError: 1317:0:(o2iblnd_cb.c:2619:kiblnd_rejected()) 10.144.144.1@o2ib rejected: incompatible message queue depth 16, 8
The lnet.conf file is as below:
options lnet networks=o2ib0(ib0),tcp0(bond0)
options ksocklnd peer_credits=16
options ko2iblnd peer_credits=16
So the suggestion above did work on many of the nodes, but because of this issue we could not keep the change. Any thoughts on how to overcome this? |
| Comment by Liang Zhen (Inactive) [ 13/Nov/14 ] |
|
Hi Dave, I think you need to have the same credits on all nodes, so if you changed some nodes' credits to 16, then all the others have to be 16 as well. |
| Comment by Dave Bond (Inactive) [ 13/Nov/14 ] |
|
As it has a max and a min value, is it really the case that everything has to be set to 16? Given the disruption to a live file system, I would prefer it if this could be tuned per client. If it has to be set everywhere, could you please advise on the risk associated with this, as I would not want to introduce any oddities at this point in our run. |
| Comment by Jinshan Xiong (Inactive) [ 13/Nov/14 ] |
|
Hi Dave, Is it possible for you to set up a test environment to verify the performance gains from this change? It won't need a lot of nodes: 1 client, 2 OSSs and 1 MDS would be enough for now. Right now we're still in the phase of identifying the problem, and we may have more experiments to do down the road. Having a test environment will be really helpful and will accelerate progress. |
| Comment by Isaac Huang (Inactive) [ 17/Nov/14 ] |
|
Dave, For now all peer_credits must be the same (for each LND) everywhere. I thought you intended to increase peer_credits only for the TCP network, as the TCP peers showed deep tx queues. For TCP I see no risk in doubling the peer_credits. It's more complicated for o2iblnd, as increasing peer_credits alone wouldn't suffice to increase the send window. It's OK to change peer_credits for TCP only. There's a patch to enable per-client tuning for o2iblnd, but it's highly experimental: |
| Comment by Dave Bond (Inactive) [ 19/Nov/14 ] |
|
Hello all, This is looking to be more disruptive than expected for the production system we are running this on, as we would need to change peer credits on all clients; for stability reasons that does not feel like a good idea to us. We are going to use an alternative, smaller file system, still running the same versions of Lustre. Over the next few days we will benchmark it to ensure that we are not already saturating the disks or servers. If that is successful, we will introduce the peer credits change and note any performance gain. I would expect that, if we are going to benefit, the increase in performance will be smaller than it would be on the production file system. |
| Comment by Dave Bond (Inactive) [ 05/Dec/14 ] |
|
Hello, With the current configuration I am going from 8 to 16 for max_rpcs_in_flight and peer credits. Talking to others at MEW this year, numbers such as 64 were mentioned as being in use. Could you please explain how the value of 16 was derived? Why not, for example, double it again to 32? |
| Comment by Liang Zhen (Inactive) [ 21/Dec/14 ] |
|
Hi Dave, 16 is just a value we suggest trying first; if it does not help, you might want to increase it further. |
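Since there is no single correct value, one hedged way to answer "why 16 rather than 32 or 64" is to sweep the client-side setting and re-run the same benchmark at each step (peer_credits on the LNET side would still need to keep up, as discussed above; the IOR options follow the invocation already used in this ticket, and the mpirun/ior paths should be adjusted to the local install):
for rpcs in 8 16 32 64; do
    sudo lctl set_param osc.*.max_rpcs_in_flight=$rpcs
    sudo lctl set_param osc.*.rpc_stats=clear
    echo "=== max_rpcs_in_flight=$rpcs ==="
    mpirun -np 1 ior -o /mnt/lustre03/testdir/dave/ior_dat -w -k -t1m -b 10G -i 1 -e -a POSIX
done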
| Comment by Jinshan Xiong (Inactive) [ 08/Feb/18 ] |
|
close old tickets |