Details
Improvement
Resolution: Fixed
Minor
Lustre 2.1.0
None
22,980
8541
Description
This is just a copy of bug 22980, but I think it is better to track and discuss it here:
Hello,
While testing our new I/O servers we have hit an issue with obdfilter-survey. Our OSSs are based on 4
Nehalem-EX processors connected to a Boxboro chipset. Every socket has 6 cores. On every OST we
have several FC channels connected to our storage bay.
When we run raw tests with sgpdd-survey across 24 LUNs, we get ~4400 MB/s on write and more than
5500 MB/s on read.
If we then start a Lustre filesystem and test these 24 OSTs with obdfilter-survey (size=24192
rszlo=1024 rszhi=1024 nobjlo=1 nobjhi=2 thrlo=1 thrhi=16 case=disk tests_str="write read" sh
obdfilter-survey), performance is always capped at 1200 MB/s for both write and read.
If we run IOzone tests from five clients (2 threads per client, connected to the server over
InfiniBand), we get more than 2500 MB/s.
We then took two sockets offline by running a command such as
"echo 0 > /sys/devices/system/cpu/cpu5/online" for every CPU belonging to those two sockets, and
obdfilter-survey gave the expected results (4600 MB/s on write and 5500 MB/s on read). If we take
only one socket offline, obdfilter-survey gives us a maximum of 1600 MB/s. With only one socket
left online, results are slightly worse than with two sockets.
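For anyone who wants to reproduce the socket-offlining step, here is a minimal sketch of how it could be scripted. It is not the exact procedure we used. It assumes the kernel exposes each CPU's socket number in topology/physical_package_id under sysfs. The function names and the SYSFS_CPU_ROOT override are hypothetical and are only there so the logic can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch only: find and offline every logical CPU that belongs to one socket.
# SYSFS_CPU_ROOT is a hypothetical override (not a kernel or Lustre setting);
# by default it points at the real sysfs CPU directory.
SYSFS_CPU_ROOT="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"

# Print the cpuN directory names whose physical_package_id equals $1.
cpus_on_socket() {
    socket="$1"
    for dir in "$SYSFS_CPU_ROOT"/cpu[0-9]*; do
        id_file="$dir/topology/physical_package_id"
        # CPUs that are already offline may not expose topology; skip them.
        [ -r "$id_file" ] || continue
        if [ "$(cat "$id_file")" = "$socket" ]; then
            basename "$dir"
        fi
    done
}

# Write 0 to the "online" file of each CPU on socket $1 (needs root on real sysfs).
# CPUs without a writable "online" file (often cpu0) are left alone.
offline_socket() {
    for cpu in $(cpus_on_socket "$1"); do
        online_file="$SYSFS_CPU_ROOT/$cpu/online"
        [ -w "$online_file" ] || continue
        echo 0 > "$online_file"
    done
}
```

For example, "cpus_on_socket 2" lists the CPUs on socket 2 without changing anything, and "offline_socket 2" followed by "offline_socket 3" (as root) would take two sockets offline. Writing 1 to the same "online" files brings the CPUs back.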
We also ran these tests with Lustre 1.6, with other storage bays, and on similar platforms (4
sockets and 8 CPUs per socket), and always saw the same kind of problem. If we enable
hyper-threading on every socket, performance is even worse.
It looks as if obdfilter-survey hits some kind of saturation when there are many sockets. What do
you think? Thanks,