[LU-63] Study of the lustre performance with SFA10K Created: 07/Feb/11  Updated: 26/Jan/12  Resolved: 26/Jan/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Shuichi Ihara (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: sfa-benchmark.xlsx    
Rank (Obsolete): 10130

 Description   

Hi, I have been investigating disk and Lustre benchmarks on the SFA10K. Overall, Lustre performance on the SFA is good: we are getting 10 GB/sec over the Lustre clients. However, I would like a more reasonable benchmark procedure for Lustre.
What I mean is, we usually run the following benchmark steps to make sure each device or layer is healthy.
1. sgpdd-survey
2. obdfilter-survey
3. lnetself-test
4. IOR/iozone

Ideally, we expect 1. sgpdd-survey > 2. obdfilter-survey > 4. IOR/iozone, since each layer adds a small overhead. However, in my current testing the obdfilter-survey results are mostly faster than sgpdd-survey, and in some configurations the IOR/iozone results are better than obdfilter-survey, which does not make sense.
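For reference, a minimal sketch of how each layer is typically exercised; the sizes, device names, OST names, NIDs and paths below are illustrative placeholders, not the exact values used on this SFA10K setup:

# 1. sgpdd-survey: raw multi-threaded sgp_dd I/O against the LUNs.
size=8192 crglo=1 crghi=16 thrlo=1 thrhi=16 \
    scsidevs="/dev/sdb /dev/sdc" ./sgpdd-survey

# 2. obdfilter-survey: drive the OSTs through the obdfilter/ldiskfs layer.
size=8192 nobjlo=1 nobjhi=32 thrlo=1 thrhi=128 \
    targets="lustre-OST0000 lustre-OST0001" ./obdfilter-survey

# 3. LNet selftest: bulk I/O between client and server NIDs.
export LST_SESSION=$$
lst new_session read_write
lst add_group servers 192.168.1.10@o2ib
lst add_group clients 192.168.1.[20-23]@o2ib
lst add_batch bulk_rw
lst add_test --batch bulk_rw --from clients --to servers brw write size=1M
lst run bulk_rw
lst stat clients servers
lst end_session

# 4. IOR over the mounted file system from the clients.
mpirun -np 32 ./IOR -a POSIX -w -r -F -t 1m -b 4g -o /mnt/lustre/ior.out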

So, I wonder if we could improve each survey or procedure to get the maximum performance at each layer.

I'm attaching the obdfilter-survey and sgpdd-survey results that I got on the SFA10K.
I used only one RAID processor on the SFA, connected to a server over QDR, which means the maximum bandwidth between the OSS and the SFA10K is 3.2 GB/sec.

There are at least two questions.
1. Why can I only get around 2.5 GB/sec with sgpdd-survey? (The single-thread number was good, 3 GB/sec, but the others are not.)
2. Why can't the obdfilter-survey read numbers reach 3 GB/sec as the number of objects increases? They do eventually reach 3 GB/sec, but more threads are needed to get there.

Ihara



 Comments   
Comment by Cliff White (Inactive) [ 07/Feb/11 ]

We are looking at your results, but I would note that when the single-threaded number is greater than the multi-disk performance, that tends to implicate the disk array. Have you tried different layouts in the array to see if that impacts performance?

Comment by Brian Murrell (Inactive) [ 07/Feb/11 ]

Shuichi,

I had a look at your data and compared your sgpdd_survey results against your obdfilter_survey results, matching sgpdd_survey's "crg" values to obdfilter_survey's "obj" values. Unfortunately, your two test runs do not scale out to the same extremes on both axes, so only a limited comparison can be done. We can compare sgpdd_survey results to obdfilter_survey results for crg/objs from 1 to 16 and threads from 1 to 16.

The only tests where obdfilter_survey performed better than sgpdd_survey were writes with 16 threads at crg/objs = 2, 4, and 8 (all read tests were slower for obdfilter than sgpdd). At crg/objs = 16, sgpdd_survey was once again higher. For crg/objs = 2, 4, and 8, obdfilter_survey ran at 107%, 110%, and 103% of sgpdd_survey, respectively.

Unfortunately, the results (i.e. where they overlapped and could be compared directly) only start getting interesting at the end of the comparable range. It would be more illustrative if the sgpdd_survey results ran out to 128 threads (or at least some value higher than 16) and the obdfilter_survey results went beyond 32 objs.
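A hedged sketch of how the two runs could be extended so that their crg/obj and thread ranges overlap over a wider span; sizes, device names and target names are placeholders, and as the next comment notes, sgp_dd itself may cap the usable thread count:

# Extend sgpdd-survey along both axes (sgp_dd's thr= option may limit
# how far the thread axis actually goes).
size=8192 crglo=1 crghi=32 thrlo=1 thrhi=128 \
    scsidevs="/dev/sdb /dev/sdc" ./sgpdd-survey

# Extend obdfilter-survey past 32 objects so the obj axis overlaps
# the sgpdd-survey crg axis over a wider range.
size=8192 nobjlo=1 nobjhi=128 thrlo=1 thrhi=128 \
    targets="lustre-OST0000" ./obdfilter-survey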

Comment by Shuichi Ihara (Inactive) [ 07/Feb/11 ]

Cliff, Brian,

Thanks for your comments. In sgpdd-survey, the crg=1 case is quite good, even across multiple disks.
Also, I think sgpdd-survey can't go further because the thr= option of sgp_dd only supports up to 16 threads. If the server has enough memory, we can run obdfilter-survey with many more threads (e.g. thr=1024), but we can't get results at the same thread counts from sgpdd-survey. I'm thinking of running other tools to look at raw device performance in more detail.
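One possible option for looking at raw device performance (purely a suggestion, not necessarily the tool that was eventually used here) is fio, which can drive the block devices directly with an arbitrary number of jobs and queue depth; the device name and parameter values below are illustrative placeholders:

# Sequential reads against the raw block device, bypassing the page
# cache, with 16 concurrent jobs; adjust numjobs/iodepth as needed.
fio --name=rawread --filename=/dev/sdb --ioengine=libaio --direct=1 \
    --rw=read --bs=1M --numjobs=16 --iodepth=16 \
    --runtime=60 --time_based --group_reporting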

Comment by Peter Jones [ 09/Feb/11 ]

Assign to Niu

Comment by Peter Jones [ 14/Jun/11 ]

Ihara

It seems that this ticket has not had any activity in a while. Is it still relevant to you? If so, what specifically would you like from us at this point?

Thanks

Peter

Comment by Shuichi Ihara (Inactive) [ 18/Jun/11 ]

OK, I have only two open items on this.
1. I'm always getting better performance from obdfilter-survey than from sgpdd-survey. Why?
2. For read performance in obdfilter-survey, with obj=1 and obj=2 we see good scaling, but for other object counts we see a strange line on the graph. The peak number is mostly the same, though.

Comment by Niu Yawei (Inactive) [ 19/Jun/11 ]

Hi, Ihara

1. As far as I know, in environments with many cores and many test threads, obdfilter-survey can perform better than sgpdd-survey. The reason might be that sgpdd-survey has to copy data between user space and kernel space, and the overhead of that copy is sometimes very heavy. obdfilter-survey allocates memory directly in kernel space, so it does not need to transfer data between user space and kernel space. When the thread count is limited, sgpdd-survey usually performs a little better than obdfilter-survey.

2. Whenever the peak number stops growing, we must be hitting a bottleneck somewhere. In your case, I think the bottleneck might be the overhead of thread switching.
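If the user/kernel copy is the suspect, one hedged way to probe it is to run sgp_dd by hand with and without direct IO (dio=1), which avoids the indirect-IO copy in the sg driver. The sg device name, block counts and the allow_dio step below are assumptions about the test host, not part of the original runs:

# Allow direct IO in the sg driver (requires root); assumed setting.
echo 1 > /proc/scsi/sg/allow_dio

# Same multi-threaded read with and without the extra copy:
# bs is the device logical block size, bpt=2048 gives 1 MiB transfers.
sgp_dd if=/dev/sg0 of=/dev/null bs=512 bpt=2048 thr=16 time=1 \
    count=$((8 * 1024 * 1024)) dio=0
sgp_dd if=/dev/sg0 of=/dev/null bs=512 bpt=2048 thr=16 time=1 \
    count=$((8 * 1024 * 1024)) dio=1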

Comment by Brian Murrell (Inactive) [ 20/Jun/11 ]

1. I'm always getting better performance from obdfilter-survey than from sgpdd-survey. Why?

I wonder if this could be due to spindle distance. sgpdd_survey, at least at one time, if not still, did all of its work at the start of the disk (i.e. on the outer tracks). I recall working on (but don't recall specifically landing) a patch to deal with this "optimistic" behavior. IIRC, sgp_dd's work was spread out over the distance between the outer and inner tracks to give a better indication of "average" performance. Again IIRC, that behavior became the default, with a command-line switch to revert to the "optimistic" behavior.

It looks like that work was done in bug 17218, but by the looks of the bug it was never landed, so maybe this is not an issue. It is probably worth checking the copy of sgpdd_survey that you have, though, to see where on the disk it does its work.

I don't know whether obdfilter_survey was written to provide this "average" performance value or whether it is reporting the "hero" number as well. Given that it is just working through the ldiskfs code, I'd suspect it's the "hero" number, since the OSTs would be empty.
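One hedged way to check how much on-disk placement matters is to run sgp_dd by hand at two offsets and compare throughput; the device name, block counts and the skip offset below are placeholders and assume a rotational LUN, so adjust them to the actual device size:

# Roughly 4 GiB per run at 512-byte blocks, 1 MiB per transfer.
blocks=$((8 * 1024 * 1024))

# "Optimistic" case: outer tracks at the start of the device.
sgp_dd if=/dev/sdb of=/dev/null bs=512 bpt=2048 thr=8 time=1 \
    count=$blocks

# Same read far into the device (skip is in bs-sized blocks).
sgp_dd if=/dev/sdb of=/dev/null bs=512 bpt=2048 thr=8 time=1 \
    skip=$((1024 * 1024 * 1024)) count=$blocks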

Comment by Peter Jones [ 26/Jan/12 ]

Ihara

Is this ticket still relevant or can it now be closed?

Please advise

Peter

Comment by Shuichi Ihara (Inactive) [ 26/Jan/12 ]

Peter, we finally arrived at a good Lustre I/O survey procedure and results. Please close this ticket. Thanks!

Comment by Peter Jones [ 26/Jan/12 ]

Glad to hear it!
