[LU-817] sgpdd-survey is encountering r/w errors on arrays using 2TB drives Created: 02/Nov/11 Updated: 17/Feb/12 Resolved: 08/Feb/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.2.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Joe Mervini | Assignee: | Cliff White (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: | Dell R710 running RHEL 6.1 or TOSS 2.0 Beta 2, QDR IB-attached DDN SFA10K-X |
| Attachments: |
|
| Severity: | 2 |
| Rank (Obsolete): | 4748 |
| Description |
|
Just curious if you know whether there are any known issues with running sgpdd-survey against 2TB drives. In my testing, when I go above 8 concurrent regions and 32 threads per virtual disk I start seeing errors. When I run the same test on the same server/OS configuration against an older 9550 with 250GB drives (SDR) I don't get the errors. I have seen some sgpdd-survey output on the web from testing that the University of Florida did that is similar to my results (http://wiki.hpc.ufl.edu/index.php/Adaptec_51645_Performance). The common denominator is the size of the drives. This can be a simple yes or no answer, but if you have any additional insight that you can pass along, that would be appreciated as well. Thanks. |
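For readers who haven't run it: sgpdd-survey (from lustre-iokit) is a shell script controlled by environment variables, and it sweeps over combinations of concurrent regions and thread counts per device. A minimal sketch of such a run; the device names and sweep limits below are illustrative assumptions, not the reporter's actual settings:

    # Illustrative sgpdd-survey invocation (hypothetical devices and limits).
    size=8192                      # data exercised per device, in MB
    crglo=1 crghi=16               # concurrent-region sweep
    thrlo=1 thrhi=64               # thread-count sweep
    scsidevs="/dev/sdc /dev/sdd"   # devices under test
    export size crglo crghi thrlo thrhi scsidevs
    sgpdd-survey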
| Comments |
| Comment by Peter Jones [ 02/Nov/11 ] | ||||||||
|
Cliff, could you please help with this one? Thanks, Peter | ||||||||
| Comment by Cliff White (Inactive) [ 02/Nov/11 ] | ||||||||
|
I have not heard of any issues like this - you should ask on wc-discuss also. I am checking with various engineers atm. | ||||||||
| Comment by Cliff White (Inactive) [ 02/Nov/11 ] | ||||||||
|
sgpdd is memory-hungry - have you checked memory consumption? Also, it might help to look in the .details file for more information on the error. Can you post the output from a run? | ||||||||
| Comment by Joe Mervini [ 03/Nov/11 ] | ||||||||
|
So after messing with the script some, I got sgpdd-survey to run a little longer without breaking by adjusting the bpt parameter (cutting it in half). But now I am getting "SCSI status: task set full" errors, which I guess means I'm saturating the queue. (I am attaching details as requested.) The thing I'm unsure of is where the limitation lies: with the RAID controllers, with the 2TB drives as I mentioned before, or whether this is simply expected behavior. | ||||||||
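For context on the bpt adjustment: sgpdd-survey drives the devices with sgp_dd (from sg3_utils), where bpt is "blocks per transfer", so each SCSI request moves bs x bpt bytes and halving bpt halves the request size. A hedged sketch of that kind of command, with an illustrative device and parameters not taken from the survey script:

    # Read 1 GiB from a hypothetical device with 8 threads.
    # bs=512, bpt=1024 -> 512 KiB per SCSI request (bpt=2048 would be 1 MiB).
    sgp_dd if=/dev/sdc of=/dev/null bs=512 bpt=1024 count=2097152 thr=8 time=1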
| Comment by Cliff White (Inactive) [ 03/Nov/11 ] | ||||||||
|
I don't see any errors in the output you sent - can you attach some of the errors? We don't expect that the test itself would have issues; this is not expected behavior. Rather, the test appears to be saturating some part of your system. I don't think the size of the disks is a direct cause. What other differences are there between the new system and the old? | ||||||||
| Comment by Joe Mervini [ 03/Nov/11 ] | ||||||||
|
Cliff, on the server side we have plenty of memory - 24GB on the Westmere-based Dell R710s - and during the run it is basically completely free with 240 sgp_dd threads running (Slab = ~130MB). There are differences in the interfacing, though. Both the old and the new systems are host-connected via InfiniBand, with the new being QDR vs SDR. On the storage side, the disk trays on the old system are connected to the controllers (DDN 9550) via Fibre Channel; on the new system the trays are connected via SAS to the DDN SFA10K controllers. One thing I failed to mention (my oversight) is that the previous attachment was based on runs using direct I/O. Those were the settings that would let me run without it blowing up on launch. (It's been quite a while since I have used sgp_dd, and I had assumed that sgpdd-survey was using dio, which apparently it doesn't. But in all of our recent testing we have been using direct I/O flags, so I figured I'd use the same parameters.) In any event, the data that I just posted is from an out-of-the-box copy of sgpdd-survey with all the defaults left in place. As you can see, I am still getting the task set full messages. | ||||||||
| Comment by Cliff White (Inactive) [ 04/Nov/11 ] | ||||||||
|
The errors aren't telling me much. Are there corresponding errors in /var/log/messages or dmesg? | ||||||||
| Comment by Joe Mervini [ 05/Nov/11 ] | ||||||||
|
Yes and no. /var/log/messages doesn't have anything of value, but dmesg is flooded with messages like this for every device: sd 6:0:0:3: timing out command, waited 60s | ||||||||
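As background: the per-device SCSI command timeout and queue depth can both be inspected from sysfs on the host, which can help when chasing messages like the one above. A quick dump for every sd device, assuming the standard Linux sysfs layout:

    # Print the command timeout (seconds) and current queue depth
    # for each SCSI disk visible to the host.
    for dev in /sys/block/sd*/device; do
        echo "${dev}: timeout=$(cat ${dev}/timeout)s queue_depth=$(cat ${dev}/queue_depth)"
    done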
| Comment by Cliff White (Inactive) [ 05/Nov/11 ] | ||||||||
|
That's not good. I would suggest looking at the disk setup, controllers, etc. Looks like something in there is not happy. | ||||||||
| Comment by Cliff White (Inactive) [ 05/Nov/11 ] | ||||||||
|
I would also suggest you see if there is a scaling/load point where those messages start to happen. Are the disks happy with fewer threads, or fewer regions? 60 seconds (I assume 's' is seconds in this case) is a very long time to wait for IO completion. | ||||||||
| Comment by Shuichi Ihara (Inactive) [ 06/Nov/11 ] | ||||||||
|
I have a patch to sgpdd-survey for SFA10K and large LUN support. I'm not sure whether it is related to this issue, but I will post the patch on Gerrit later. Thanks. | ||||||||
| Comment by Shuichi Ihara (Inactive) [ 07/Nov/11 ] | ||||||||
|
Posted a patch for sgpdd-survey: http://review.whamcloud.com/#change,1658 sgpdd does 1MB I/O, which matches the stripe size on the SFA10K if you have 8D+2P RAID6 with a 128K chunk. This is fine, but the problem is that the boundary between concurrent regions per device is still 512K, so we can see many unaligned I/Os on the SFA. I've changed the fixed boundary (skip=1024+..) to be configurable. Although the default is still 512K to keep compatibility, you can set boundary=2048 for sgpdd-survey and test with fully aligned I/Os: # boundary=2048 scsidevs="/dev/sdc /dev/sdd ..." sgpdd-survey | ||||||||
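To spell out the alignment arithmetic behind the comment above (in units of 512-byte sectors, as used by the script's skip/boundary values): 8 data disks x 128K chunk gives a 1MB full stripe, so a 512K boundary (1024 sectors) starts every other region half a stripe off, while a 1MB boundary (2048 sectors) keeps each region stripe-aligned. A small sketch of that check in shell; nothing here is taken from the actual patch:

    # Check whether a region boundary (in 512-byte sectors) is a multiple of
    # the array's full-stripe size for an 8D+2P RAID6 with 128K chunks.
    chunk_kb=128
    data_disks=8
    stripe_kb=$((chunk_kb * data_disks))     # 1024K = 1MB full stripe
    for boundary_sectors in 1024 2048; do    # 512K vs 1MB boundaries
        boundary_kb=$((boundary_sectors / 2))
        if [ $((boundary_kb % stripe_kb)) -eq 0 ]; then
            echo "boundary=${boundary_sectors} sectors (${boundary_kb}K): stripe-aligned"
        else
            echo "boundary=${boundary_sectors} sectors (${boundary_kb}K): not stripe-aligned"
        fi
    done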
| Comment by Joe Mervini [ 07/Nov/11 ] | ||||||||
|
Sweet! I'll test again today with the new script and report back with the results. | ||||||||
| Comment by Eric Barton (Inactive) [ 07/Nov/11 ] | ||||||||
|
It would be worth checking that the arithmetic done in the script doesn't overflow. It just wasn't an issue with LUN size at the time this script was originally written... | ||||||||
| Comment by Shuichi Ihara (Inactive) [ 07/Nov/11 ] | ||||||||
|
Yes, I agree. The patch might not help the original issue, but an unaligned boundary in sgpdd-survey is not good for the SFA anyway, especially if we compare with obdfilter-survey numbers as a next step - all-cached I/O vs. full-stripe-aligned I/O. | ||||||||
| Comment by Joe Mervini [ 08/Nov/11 ] | ||||||||
|
I did testing with the patched script and got the same errors. However, when I backed off the queue_depth from 16 to 4, the test ran to completion. (Well, it ran to the point where it ran out of memory, but that counts as completion as far as I'm concerned.) I went back and tested with the original script and the smaller queue_depth. It wasn't until I hit 256 concurrent regions and 256 threads that I encountered any problems (still disk timeouts), but it did complete. So I'm wondering if Shuichi had similar experiences, or whether he was testing against QDR IB-attached SFA10Ks. | ||||||||
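For anyone trying to reproduce the queue-depth experiment: the ticket does not say where queue_depth was changed (it may have been a controller- or initiator-side setting), but one common way to lower the per-LUN queue depth on Linux is through sysfs, sketched below with hypothetical device names:

    # Lower the SCSI queue depth for two hypothetical devices (run as root),
    # then read the values back to confirm.
    for dev in sdc sdd; do
        echo 4 > /sys/block/${dev}/device/queue_depth
        cat /sys/block/${dev}/device/queue_depth
    done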
| Comment by Cliff White (Inactive) [ 08/Nov/11 ] | ||||||||
|
I am not sure if there is sgpdd data, but I recall we have had sites with SATA drives report better performance from limiting the queue depth and/or reducing the number of IO threads. | ||||||||
| Comment by Joe Mervini [ 09/Nov/11 ] | ||||||||
|
I got some information from DDN last night that fixed the timeout errors seen with sgpdd-survey. The stack "ostype" setting, which defaults to generic, must be set to custom with a value of 0x4. With this setting the standard sgpdd-survey runs to completion without r/w or ENOMEM errors. For anyone watching this ticket who might be using SFA10Ks, the method to show and configure this setting is:

    ddn10k-6 RAID[0]$ app show stack
    Index | Stack Name | OS Type | Characteristics | Max|Cur | Max|Cur | Max|Cur |
    Total Stacks: 1    Wed Nov 9 17:14:30 2011

    Index | Stack Name | OS Type | Characteristics | Max|Cur | Max|Cur | Max|Cur |
    Total Stacks: 1    Wed Nov 9 17:15:38 2011 | ||||||||
| Comment by Shuichi Ihara (Inactive) [ 06/Jan/12 ] | ||||||||
|
Joe, OK, I see. It seems the SFA was sending SCSI BUSY or SCSI TASK SET FULL to the servers when its queue was full. | ||||||||
| Comment by Cliff White (Inactive) [ 01/Feb/12 ] | ||||||||
|
Tested the patch on Hyperion; reviewed and approved. Added a reviewer - it would be good to land the patch. | ||||||||
| Comment by Build Master (Inactive) [ 08/Feb/12 ] | ||||||||
|
Integrated in Result = SUCCESS
| ||||||||
| Comment by Peter Jones [ 08/Feb/12 ] | ||||||||
|
Landed for 2.2 | ||||||||
| Comment by Build Master (Inactive) [ 17/Feb/12 ] | ||||||||
|
Integrated in Result = FAILURE
| ||||||||
| Comment by Build Master (Inactive) [ 17/Feb/12 ] | ||||||||
|
Integrated in Result = ABORTED
|