[LU-1232] Input/Output error during large lun test Created: 19/Mar/12 Updated: 21/Mar/12 Resolved: 21/Mar/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.2.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Sarah Liu | Assignee: | Yang Sheng |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: | 2.2-RC1-RHEL6 server and client |
| Attachments: | |
| Severity: | 3 |
| Rank (Obsolete): | 6430 |
| Description |
|
Running the large-lun test with a 24T OST on the Juelich cluster, llverfs in full mode on the OST ldiskfs filesystem failed with an Input/Output error:

write filename: /mnt/ost1/dir00157/file025, current 787.039 MB/s, overall 100.624 MB/s, est 4294967248:4294967257:4294967237 left

Please see the attached console log and dmesg. |
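For reference, a minimal sketch of how the two llverfs passes mentioned in this ticket are typically invoked (the mount point matches the log above; the exact flags vary between Lustre versions, so check `llverfs -h` on the build under test):

```sh
# Full mode: write and then read back/verify every block of the filesystem
llverfs -l -v /mnt/ost1

# Partial mode: only sample parts of the filesystem, as in the rerun below
llverfs -p -v /mnt/ost1
```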
| Comments |
| Comment by Peter Jones [ 19/Mar/12 ] |
|
Yangsheng, could you please advise on this one? Thanks, Peter |
| Comment by Yang Sheng [ 20/Mar/12 ] |
|
This issue looks like it is caused by a hardware problem:

sd 6:0:27:0: rejecting I/O to offline device
LDISKFS-fs error (device dm-1): ldiskfs_find_entry: reading directory #22708225 offset 0
sd 6:0:27:0: rejecting I/O to offline device
LDISKFS-fs error (device dm-1): ldiskfs_read_inode_bitmap: Cannot read inode bitmap - block_group = 177408, inode_bitmap = 5813305600
LDISKFS-fs error (device dm-1) in ldiskfs_new_inode: IO failure
sd 6:0:27:0: rejecting I/O to offline device
LDISKFS-fs (dm-1): delayed block allocation failed for inode 22544416 at logical offset 995328 with max blocks 2048 with error -5
This should not happen!! Data will be lost
JBD2: Detected IO errors while flushing file data on dm-1-8 |
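As background, the repeated "rejecting I/O to offline device" lines mean the SCSI layer has marked the disk offline, so every request fails immediately with -EIO before reaching the hardware. A hedged sketch of how that state can be inspected (and, if the hardware recovers, reset) via sysfs, with sdX standing in for the affected device:

```sh
# Show the SCSI device state ("running" or "offline")
cat /sys/block/sdX/device/state

# If the underlying hardware is healthy again, the device can be
# brought back online without a reboot
echo running > /sys/block/sdX/device/state
```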
| Comment by Sarah Liu [ 20/Mar/12 ] |
|
I reran this test in partial mode and it failed again. Please see the attached console log and dmesg. |
| Comment by Yang Sheng [ 20/Mar/12 ] |
|
From dmesg_partial, it is quite clearly a storage error:

Buffer I/O error on device dm-0, logical block 65598925
lost page write due to I/O error on dm-0
sd 6:0:22:0: [sdu] Unhandled error code
sd 6:0:22:0: [sdu] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 6:0:22:0: [sdu] CDB: Write(10): 2a 00 1f 61 89 08 00 00 08 00
end_request: I/O error, dev sdu, sector 526485768

I'll look through the RHEL6 bugzilla to see whether this kind of issue has been reported with this driver:

mpt2sas0: LSISAS2008: FWVersion(11.00.00.00), ChipRevision(0x03), BiosVersion(07.21.00.00)

Thanks for the whole dmesg log. |
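As a quick sanity check, the Write(10) CDB and the end_request line agree on the failing location: bytes 2-5 of a Write(10) CDB carry the logical block address, and decoding them gives the same sector the block layer reported:

```sh
# LBA from the CDB "2a 00 1f 61 89 08 00 00 08 00" (bytes 2-5)
printf '%d\n' 0x1f618908   # -> 526485768, matching "dev sdu, sector 526485768"
```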
| Comment by Sarah Liu [ 20/Mar/12 ] |
|
I tried to install the tag-2.1.56 build again and got the following error:

Pid: 2080, comm: modprobe Not tainted 2.6.32-220.4.2.el6_lustre.gddd1a7c.x86_64 #1 SGI.COM C1104-2TY9/X8DTT-IBQF |
| Comment by Frank Heckes (Inactive) [ 21/Mar/12 ] |
|
Hi Sarah, you're right, two disks of the pool assigned to the OSS nodes are broken:

ID 5000c50040cf7d9d /dev/sdu ST2000NM0001 (2TB disk)

I removed them from the JBOD. Could you remove them from the autotest resource file until we receive the spare parts? These are too many HW failures in 3 months for such a small environment. I'll get in touch with our supplier to find out whether there's a quality issue with the disks, or perhaps some problem with the MPT driver, disk firmware, or ... |
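For anyone reproducing the mapping: the disk ID above is a SAS WWN, which can be matched to the Linux device node through the persistent symlinks in /dev/disk/by-id. A sketch, assuming the usual udev naming:

```sh
# Match the WWN reported by the JBOD to the kernel device node
ls -l /dev/disk/by-id/ | grep -i 5000c50040cf7d9d
# expected to resolve to ../../sdu, the failed ST2000NM0001 disk
```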
| Comment by Peter Jones [ 21/Mar/12 ] |
|
Thanks Frank. I am closing this ticket because it is now clear that this is not a Lustre software issue. |
| Comment by Sarah Liu [ 21/Mar/12 ] |
|
Thanks Frank, I will remove them from the script. |