[LU-275] I/O errors when lustre uses multipath devices
Created: 04/May/11  Updated: 14/Jun/11  Resolved: 13/Jun/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0
Fix Version/s: Lustre 2.1.0

Type: Bug
Priority: Major
Reporter: Gregoire Pichon
Assignee: Zhenyu Xu
Resolution: Fixed
Votes: 0
Labels: None
Environment:

RHEL 6.0 GA, OFED 1.5.2, Lustre 2.0.0.1, Mellanox QDR IB cards


Attachments: set_blockdev_tunables_multipath.patch
Severity: 3
Epic: mount, multipath, tuning
Rank (Obsolete): 4992

 Description   

When Lustre servers have their MDT or OST configured on multipath devices, I/O errors occur that can lead to a server crash.

The following error appears in the system log:
1301326968 2011 Mar 28 17:42:48 berlin4 kern err kernel blk_rq_check_limits: over max size limit.

It is followed by several I/O errors:
1301326968 2011 Mar 28 17:42:48 berlin4 kern err kernel end_request: I/O error, dev dm-10, sector 26624
1301326968 2011 Mar 28 17:42:48 berlin4 kern err kernel end_request: I/O error, dev dm-10, sector 24576
1301326968 2011 Mar 28 17:42:48 berlin4 kern err kernel end_request: I/O error, dev dm-10, sector 22528
...

Here is the corresponding code in the blk_rq_check_limits() routine:
int blk_rq_check_limits(struct request_queue *q, struct request *rq)
{
        if (blk_rq_sectors(rq) > queue_max_sectors(q) ||
            blk_rq_bytes(rq) > queue_max_hw_sectors(q) << 9) {
                printk(KERN_ERR "%s: over max size limit.\n", __func__);
                return -EIO;
        }
        ...

This error comes from the block device tuning performed by mount.lustre in the set_blockdev_tunables() routine. The max_sectors_kb value of the multipath device (dm-10) is raised from 1024 to 32767, which is the value of its max_hw_sectors_kb. However, the slave devices are not tuned, so block requests are built that the slave devices cannot handle.
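
The mismatch can be illustrated with a small user-space model of the size check. This is only a sketch: struct queue_limits_kb and request_fits() are hypothetical names, the kernel test is simplified to comparisons in KB, and any slave limit you feed it (for example 512 KB) is an assumed value, not one taken from this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a block queue's size limits, in KB. */
struct queue_limits_kb {
	unsigned int max_sectors_kb;    /* current, tunable limit */
	unsigned int max_hw_sectors_kb; /* hard limit reported by the driver */
};

/*
 * Simplified form of the kernel check: a request is rejected when it is
 * larger than either limit of the queue it is submitted to.
 */
bool request_fits(const struct queue_limits_kb *q, unsigned int request_kb)
{
	assert(q != NULL);
	return request_kb <= q->max_sectors_kb &&
	       request_kb <= q->max_hw_sectors_kb;
}
```

With the multipath queue raised to 32767 KB and a slave queue left at a smaller max_sectors_kb, a request that request_fits() accepts for the multipath device can be rejected for the slave, which matches the "over max size limit" symptom described above.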

A workaround for this issue is to set each slave device's max_sectors_kb value to its max_hw_sectors_kb value.

Attached is a patch to the set_blockdev_tunables() routine that makes it support the multipath device case.



 Comments   
Comment by Peter Jones [ 05/May/11 ]

Gregoire

Could you please upload your suggested change as a patch in Gerrit? Then we can look into testing and landing this change.

Thanks

Peter

Comment by Zhenyu Xu [ 05/May/11 ]

A patch with minor changes is being tracked at http://review.whamcloud.com/504

Comment by Gregoire Pichon [ 06/May/11 ]

Zhenyu,

I have detected a bug in my patch proposal.
Line 518 is missing a call to "globfree(&glob_info)" before the error case return.

thanks,
Grégoire.

Comment by Zhenyu Xu [ 06/May/11 ]

Thanks. It looks like globfree() should be called even if glob() returns a non-zero value.

Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #166
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #166
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #166
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #166
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » i686,server,el5,ofa #166
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #166
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #166
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #166
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,ofa #166
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #166
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » i686,client,el5,ofa #166
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #166
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #166
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #166
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c
Comment by Zhenyu Xu [ 13/Jun/11 ]

Landed on the master branch for 2.1.0.

Comment by Build Master (Inactive) [ 14/Jun/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #170
LU-275 I/O errors when lustre uses multipath devices

Oleg Drokin : 515fd66ef9443ad6d95ff23bd865eb7923ab6eb6
Files :

  • lustre/utils/mount_lustre.c