Details
Type: Bug
Resolution: Fixed
Priority: Major
Affects Version/s: Lustre 2.0.0
Fix Version/s: None
Environment: RHEL 6.0 GA, OFED 1.5.2, Lustre 2.0.0.1, Mellanox QDR IB cards
Description
When the Lustre servers have their MDT or OST configured on multipath devices, I/O errors occur that can lead to a server crash.
The following error appears in the system log:
1301326968 2011 Mar 28 17:42:48 berlin4 kern err kernel blk_rq_check_limits: over max size limit.
It is followed by several I/O errors:
1301326968 2011 Mar 28 17:42:48 berlin4 kern err kernel end_request: I/O error, dev dm-10, sector 26624
1301326968 2011 Mar 28 17:42:48 berlin4 kern err kernel end_request: I/O error, dev dm-10, sector 24576
1301326968 2011 Mar 28 17:42:48 berlin4 kern err kernel end_request: I/O error, dev dm-10, sector 22528
...
Here is the corresponding code in the blk_rq_check_limits() routine (block/blk-core.c); the printk() below is what produces the "over max size limit" message:
int blk_rq_check_limits(struct request_queue *q, struct request *rq)
{
	/* Reject requests that exceed the queue's size limits. */
	if (blk_rq_sectors(rq) > queue_max_sectors(q) ||
	    blk_rq_bytes(rq) > queue_max_hw_sectors(q) << 9) {
		printk(KERN_ERR "%s: over max size limit.\n", __func__);
		return -EIO;
	}
	...
This error comes from the block device tuning performed by mount.lustre in the set_blockdev_tunables() routine. The max_sectors_kb value of the multipath device (dm-10) is raised from 1024 to 32767, which is its max_hw_sectors_kb value. However, the slave devices underneath it are not tuned, so requests can be built against dm-10's new limit that the slave devices cannot handle: a request of up to 32767 KB (65534 sectors) can be remapped to a slave whose max_sectors_kb is still 1024 (2048 sectors), and blk_rq_check_limits() rejects it.
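To make the mismatch concrete, here is a minimal standalone sketch using the values from this report (the variable names are illustrative, not kernel identifiers); it applies the same comparison that blk_rq_check_limits() performs:
#include <stdio.h>

int main(void)
{
	/* Limits in KB, as exposed in /sys/block/<dev>/queue/max_sectors_kb:
	 * dm-10 was raised to 32767 by mount.lustre, the slave kept 1024. */
	unsigned int dm_max_kb = 32767;
	unsigned int slave_max_kb = 1024;

	/* One sector is 512 bytes, so a limit of N KB allows N * 2 sectors. */
	unsigned int rq_sectors = dm_max_kb * 2;           /* 65534 sectors */
	unsigned int slave_max_sectors = slave_max_kb * 2; /*  2048 sectors */

	/* Same test as blk_rq_check_limits(): a request built against
	 * dm-10's limit exceeds the slave queue's limit and gets -EIO. */
	if (rq_sectors > slave_max_sectors)
		printf("over max size limit: %u > %u sectors\n",
		       rq_sectors, slave_max_sectors);
	return 0;
}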
A workaround for this issue is to set each slave device's max_sectors_kb value to its max_hw_sectors_kb value, as sketched below.
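For illustration, here is a minimal userspace sketch of that workaround (this is not the attached patch; it assumes sysfs is mounted at /sys, and the tune_device() helper and command-line handling are hypothetical). It copies max_hw_sectors_kb into max_sectors_kb for every slave listed under /sys/block/<dm>/slaves/, then for the multipath device itself:
#include <stdio.h>
#include <dirent.h>

/* Copy /sys/block/<dev>/queue/max_hw_sectors_kb into max_sectors_kb. */
static int tune_device(const char *dev)
{
	char path[256], value[32];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/block/%s/queue/max_hw_sectors_kb", dev);
	f = fopen(path, "r");
	if (!f || !fgets(value, sizeof(value), f)) {
		if (f)
			fclose(f);
		return -1;
	}
	fclose(f);

	snprintf(path, sizeof(path),
		 "/sys/block/%s/queue/max_sectors_kb", dev);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fputs(value, f);
	fclose(f);
	return 0;
}

int main(int argc, char *argv[])
{
	char path[256];
	struct dirent *d;
	DIR *dir;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <dm-device>\n", argv[0]);
		return 1;
	}

	/* Tune each slave of the multipath device, then the device itself. */
	snprintf(path, sizeof(path), "/sys/block/%s/slaves", argv[1]);
	dir = opendir(path);
	if (dir) {
		while ((d = readdir(dir)) != NULL) {
			if (d->d_name[0] == '.')
				continue;
			tune_device(d->d_name);
		}
		closedir(dir);
	}
	return tune_device(argv[1]) ? 1 : 0;
}
Compiled to, say, a hypothetical tune_slaves binary, it would be run as "tune_slaves dm-10" on the server before mounting the target.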
Attached is a patch to the set_blockdev_tunables() routine that adds support for the multipath device case.