[LU-10510] Fix 'over max size limit' issue Created: 14/Jan/18  Updated: 15/Nov/19  Resolved: 29/May/18

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Yang Sheng Assignee: Yang Sheng
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
Related
is related to LU-9132 Tuning max_sectors_kb on mount Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Some sites have encountered issues with messages like:

May 10 13:54:50 oss02 kernel: blk_cloned_rq_check_limits: over max size limit.
May 10 13:54:50 oss02 kernel: blk_cloned_rq_check_limits: over max size limit.
May 10 13:54:50 oss02 kernel: device-mapper: multipath: Failing path 8:48.
May 10 13:54:50 oss02 kernel: blk_cloned_rq_check_limits: over max size limit.
May 10 13:54:50 oss02 kernel: blk_cloned_rq_check_limits: over max size limit.
May 10 13:54:50 oss02 kernel: device-mapper: multipath: Failing path 8:240.

This issue can cause corruption on the storage side. It can be worked around by reverting an upstream patch or by adding a udev script (but since we change the max_sectors_kb value at mount time, the problem can still be triggered even with such a script). The better way is to handle this value in the mount.lustre tool. I'll prepare a patch for master first.
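
To check whether a target is affected, compare max_sectors_kb on the multipath device against its underlying paths; a minimal shell sketch, assuming dm-2 is the multipath device backing the target (the device name is only an example):

# The dm device's limit must not exceed any of its paths' limits, otherwise
# cloned requests are rejected with "over max size limit".
grep . /sys/block/dm-2/queue/max_sectors_kb \
        /sys/block/dm-2/slaves/*/queue/max_sectors_kb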



 Comments   
Comment by Andreas Dilger [ 24/Jan/18 ]

Cliff hit this on the soak test cluster. He will collect the output of grep . /sys/block/{dm*,sd*}/queue/max*sectors_kb for the affected MDT multipath device and the underlying SCSI devices (we don't need all of the others). My suspicion is that one of the underlying devices was reset for some reason, so its max_sectors_kb is back at the default (maybe 128) while the multipath device is set larger (maybe 1024 or 16384), and this mismatch is causing the failures.

Using mount.lustre -o max_sectors_kb=0 will prevent Lustre from changing these tunables in the first place, which may hurt OST performance but likely has a lesser effect on MDT performance. However, if the system is already in this state (underlying devices have a lower max_sectors_kb than the parent), it needs to be fixed manually by changing the underlying settings (or a reboot would work).
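
A hedged sketch of that manual fix (device names are examples, and each new value must stay within the path's max_hw_sectors_kb):

# Raise each underlying path's max_sectors_kb to the multipath device's value.
DM=dm-2                               # example multipath device name
LIMIT=$(cat /sys/block/$DM/queue/max_sectors_kb)
for slave in /sys/block/$DM/slaves/*; do
        echo $LIMIT > $slave/queue/max_sectors_kb
done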

Comment by Andreas Dilger [ 24/Jan/18 ]

Yang Sheng, can you please look at what could be done to create a udev script to handle this? I guess the tricky part is that we don't want to install a udev script from the RPM package that covers all devices; we only want to affect the Lustre target devices.

One option would be to generate udev rules like /etc/udev/rules.d/99-lustre-<device>.rules at mkfs.lustre time for the multipath and child devices. That wouldn't help existing filesystems, and it means users could see a performance regression when they upgrade to a version where mount.lustre no longer applies this tuning, if they don't realize the tunable is no longer being set at mount time.
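
For illustration, such a generated file might look like the following (device names and the 16384 value are hypothetical, and matching on kernel names rather than stable identifiers is only for brevity):

# /etc/udev/rules.d/99-lustre-dm-2.rules (hypothetical, generated at mkfs.lustre time)
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="dm-2|sdd|sdp", ATTR{queue/max_sectors_kb}="16384"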

A second option would be to change tune_max_sectors_kb() to generate and install udev tuning rules for each mounted device and its children if they do not already exist. This would be the least complex approach, though possibly a surprising action for mount.lustre to take. However, it is not worse than our current practice of changing the block device tunables at mount time.
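
Expressed as a shell sketch of that idea (the real tune_max_sectors_kb() is C code in the mount utilities; the rules file naming and device names below are assumptions):

DEV=dm-2                                    # block device of the mounted target
RULES=/etc/udev/rules.d/99-lustre-$DEV.rules
if [ ! -e "$RULES" ]; then
        # Pin the same limit on the device and each of its multipath children.
        KB=$(cat /sys/block/$DEV/queue/max_hw_sectors_kb)
        {
                echo "ACTION==\"add|change\", KERNEL==\"$DEV\", ATTR{queue/max_sectors_kb}=\"$KB\""
                for slave in /sys/block/$DEV/slaves/*; do
                        echo "ACTION==\"add|change\", KERNEL==\"$(basename $slave)\", ATTR{queue/max_sectors_kb}=\"$KB\""
                done
        } > "$RULES"
fi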

A third option (possibly in addition to the first) would be to change tune_max_sectors_kb() to complain if udev tuning rules are missing for the device and its children, or if the multipath child device settings do not match the parent, and to indicate how to create the udev rules without actually installing them. The admin can then create such a rule and add any desired tunables to quiet mount.lustre, which avoids having it do the tuning itself.

Comment by Chris Hunter (Inactive) [ 09/Feb/18 ]

This is a known issue with certain kernels, related to a mismatch in the max_sectors_kb setting between dm devices (e.g. multipath) and the underlying block devices.
If you have a Red Hat account, you can read their KB articles:

https://access.redhat.com/solutions/247991

https://access.redhat.com/solutions/145163
 

Comment by Yang Sheng [ 29/May/18 ]

This issue should be fixed now that https://review.whamcloud.com/31951/ has landed.
