<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:27:11 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9551] I/O errors when lustre uses multipath devices</title>
                <link>https://jira.whamcloud.com/browse/LU-9551</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;When the Lustre servers have OSTs configured on multipath devices, I/O errors occur that can lead to a server crash.&lt;/p&gt;

&lt;p&gt;The following error appears in the system log:&lt;br/&gt;
Mar 31 00:02:44 oss01 kernel: blk_cloned_rq_check_limits: over max size limit.&lt;br/&gt;
Mar 31 00:02:44 oss01 kernel: device-mapper: multipath: Failing path 8:160.&lt;/p&gt;

&lt;p&gt;This is followed by several I/O errors:&lt;br/&gt;
Mar 31 00:17:30 oss01 kernel: blk_update_request: I/O error, dev dm-17, sector 1182279680&lt;br/&gt;
Mar 31 00:17:30 oss01 kernel: blk_update_request: I/O error, dev dm-17, sector 1182291968&lt;br/&gt;
Mar 31 00:17:30 oss01 kernel: blk_update_request: I/O error, dev dm-17, sector 1182267392&lt;br/&gt;
Mar 31 00:17:30 oss01 kernel: blk_update_request: I/O error, dev dm-17, sector 1182304256&lt;br/&gt;
Mar 30 21:04:22 oss01 kernel: LDISKFS-fs (dm-17): Remounting filesystem read-only&lt;/p&gt;
</description>
                <environment>CentOS Linux release 7.3.1611 (Core),OFED.3.4.2.0.0.1,lustre-2.7.19.8,Mellanox Technologies MT27500 Family</environment>
        <key id="46284">LU-9551</key>
            <summary>I/O errors when lustre uses multipath devices</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="utopiabound">Nathaniel Clark</assignee>
                                    <reporter username="shenxm">xiangmin shen</reporter>
                        <labels>
                    </labels>
                <created>Wed, 24 May 2017 07:03:51 +0000</created>
                <updated>Fri, 3 Dec 2021 22:40:56 +0000</updated>
                            <resolved>Fri, 13 Apr 2018 18:54:19 +0000</resolved>
                                    <version>Lustre 2.7.0</version>
                                    <fixVersion>Lustre 2.12.0</fixVersion>
                    <fixVersion>Lustre 2.10.4</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                            <comments>
                            <comment id="199719" author="chunteraa" created="Tue, 20 Jun 2017 14:34:00 +0000"  >&lt;p&gt;The message &quot;blk_cloned_rq_check_limits&quot; has also been seen on non-Lustre filesystems and is believed to be caused by an upstream commit to the 4.3 kernel:&lt;br/&gt;
&lt;a href=&quot;https://patchwork.kernel.org/patch/8307491/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://patchwork.kernel.org/patch/8307491/&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Feb. 14, 2016, 10:20 p.m.
From: Hannes Reinecke &amp;lt;hare@suse.de&amp;gt;
commit bf4e6b4e757488dee1b6a581f49c7ac34cd217f8 upstream.

When a cloned request is retried on other queues it always needs
to be checked against the queue limits of that queue.
Otherwise the calculations for nr_phys_segments might be wrong,
leading to a crash in scsi_init_sgtable().
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="209682" author="mhaakddn" created="Wed, 27 Sep 2017 01:07:08 +0000"  >&lt;p&gt;We just hit this at ANU. The fix is to ensure that max_sectors_kb is &apos;large enough&apos;.&lt;/p&gt;

&lt;p&gt;We had an issue where multipath was generating 1 MB I/Os (as that&apos;s what Lustre was configured for), but the underlying /dev block devices had max_sectors_kb = 512.&lt;/p&gt;

&lt;p&gt;I&apos;m not sure how that is possible, but it was resolved by adding a udev rule that sets max_sectors_kb to at least 1024 but no larger than max_hw_sectors_kb.&lt;/p&gt;
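&lt;p&gt;A minimal sketch of what such a tuning rule has to compute, assuming the standard sysfs layout (/sys/block/DEV/queue/max_sectors_kb and max_hw_sectors_kb) and the 1 MB Lustre I/O size mentioned above. The helper names are hypothetical and not part of Lustre or multipath-tools; a real udev rule would normally write the attribute directly rather than call Python.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import os


def read_queue_kb(dev, attr, sysfs_root="/sys/block"):
    """Read an integer queue limit in KiB (e.g. max_sectors_kb) for block device dev."""
    with open(os.path.join(sysfs_root, dev, "queue", attr)) as f:
        return int(f.read().strip())


def target_max_sectors_kb(max_hw_sectors_kb, io_size_kb=1024):
    """Return the max_sectors_kb value a tuning rule should write.

    The value has to be at least the I/O size Lustre issues (1024 KiB by
    default here, an assumption matching this report), and the kernel
    rejects values larger than max_hw_sectors_kb.
    """
    if io_size_kb > max_hw_sectors_kb:
        raise ValueError(
            f"max_hw_sectors_kb={max_hw_sectors_kb} cannot hold {io_size_kb} KiB I/Os"
        )
    return io_size_kb


def tune_path(dev, sysfs_root="/sys/block", io_size_kb=1024):
    """Raise dev's max_sectors_kb to io_size_kb if it is currently smaller.

    Writing to sysfs needs root on a real system. Returns the resulting value.
    """
    hw = read_queue_kb(dev, "max_hw_sectors_kb", sysfs_root)
    current = read_queue_kb(dev, "max_sectors_kb", sysfs_root)
    wanted = target_max_sectors_kb(hw, io_size_kb)
    if wanted > current:
        path = os.path.join(sysfs_root, dev, "queue", "max_sectors_kb")
        with open(path, "w") as f:
            f.write(str(wanted))
        return wanted
    return current
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Because a path can come back from a failure with its default limits, a rule like this needs to fire whenever a path is probed, not only when the OST is mounted.&lt;/p&gt;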

&lt;p&gt;I&apos;m not sure if this is actually a Lustre error or a multipath error. Based on my reading of &lt;a href=&quot;https://patchwork.kernel.org/patch/9140337/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://patchwork.kernel.org/patch/9140337/&lt;/a&gt;,&lt;br/&gt;
this is resolved in a new enough kernel, but it seems that some patches may need to be backported into CentOS/RHEL.&lt;/p&gt;

&lt;p&gt;EDIT: Interestingly, this was only seen months after the filesystem went into production.&lt;/p&gt;

&lt;p&gt;EDIT: Yes, I know that patch is for ppc. The conversation was still relevant.&lt;/p&gt;</comment>
                            <comment id="209724" author="chunteraa" created="Wed, 27 Sep 2017 15:10:43 +0000"  >&lt;p&gt;One possible workaround is described in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9132&quot; title=&quot;Tuning max_sectors_kb on mount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9132&quot;&gt;&lt;del&gt;LU-9132&lt;/del&gt;&lt;/a&gt;: setting the environment variable &quot;MOUNT_LUSTRE_MAX_SECTORS_KB=0&quot; stops mount.lustre from changing max_sectors_kb when mounting OSTs, so the OSTs retain the max_sectors_kb value set by your udev rules.&lt;/p&gt;</comment>
                            <comment id="209792" author="mhaakddn" created="Thu, 28 Sep 2017 00:12:16 +0000"  >&lt;p&gt;Has that been backported to 2.7/IEEL3?&lt;/p&gt;

&lt;p&gt;I can see that it exists in 2.10 and Master.&lt;/p&gt;

&lt;p&gt;Also, it doesn&apos;t explain why we would get issues months after going live. The OSTs were mounted once and never remounted.&lt;/p&gt;</comment>
                            <comment id="209793" author="mhaakddn" created="Thu, 28 Sep 2017 00:18:40 +0000"  >&lt;p&gt;Also, this might not fix it. Our issue seemed to come from the backing devices behind multipath having been reset to the default value of 512, not from the multipath devices that Lustre was mounted on.&lt;/p&gt;

&lt;p&gt;Our udev rules only change the backing devices/paths, not the resulting dm-X devices that Lustre is mounted on.&lt;/p&gt;
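&lt;p&gt;A minimal diagnostic sketch for this situation, assuming the usual sysfs layout in which a device-mapper node such as dm-17 lists its backing paths under /sys/block/dm-17/slaves/. The function names are hypothetical and not part of Lustre or multipath-tools. The sketch flags any path whose max_sectors_kb has dropped below that of the dm-X device Lustre is mounted on, which is the mismatch that the blk_cloned_rq_check_limits check quoted earlier in this issue can reject.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import os


def read_queue_kb(dev, attr, sysfs_root="/sys/block"):
    """Read an integer queue limit in KiB for block device dev from sysfs."""
    with open(os.path.join(sysfs_root, dev, "queue", attr)) as f:
        return int(f.read().strip())


def undersized_paths(dm_limit_kb, path_limits_kb):
    """Return, sorted, the paths whose max_sectors_kb is smaller than the dm limit.

    A request sized for the dm device can then be too large for such a path
    when it is cloned onto it (e.g. after a path failover).
    """
    return sorted(p for p, kb in path_limits_kb.items() if dm_limit_kb > kb)


def check_dm_device(dm_dev, sysfs_root="/sys/block"):
    """Compare dm_dev (e.g. "dm-17") with each backing path listed in its slaves/ directory."""
    dm_limit = read_queue_kb(dm_dev, "max_sectors_kb", sysfs_root)
    slaves = os.listdir(os.path.join(sysfs_root, dm_dev, "slaves"))
    limits = {p: read_queue_kb(p, "max_sectors_kb", sysfs_root) for p in slaves}
    return undersized_paths(dm_limit, limits)
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Running check_dm_device(&quot;dm-17&quot;) periodically on an OSS would surface paths that came back with default limits before they trigger I/O errors.&lt;/p&gt;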

&lt;p&gt;From some of the discussions in the kernel.org threads, it seems that during failover between paths multipath can also do the wrong thing, checking only against max_hw_sectors_kb and not against max_sectors_kb.&lt;/p&gt;

&lt;p&gt;Previously this would not have been an issue, but with the extra checks it clearly is.&lt;/p&gt;</comment>
                            <comment id="211252" author="mhaakddn" created="Mon, 16 Oct 2017 23:53:24 +0000"  >&lt;p&gt;We have found the exact cause of our issues:&lt;/p&gt;

&lt;p&gt;Lustre had increased the values at mount time; later, some paths went away and came back, and they were reset to the default values on their return.&lt;/p&gt;

&lt;p&gt;Before the kernel patch this would not have been an issue, so for us a udev rule that enforces the maximum on probe will resolve it.&lt;/p&gt;</comment>
                            <comment id="216992" author="pjones" created="Thu, 21 Dec 2017 18:49:24 +0000"  >&lt;p&gt;This is fixed in more current releases.&lt;/p&gt;</comment>
                            <comment id="221982" author="gerrit" created="Wed, 28 Feb 2018 22:31:19 +0000"  >&lt;p&gt;Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/31464&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/31464&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9551&quot; title=&quot;I/O errors when lustre uses multipath devices&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9551&quot;&gt;&lt;del&gt;LU-9551&lt;/del&gt;&lt;/a&gt; utils: add l_tunedisk to fix disk tunings&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 681f208b5ec25a12eeac5a7c1cea238154ffd6ff&lt;/p&gt;</comment>
                            <comment id="225468" author="gerrit" created="Mon, 9 Apr 2018 19:45:32 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/31464/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/31464/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9551&quot; title=&quot;I/O errors when lustre uses multipath devices&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9551&quot;&gt;&lt;del&gt;LU-9551&lt;/del&gt;&lt;/a&gt; utils: add l_tunedisk to fix disk tunings&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 892280742a2b6347df1464379b3ed223b2961ed4&lt;/p&gt;</comment>
                            <comment id="225523" author="pjones" created="Mon, 9 Apr 2018 20:25:33 +0000"  >&lt;p&gt;Landed for 2.12&lt;/p&gt;</comment>
                            <comment id="225749" author="gerrit" created="Wed, 11 Apr 2018 15:20:54 +0000"  >&lt;p&gt;Minh Diep (minh.diep@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/31951&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/31951&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9551&quot; title=&quot;I/O errors when lustre uses multipath devices&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9551&quot;&gt;&lt;del&gt;LU-9551&lt;/del&gt;&lt;/a&gt; utils: add l_tunedisk to fix disk tunings&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: b30fb047c12c6354df2d81e2a0cd5dd21852f6b3&lt;/p&gt;</comment>
                            <comment id="225768" author="chunteraa" created="Wed, 11 Apr 2018 16:35:24 +0000"  >&lt;p&gt;The old mount method in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-275&quot; title=&quot;I/O errors when lustre uses multipath devices&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-275&quot;&gt;&lt;del&gt;LU-275&lt;/del&gt;&lt;/a&gt; sets the value from the sysfs block parameter max_hw_sectors_kb.&lt;/p&gt;

&lt;p&gt;However, due to bugs in the transport protocol this value can be wrong (&lt;a href=&quot;https://patchwork.kernel.org/patch/7614871/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://patchwork.kernel.org/patch/7614871/&lt;/a&gt;; &lt;a href=&quot;https://patchwork.kernel.org/patch/6662311/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://patchwork.kernel.org/patch/6662311/&lt;/a&gt;) and produce an error when used by the Lustre mount command.&lt;br/&gt;
The feature in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9132&quot; title=&quot;Tuning max_sectors_kb on mount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9132&quot;&gt;&lt;del&gt;LU-9132&lt;/del&gt;&lt;/a&gt; to adjust mount behaviour would be useful in this scenario.&lt;/p&gt;</comment>
                            <comment id="225893" author="gerrit" created="Thu, 12 Apr 2018 16:35:45 +0000"  >&lt;p&gt;John L. Hammond (john.hammond@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/31951/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/31951/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9551&quot; title=&quot;I/O errors when lustre uses multipath devices&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9551&quot;&gt;&lt;del&gt;LU-9551&lt;/del&gt;&lt;/a&gt; utils: add l_tunedisk to fix disk tunings&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 3281d5d57cec9d6deaa50cb4d9ec9509e3d03507&lt;/p&gt;</comment>
                            <comment id="225982" author="mdiep" created="Fri, 13 Apr 2018 14:23:26 +0000"  >&lt;p&gt;This patch caused &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10898&quot; title=&quot;conf-sanity test 32a and 32d fail with &#8216;rmmod: ERROR: Module zfs is in use&#8217;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10898&quot;&gt;&lt;del&gt;LU-10898&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="226019" author="pjones" created="Fri, 13 Apr 2018 18:54:19 +0000"  >&lt;p&gt;It looks like it is going to be fixed under &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10898&quot; title=&quot;conf-sanity test 32a and 32d fail with &#8216;rmmod: ERROR: Module zfs is in use&#8217;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10898&quot;&gt;&lt;del&gt;LU-10898&lt;/del&gt;&lt;/a&gt; rather than reverted, so keeping this as resolved.&lt;/p&gt;</comment>
                            <comment id="228841" author="utopiabound" created="Wed, 30 May 2018 14:07:29 +0000"  >&lt;p&gt;This got reverted on b2_10, but it didn&apos;t actually cause &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10898&quot; title=&quot;conf-sanity test 32a and 32d fail with &#8216;rmmod: ERROR: Module zfs is in use&#8217;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10898&quot;&gt;&lt;del&gt;LU-10898&lt;/del&gt;&lt;/a&gt; (AFAIK). ZED holds zfs open if it&apos;s running. Can we re-land this? Should I resubmit?&lt;/p&gt;</comment>
                            <comment id="228843" author="pjones" created="Wed, 30 May 2018 14:23:20 +0000"  >&lt;p&gt;Yes, we want to resubmit it.&lt;/p&gt;</comment>
                            <comment id="228856" author="gerrit" created="Wed, 30 May 2018 17:10:28 +0000"  >&lt;p&gt;Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/32583&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32583&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9551&quot; title=&quot;I/O errors when lustre uses multipath devices&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9551&quot;&gt;&lt;del&gt;LU-9551&lt;/del&gt;&lt;/a&gt; utils: add l_tunedisk to fix disk tunings&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 1743fa638e8fdbe16e6cfd33dd91c24fa5047492&lt;/p&gt;</comment>
                            <comment id="231255" author="gerrit" created="Wed, 1 Aug 2018 17:11:45 +0000"  >&lt;p&gt;John L. Hammond (jhammond@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/32583/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32583/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9551&quot; title=&quot;I/O errors when lustre uses multipath devices&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9551&quot;&gt;&lt;del&gt;LU-9551&lt;/del&gt;&lt;/a&gt; utils: add l_tunedisk to fix disk tunings&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 703d418908fa32f60decc3bd535e77784d2721c6&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="43888">LU-9132</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="46285">LU-9552</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="51741">LU-10898</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="55014">LU-12029</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="55853">LU-12387</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="53803">LU-11563</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="56305">LU-12530</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="54204">LU-11736</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="26791" name="oss01.log" size="890733" author="shenxm" created="Wed, 24 May 2017 06:43:23 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10040" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic</customfieldname>
                        <customfieldvalues>
                                        <label>mount</label>
            <label>server</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10030" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic/Theme</customfieldname>
                        <customfieldvalues>
                                        <label>centos7.3</label>
            <label>lustre-2.7.19.8</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzdin:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>