<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:57:02 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12946] Multipath path flapping issue</title>
                <link>https://jira.whamcloud.com/browse/LU-12946</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The symptom of this issue is high I/O latency on the multipath devices, observed on an external server, with no corresponding latency on the underlying physical disks. The following messages also accompany the issue:&lt;/p&gt;

&lt;p&gt;Oct 28 15:12:23 nvme1 kernel: device-mapper: multipath: Reinstating path 8:160.&lt;br/&gt;
Oct 28 15:12:23 nvme1 multipathd: 8:160: reinstated&lt;br/&gt;
Oct 28 15:12:23 nvme1 multipathd: 360001ff0b05e90000000002e8964000a: remaining active paths: 5&lt;br/&gt;
Oct 28 15:12:25 nvme1 multipathd: 360001ff0b05e90000000002e8964000a: switch to path group #1&lt;br/&gt;
Oct 28 15:12:25 nvme1 kernel: device-mapper: multipath: Failing path 8:160.&lt;br/&gt;
Oct 28 15:12:26 nvme1 multipathd: sdk: mark as failed&lt;br/&gt;
Oct 28 15:12:26 nvme1 multipathd: 360001ff0b05e90000000002e8964000a: remaining active paths: 4&lt;br/&gt;
Oct 28 15:12:27 nvme1 multipathd: 360001ff0b05e90000000002e8964000a: sdi - tur checker reports path is up&lt;br/&gt;
Oct 28 15:12:27 nvme1 multipathd: 8:128: reinstated&lt;br/&gt;
Oct 28 15:12:27 nvme1 multipathd: 360001ff0b05e90000000002e8964000a: remaining active paths: 5&lt;br/&gt;
Oct 28 15:12:27 nvme1 kernel: device-mapper: multipath: Reinstating path 8:128.&lt;br/&gt;
Oct 28 15:12:28 nvme1 multipathd: 360001ff0b05e90000000002e8964000a: sdk - tur checker reports path is up&lt;br/&gt;
Oct 28 15:12:28 nvme1 kernel: device-mapper: multipath: Reinstating path 8:160.&lt;/p&gt;

&lt;p&gt;Specifically, paths are failed by the device-mapper module itself, with no other associated error, as in these messages:&lt;/p&gt;

&lt;p&gt;Oct 28 15:12:25 nvme1 kernel: device-mapper: multipath: Failing path 8:160.&lt;br/&gt;
Oct 28 15:12:26 nvme1 multipathd: sdk: mark as failed&lt;/p&gt;

&lt;p&gt;The tur checker then reinstates them shortly afterwards:&lt;/p&gt;

&lt;p&gt;Oct 28 15:12:28 nvme1 multipathd: 360001ff0b05e90000000002e8964000a: sdk - tur checker reports path is up&lt;br/&gt;
Oct 28 15:12:28 nvme1 kernel: device-mapper: multipath: Reinstating path 8:160.&lt;/p&gt;

&lt;p&gt;For a more definitive diagnosis, run the attached systemtap script; the output should look like the following. The key indicators are the return/error fields set to 3 and the innermost frames of the backtrace showing fail_path called by multipath_end_io:&lt;/p&gt;

&lt;p&gt;blk_insert_cloned_request() return=3&lt;br/&gt;
dm_multipath:multipath_end_io() error=3&lt;br/&gt;
0xffffffffc03a46f0 : fail_path+0x0/0xe0 [dm_multipath]&lt;br/&gt;
0xffffffffc03a646a : multipath_end_io+0x4a/0x1a0 [dm_multipath]&lt;br/&gt;
0xffffffffc0019403 : dm_softirq_done+0xc3/0x2d0 [dm_mod]&lt;br/&gt;
0xffffffffb154fe5f : __blk_mq_complete_request+0xdf/0x1e0 [kernel]&lt;br/&gt;
0xffffffffb154ff87 : blk_mq_complete_request+0x27/0x30 [kernel]&lt;br/&gt;
0xffffffffc0018614 : dm_complete_request+0x34/0x60 [dm_mod]&lt;br/&gt;
0xffffffffc0018d40 : map_request+0x170/0x2a0 [dm_mod]&lt;br/&gt;
0xffffffffc001928e : dm_mq_queue_rq+0x8e/0x140 [dm_mod]&lt;br/&gt;
0xffffffffb1550d18 : blk_mq_dispatch_rq_list+0x268/0x620 [kernel]&lt;br/&gt;
0xffffffffb1555dee : blk_mq_do_dispatch_sched+0x7e/0x130 [kernel]&lt;br/&gt;
0xffffffffb15567ae : blk_mq_sched_dispatch_requests+0x11e/0x1c0 [kernel]&lt;br/&gt;
0xffffffffb154ec10 : __blk_mq_run_hw_queue+0x50/0xc0 [kernel]&lt;br/&gt;
0xffffffffb154ed38 : __blk_mq_delay_run_hw_queue+0xb8/0x110 [kernel]&lt;br/&gt;
0xffffffffb154ee57 : blk_mq_run_hw_queue+0x57/0x110 [kernel]&lt;br/&gt;
0xffffffffb1556c6e : blk_mq_sched_insert_requests+0x6e/0xc0 [kernel]&lt;br/&gt;
0xffffffffb1551d1c : blk_mq_flush_plug_list+0x19c/0x200 [kernel]&lt;br/&gt;
0xffffffffb154703e : blk_flush_plug_list+0xce/0x230 [kernel]&lt;br/&gt;
0xffffffffb15475b4 : blk_finish_plug+0x14/0x40 [kernel]&lt;br/&gt;
0xffffffffc0a15f3b [mmfslinux]&lt;br/&gt;
0xffffffffc0e5b174 [mmfs26]&lt;/p&gt;

&lt;p&gt;The cause is a broken Red Hat backport of this upstream patch (&lt;a href=&quot;https://github.com/torvalds/linux/commit/86ff7c2a80cd357f6156a53b354f6a0b357dc0c9&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/torvalds/linux/commit/86ff7c2a80cd357f6156a53b354f6a0b357dc0c9&lt;/a&gt;). Here&apos;s what I believe to be the fix, which prevents the BLK_MQ_RQ_QUEUE_DEV_BUSY state from causing the request to be completed prematurely in an error state:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;--- kernel-3.10.0-957.21.3.el7/linux-3.10.0-957.21.3.el7.x86_64/drivers/md/dm-rq.c 2019-06-14 06:29:35.000000000 +0000
+++ kernel-3.10.0-957.21.3.el7.patched/linux-3.10.0-957.21.3.el7.x86_64/drivers/md/dm-rq.c 2019-10-28 00:16:55.949220284 +0000
@@ -477,7 +477,7 @@

 clone-&amp;gt;start_time = jiffies;
 r = blk_insert_cloned_request(clone-&amp;gt;q, clone);
- if (r != BLK_MQ_RQ_QUEUE_OK &amp;amp;&amp;amp; r != BLK_MQ_RQ_QUEUE_BUSY)
+ if (r != BLK_MQ_RQ_QUEUE_OK &amp;amp;&amp;amp; r != BLK_MQ_RQ_QUEUE_BUSY &amp;amp;&amp;amp; r != BLK_MQ_RQ_QUEUE_DEV_BUSY )
 /* must complete clone in terms of original request */
 dm_complete_request(rq, r);
 return r;
@@ -661,7 +661,7 @@
 trace_block_rq_remap(clone-&amp;gt;q, clone, disk_devt(dm_disk(md)),
 blk_rq_pos(rq));
 ret = dm_dispatch_clone_request(clone, rq);
- if (ret == BLK_MQ_RQ_QUEUE_BUSY) {
+ if (ret == BLK_MQ_RQ_QUEUE_BUSY || ret == BLK_MQ_RQ_QUEUE_DEV_BUSY) {
 blk_rq_unprep_clone(clone);
 tio-&amp;gt;ti-&amp;gt;type-&amp;gt;release_clone_rq(clone);
 tio-&amp;gt;clone = NULL;

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="57328">LU-12946</key>
            <summary>Multipath path flapping issue</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="wshilong">Wang Shilong</assignee>
                                    <reporter username="wshilong">Wang Shilong</reporter>
                        <labels>
                    </labels>
                <created>Thu, 7 Nov 2019 02:31:58 +0000</created>
                <updated>Wed, 17 Feb 2021 22:14:19 +0000</updated>
                            <resolved>Fri, 22 Nov 2019 20:01:51 +0000</resolved>
                                                    <fixVersion>Lustre 2.13.0</fixVersion>
                    <fixVersion>Lustre 2.14.0</fixVersion>
                    <fixVersion>Lustre 2.12.4</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="257884" author="gerrit" created="Thu, 7 Nov 2019 02:32:51 +0000"  >&lt;p&gt;Wang Shilong (wshilong@ddn.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/36699&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/36699&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12946&quot; title=&quot;Multipath path flapping issue&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12946&quot;&gt;&lt;del&gt;LU-12946&lt;/del&gt;&lt;/a&gt; kernel: fix to handle BLK_MQ_RQ_QUEUE_DEV_BUSY event&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 709749d9af355a67dde4ef7d6ca12a09e873ae86&lt;/p&gt;</comment>
                            <comment id="258570" author="ruth.klundt@gmail.com" created="Wed, 20 Nov 2019 22:44:36 +0000"  >&lt;p&gt;We see this with RHEL 7.7 and Lustre 2.10+. We confirmed that the patch fixes the issue.&lt;/p&gt;

&lt;p&gt;Is there a RHEL Bugzilla entry associated with this? Thanks.&lt;/p&gt;</comment>
                            <comment id="258651" author="chunteraa" created="Thu, 21 Nov 2019 18:53:33 +0000"  >&lt;p&gt;There is a private BZ:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The patch has been identified and our Quality team is currently testing this patch. This fix will go in Rhel 7.8&lt;/p&gt;&lt;/blockquote&gt;</comment>
                            <comment id="258710" author="gerrit" created="Fri, 22 Nov 2019 19:59:51 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/36699/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/36699/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12946&quot; title=&quot;Multipath path flapping issue&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12946&quot;&gt;&lt;del&gt;LU-12946&lt;/del&gt;&lt;/a&gt; kernel: fix to handle BLK_MQ_RQ_QUEUE_DEV_BUSY event&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 5c8b1e87a97bbe7b05f0b8325e98c16a0de1ff4c&lt;/p&gt;</comment>
                            <comment id="258712" author="ofaaland" created="Fri, 22 Nov 2019 20:12:34 +0000"  >&lt;p&gt;Should this be backported to b2_12?&lt;/p&gt;</comment>
                            <comment id="258716" author="pjones" created="Fri, 22 Nov 2019 22:08:11 +0000"  >&lt;p&gt;Yes :)&lt;/p&gt;</comment>
                            <comment id="258831" author="gerrit" created="Tue, 26 Nov 2019 15:15:50 +0000"  >&lt;p&gt;Minh Diep (mdiep@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/36868&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/36868&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12946&quot; title=&quot;Multipath path flapping issue&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12946&quot;&gt;&lt;del&gt;LU-12946&lt;/del&gt;&lt;/a&gt; kernel: fix to handle BLK_MQ_RQ_QUEUE_DEV_BUSY event&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: b0d536927e7c882dae1d59d7b01c8b7242a243aa&lt;/p&gt;</comment>
                            <comment id="259773" author="gerrit" created="Thu, 12 Dec 2019 23:06:27 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/36868/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/36868/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12946&quot; title=&quot;Multipath path flapping issue&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12946&quot;&gt;&lt;del&gt;LU-12946&lt;/del&gt;&lt;/a&gt; kernel: fix to handle BLK_MQ_RQ_QUEUE_DEV_BUSY event&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 849e1a5cbcd7025a19611277b14c5605c0dffefa&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00p3z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>