<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:42:11 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11241] Unable to umount during MDT fail back</title>
                <link>https://jira.whamcloud.com/browse/LU-11241</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Environment: 2.10.5-RC1, b2_10-ib build #70&lt;/p&gt;

&lt;p&gt;During soak testing, I found that the failover MDT (MDT3) could not be unmounted on MDS2, which prevented the MDT from failing back to its original MDS (MDS3), so the whole recovery process was stuck.&lt;/p&gt;

&lt;p&gt;MDS2 console:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 0-0: Forced cleanup waiting for soaked-MDT0000-osp-MDT0003 namespace with 3 resources in use, (rc=-110)
 Lustre: 3899:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1534181819/real 1534181819]  req@ffffa06fd3332a00 x1608657214545248/t0(0) o38-&amp;gt;soaked-MDT0003-osp-MDT0002@192.168.1.111@o2ib:24/4 lens 520/544 e 0 to 1 dl 1534181830 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[58161.752944] Lustre: 3899:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
LustreError: 0-0: Forced cleanup waiting for soaked-MDT0000-osp-MDT0003 namespace with 3 resources in use, (rc=-110)
LustreError: 0-0: Forced cleanup waiting for soaked-MDT0000-osp-MDT0003 namespace with 3 resources in use, (rc=-110)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This happened after 30+ hours of running (36 successful recoveries); only MDS failover was enabled during the test.&lt;/p&gt;

&lt;p&gt;On MDS0, I could not see the stack trace in the console log, but found the following. Is this a known issue?&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[52069.665828] LNet: 3948:0:(linux-debug.c:185:libcfs_call_trace()) can&apos;t show stack: kernel doesn&apos;t export show_task
[52069.681120] LNet: 3948:0:(linux-debug.c:185:libcfs_call_trace()) Skipped 3 previous similar messages
[52069.694717] LustreError: dumping log to /tmp/lustre-log.1534184159.5310
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Also, many multipath errors are seen on all MDS nodes during failover. When the filesystem was first mounted, before the soak started, this kind of error caused the MDS2 mount to hang; after a reboot it seemed to be gone. I am not sure whether this is a related hardware problem.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[64147.395418] device-mapper: multipath: Reinstating path 8:128.
[64147.403572] device-mapper: multipath: Failing path 8:128.
[64149.405305] device-mapper: multipath: Reinstating path 8:96.
[64149.413410] device-mapper: multipath: Failing path 8:96.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I will run the soak again with both MDS and OSS failover/restart enabled, to see whether this problem can be reproduced.&lt;/p&gt;
</description>
                <environment></environment>
        <key id="52957">LU-11241</key>
            <summary>Unable to umount during MDT fail back</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="sarah">Sarah Liu</reporter>
                        <labels>
                            <label>soak</label>
                    </labels>
                <created>Mon, 13 Aug 2018 20:13:25 +0000</created>
                <updated>Tue, 14 Aug 2018 18:19:14 +0000</updated>
                                            <version>Lustre 2.10.5</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="231894" author="sarah" created="Mon, 13 Aug 2018 22:00:54 +0000"  >&lt;p&gt;After rebooting all servers and mounting the filesystem, the problematic MDS (soak-10) hung at mount for 5 minutes before it seemed to proceed. Normally this takes seconds.&lt;/p&gt;

&lt;p&gt;soak-10:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[   41.369390] NFS: Registering the id_resolver key type
[   41.375705] Key type id_resolver registered
[   41.381041] Key type id_legacy registered
[   61.439610] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[ 7664.773217] LNet: HW NUMA nodes: 2, HW CPU cores: 32, npartitions: 2
[ 7664.784088] alg: No test for adler32 (adler32-zlib)
[ 7665.685582] Lustre: Lustre: Build Version: 2.10.5_RC1_1_g574e63f
[ 7665.990084] LNet: Added LNI 192.168.1.110@o2ib [8/256/0/180]
[ 8661.888556] sd 0:0:0:44: [sdh] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 8661.897398] sd 0:0:0:44: [sdh] Sense Key : Illegal Request [current] 
[ 8661.904672] sd 0:0:0:44: [sdh] &amp;lt;&amp;lt;vendor&amp;gt;&amp;gt;ASC=0x94 ASCQ=0x1 
[ 8661.910955] sd 0:0:0:44: [sdh] CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00
[ 8661.920451] blk_update_request: I/O error, dev sdh, sector 0
[ 8661.928771] device-mapper: multipath: Failing path 8:112.
[ 8666.770534] device-mapper: multipath: Reinstating path 8:112.
[ 8666.777310] device-mapper: multipath: Failing path 8:112.
[ 8671.777942] device-mapper: multipath: Reinstating path 8:112.
[ 8671.784653] device-mapper: multipath: Failing path 8:112.
[ 8676.785696] device-mapper: multipath: Reinstating path 8:112.
[ 8676.792416] device-mapper: multipath: Failing path 8:112.
[ 8681.794339] device-mapper: multipath: Reinstating path 8:112.
[ 8681.801053] device-mapper: multipath: Failing path 8:112.
[ 8686.802318] device-mapper: multipath: Reinstating path 8:112.
[ 8686.809107] device-mapper: multipath: Failing path 8:112.
[ 8691.809802] device-mapper: multipath: Reinstating path 8:112.
[ 8691.816548] device-mapper: multipath: Failing path 8:112.
[ 8696.817530] device-mapper: multipath: Reinstating path 8:112.
[ 8696.824281] device-mapper: multipath: Failing path 8:112.
[ 8701.826151] device-mapper: multipath: Reinstating path 8:112.
[ 8701.832916] device-mapper: multipath: Failing path 8:112.
[ 8706.834134] device-mapper: multipath: Reinstating path 8:112.
...
[ 9039.375623] device-mapper: multipath: Reinstating path 8:112.
[ 9039.383620] device-mapper: multipath: Failing path 8:112.
[ 9044.386271] device-mapper: multipath: Reinstating path 8:112.
[ 9044.394213] device-mapper: multipath: Failing path 8:112.
[ 9049.394863] device-mapper: multipath: Reinstating path 8:112.
[ 9049.402805] sd 0:0:0:44: rdac: array soak-netapp5660-1, ctlr 0, queueing MODE_SELECT command
[ 9050.032813] sd 0:0:0:44: rdac: array soak-netapp5660-1, ctlr 0, MODE_SELECT completed
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@soak-10 ~]# multipath -ll
Aug 13 21:51:49 | ignoring extra data starting with &apos;udev_dir&apos; on line 2 of /etc/multipath.conf
Aug 13 21:51:49 | /etc/multipath.conf line 6, invalid keyword: getuid_callout
Aug 13 21:51:49 | /etc/multipath.conf line 29, invalid keyword: getuid_callout
ST500NM0011_Z1M0X1GG dm-5 ATA     ,ST500NM0011     
size=466G features=&apos;0&apos; hwhandler=&apos;0&apos; wp=rw
`-+- policy=&apos;round-robin 0&apos; prio=1 status=active
  `- 7:0:1:0  sdb 8:16  active ready running
360080e50001ff0d00000017c52012921 dm-4 LSI     ,INF-01-00       
size=15T features=&apos;2 queue_if_no_path retain_attached_hw_handler&apos; hwhandler=&apos;1 rdac&apos; wp=rw
`-+- policy=&apos;round-robin 0&apos; prio=6 status=active
  `- 0:0:0:43 sdg 8:96  active ready running
360080e50001fedb80000015752012949 dm-1 LSI     ,INF-01-00       
size=15T features=&apos;2 queue_if_no_path retain_attached_hw_handler&apos; hwhandler=&apos;1 rdac&apos; wp=rw
`-+- policy=&apos;round-robin 0&apos; prio=1 status=active
  `- 0:0:0:44 sdh 8:112 failed ghost running
360080e50001ff0d00000018052012939 dm-6 LSI     ,INF-01-00       
size=15T features=&apos;2 queue_if_no_path retain_attached_hw_handler&apos; hwhandler=&apos;1 rdac&apos; wp=rw
`-+- policy=&apos;round-robin 0&apos; prio=6 status=active
  `- 0:0:0:45 sdi 8:128 active ready running
360080e50001fedb80000015952012962 dm-2 LSI     ,INF-01-00       
size=15T features=&apos;2 queue_if_no_path retain_attached_hw_handler&apos; hwhandler=&apos;1 rdac&apos; wp=rw
`-+- policy=&apos;round-robin 0&apos; prio=1 status=active
  `- 0:0:0:46 sdd 8:48  active ghost running
360080e50001fedb80000015552012932 dm-0 LSI     ,INF-01-00       
size=15T features=&apos;2 queue_if_no_path retain_attached_hw_handler&apos; hwhandler=&apos;1 rdac&apos; wp=rw
`-+- policy=&apos;round-robin 0&apos; prio=1 status=active
  `- 0:0:0:42 sdf 8:80  active ghost running
360080e50001ff0d00000017852012908 dm-3 LSI     ,INF-01-00       
size=15T features=&apos;2 queue_if_no_path retain_attached_hw_handler&apos; hwhandler=&apos;1 rdac&apos; wp=rw
`-+- policy=&apos;round-robin 0&apos; prio=6 status=active
  `- 0:0:0:41 sde 8:64  active ready running
&lt;/pre&gt;
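&lt;p&gt;A side note on the &amp;quot;invalid keyword&amp;quot; warnings above: &lt;tt&gt;udev_dir&lt;/tt&gt; and &lt;tt&gt;getuid_callout&lt;/tt&gt; were dropped from multipath-tools in the version shipped with RHEL7 (&lt;tt&gt;getuid_callout&lt;/tt&gt; was replaced by &lt;tt&gt;uid_attribute&lt;/tt&gt;), so these warnings usually mean /etc/multipath.conf was carried over from an older release. A hedged sketch of the cleanup follows; the option values shown are illustrative, not taken from this node&apos;s actual configuration:&lt;/p&gt;
&lt;pre&gt;# /etc/multipath.conf -- RHEL7-era multipath-tools (&amp;gt;= 0.4.9)
defaults {
        # udev_dir was removed entirely: delete any &quot;udev_dir&quot; line
        # getuid_callout was replaced by uid_attribute:
        uid_attribute        &quot;ID_SERIAL&quot;
        user_friendly_names  yes
}
&lt;/pre&gt;
&lt;p&gt;The warnings themselves are harmless (the keywords are ignored), but removing them keeps real configuration errors from being lost in the noise.&lt;/p&gt;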
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="231932" author="adilger" created="Tue, 14 Aug 2018 18:19:14 +0000"  >&lt;blockquote&gt;
&lt;p&gt;On MDS0, I could not see the stack trace in the console log, but found the following. Is this a known issue?&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[52069.665828] LNet: 3948:0:(linux-debug.c:185:libcfs_call_trace()) can&apos;t show stack: kernel doesn&apos;t export show_task
[52069.681120] LNet: 3948:0:(linux-debug.c:185:libcfs_call_trace()) Skipped 3 previous similar messages
[52069.694717] LustreError: dumping log to /tmp/lustre-log.1534184159.5310
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;&lt;/blockquote&gt;

&lt;p&gt;There was a change in RHEL7.5 that broke the ability of Lustre to dump a stack.  This is fixed via &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11062&quot; title=&quot;Backtrace stack printing is broken in RHEL 7.5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11062&quot;&gt;&lt;del&gt;LU-11062&lt;/del&gt;&lt;/a&gt;, which should be landing on b2_10 very soon.  It would be worthwhile to restart the soak testing once that patch has landed, so that we get more information in case of a hang.  Otherwise, we don&apos;t know why the MDS thread was stuck, which was preventing the unmount.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="52424">LU-11062</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i000mv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>