<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:42:18 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11256] replay-vbr test 7f is failing with &apos;Restart of mds1 failed!&apos;</title>
                <link>https://jira.whamcloud.com/browse/LU-11256</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;replay-vbr test_7f fails on mounting an MDS. It&#8217;s not clear when this test started failing with this error, but it looks like this test didn&#8217;t fail on MDS mount for two and a half months and started failing again on July 14, 2018. All failures of this type since March of 2018 are listed below.&lt;/p&gt;

&lt;p&gt;Looking at the failure at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/5b253cd8-878f-11e8-9028-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/5b253cd8-878f-11e8-9028-52540065bddc&lt;/a&gt;, the only sign of trouble in the test_log appears when we try to mount the failover MDS:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Failing mds1 on trevis-4vm8
+ pm -h powerman --off trevis-4vm8
Command completed successfully
reboot facets: mds1
+ pm -h powerman --on trevis-4vm8
Command completed successfully
Failover mds1 to trevis-4vm7
12:30:33 (1531571433) waiting for trevis-4vm7 network 900 secs ...
12:30:33 (1531571433) network interface is UP
CMD: trevis-4vm7 hostname
mount facets: mds1
CMD: trevis-4vm7 test -b /dev/lvm-Role_MDS/P1
CMD: trevis-4vm7 e2label /dev/lvm-Role_MDS/P1
trevis-4vm7: e2label: No such file or directory while trying to open /dev/lvm-Role_MDS/P1
trevis-4vm7: Couldn&apos;t find valid filesystem superblock.
Starting mds1:   -o loop /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
CMD: trevis-4vm7 mkdir -p /mnt/lustre-mds1; mount -t lustre   -o loop 		                   /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
trevis-4vm7: mount: /dev/lvm-Role_MDS/P1: failed to setup loop device: No such file or directory
Start of /dev/lvm-Role_MDS/P1 on mds1 failed 32
 replay-vbr test_7f: @@@@@@ FAIL: Restart of mds1 failed! 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In all the following cases, test 7g hangs when test 7f fails in this way.&lt;/p&gt;

&lt;p&gt;2018-08-15 2.10.5 RC2 - fails in &#8220;test_7f.5 last&#8221;&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/a75d306e-a081-11e8-8ee3-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/a75d306e-a081-11e8-8ee3-52540065bddc&lt;/a&gt;&lt;br/&gt;
2018-08-02 2.10.4.14 - fails in &#8220;test_7f.5 last&#8221;&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/7405ad54-9645-11e8-a9f7-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/7405ad54-9645-11e8-a9f7-52540065bddc&lt;/a&gt;&lt;br/&gt;
2018-07-14 2.10.4.8 - fails in &#8220;test_7f.1 last&#8221;&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/5b253cd8-878f-11e8-9028-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/5b253cd8-878f-11e8-9028-52540065bddc&lt;/a&gt;&lt;br/&gt;
2018-04-12 2.11.50.51 - fails in &#8220;test_7f.4 last&#8221;&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/37bad538-3e69-11e8-b45c-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/37bad538-3e69-11e8-b45c-52540065bddc&lt;/a&gt;&lt;br/&gt;
2018-03-03 2.10.3.35 - fails in &#8220;test_7f.4 last&#8221;&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/f33a4326-1f0f-11e8-a6ca-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/f33a4326-1f0f-11e8-a6ca-52540065bddc&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the following test sessions, replay-vbr test 7e fails in the way described above and test 7f hangs:&lt;br/&gt;
2018-07-15 2.10.4.8 - fails in &#8220;test_7e.5 last&#8221;&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/0ca6ca46-87fc-11e8-b376-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/0ca6ca46-87fc-11e8-b376-52540065bddc&lt;/a&gt;&lt;br/&gt;
2018-03-14 2.10.59 - fails in &#8220;test_7e.5 last&#8221;&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/d26f60b0-2809-11e8-b6a0-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/d26f60b0-2809-11e8-b6a0-52540065bddc&lt;/a&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="52985">LU-11256</key>
            <summary>replay-vbr test 7f is failing with &apos;Restart of mds1 failed!&apos;</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                    </labels>
                <created>Wed, 15 Aug 2018 20:06:40 +0000</created>
                <updated>Fri, 30 Nov 2018 18:49:34 +0000</updated>
                                            <version>Lustre 2.12.0</version>
                    <version>Lustre 2.10.4</version>
                    <version>Lustre 2.10.5</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="232063" author="adilger" created="Thu, 16 Aug 2018 17:50:05 +0000"  >&lt;p&gt;It looks like there is a check in &lt;tt&gt;mount_facet()&lt;/tt&gt; that tests whether the &quot;device&quot; is a block device and, if it is not, adds &quot;&lt;tt&gt;-o loop&lt;/tt&gt;&quot; to the mount options.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
mount_facet() {    
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; [ $(facet_fstype $facet) == ldiskfs ] &amp;amp;&amp;amp;
           ! do_facet $facet test -b ${!dev}; then
                opts=$(csa_add &lt;span class=&quot;code-quote&quot;&gt;&quot;$opts&quot;&lt;/span&gt; -o loop)
        fi

        &lt;span class=&quot;code-keyword&quot;&gt;case&lt;/span&gt; $fstype in
        ldiskfs)
                devicelabel=$(do_facet ${facet} &lt;span class=&quot;code-quote&quot;&gt;&quot;$E2LABEL ${!dev}&quot;&lt;/span&gt;);;
        esac

        echo &lt;span class=&quot;code-quote&quot;&gt;&quot;Starting ${facet}: $opts ${!dev} $mntpt&quot;&lt;/span&gt;
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; [ $RC -ne 0 ]; then
                echo &lt;span class=&quot;code-quote&quot;&gt;&quot;Start of ${!dev} on ${facet} failed ${RC}&quot;&lt;/span&gt;
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; $RC
        fi
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To debug this issue further, it would make sense to add extra debugging to the &quot;&lt;tt&gt;test -b&lt;/tt&gt;&quot; failure case to determine whether the device even exists:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; [ $(facet_fstype $facet) == ldiskfs ] &amp;amp;&amp;amp;
           ! do_facet $facet test -b ${!dev}; then
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; ! do_facet $facet test -e ${!dev}; then
                        do_facet $facet &lt;span class=&quot;code-quote&quot;&gt;&quot;ls -lR $(dirname ${!dev})&quot;&lt;/span&gt;
                        error &lt;span class=&quot;code-quote&quot;&gt;&quot;$facet: device ${!dev} does not exist&quot;&lt;/span&gt;
                fi
                opts=$(csa_add &lt;span class=&quot;code-quote&quot;&gt;&quot;$opts&quot;&lt;/span&gt; -o loop)
        fi
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="232314" author="jamesanunez" created="Mon, 20 Aug 2018 19:02:53 +0000"  >&lt;p&gt;A similar failure occurred in recovery-mds-scale test failover_mds at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/4e5eb398-a24d-11e8-a5f2-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/4e5eb398-a24d-11e8-a5f2-52540065bddc&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="50257">LU-10519</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="50927">LU-10708</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i000t3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>