<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:46:22 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
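For example (assuming JIRA's standard XML issue-view URL scheme; the exact path below is illustrative for this issue):

    https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-11722/LU-11722.xml?field=key&field=summary

A request like this would return only the <key> and <summary> elements for LU-11722.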
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11722] replay-single test 11 fails with &#8220;Restart of mds1 failed!&#8221;</title>
                <link>https://jira.whamcloud.com/browse/LU-11722</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;replay-single test_11 fails with the error message &#8220;Restart of mds1 failed!&#8221;. Looking at the client test_log at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/79ecfc38-f0e8-11e8-bfe1-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/79ecfc38-f0e8-11e8-bfe1-52540065bddc&lt;/a&gt;, we see a problem with the failover/mount of mds1:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Failover mds1 to trevis-39vm8
18:18:54 (1543169934) waiting for trevis-39vm8 network 900 secs ...
18:18:54 (1543169934) network interface is UP
CMD: trevis-39vm8 hostname
mount facets: mds1
CMD: trevis-39vm8 test -b /dev/lvm-Role_MDS/P1
CMD: trevis-39vm8 e2label /dev/lvm-Role_MDS/P1
trevis-39vm8: e2label: No such file or directory while trying to open /dev/lvm-Role_MDS/P1
trevis-39vm8: Couldn&apos;t find valid filesystem superblock.
Starting mds1:   -o loop /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
CMD: trevis-39vm8 mkdir -p /mnt/lustre-mds1; mount -t lustre   -o loop 		                   /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
trevis-39vm8: mount: /dev/lvm-Role_MDS/P1: failed to setup loop device: No such file or directory
Start of /dev/lvm-Role_MDS/P1 on mds1 failed 32
 replay-single test_11: @@@@@@ FAIL: Restart of mds1 failed! 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the console log for the MDS (vm7), we see the node failing:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[  117.663213] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-single test 11: create open write rename \|X\| create-old-name read ========================== 18:18:38 \(1543169918\)
[  117.849978] Lustre: DEBUG MARKER: == replay-single test 11: create open write rename |X| create-old-name read ========================== 18:18:38 (1543169918)
[  118.031444] Lustre: DEBUG MARKER: sync; sync; sync
[  119.317310] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
[  119.653586] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
[  119.911790] LustreError: 6330:0:(osd_handler.c:2198:osd_ro()) *** setting lustre-MDT0000 read-only ***
[  120.034914] Turning device dm-0 (0xfc00000) read-only
[  120.202030] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
[  120.367027] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
[  120.608673] Lustre: DEBUG MARKER: /usr/sbin/lctl dl

&amp;lt;ConMan&amp;gt; Console [trevis-39vm7] disconnected from &amp;lt;trevis-39:6006&amp;gt; at 11-25 18:18.

&amp;lt;ConMan&amp;gt; Console [trevis-39vm7] connected to &amp;lt;trevis-39:6006&amp;gt; at 11-25 18:19.
......... ok
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the console log for the failover MDS (vm8), we see the start of replay-single test 10 and then a call trace:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[  117.876287] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-single test 10: create \|X\| rename unlink =================================================== 18:17:21 \(1543169841\)
[  118.074071] Lustre: DEBUG MARKER: == replay-single test 10: create |X| rename unlink =================================================== 18:17:21 (1543169841)
[  118.243565] Lustre: DEBUG MARKER: sync; sync; sync
[  119.576474] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
[  119.905069] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
[  120.175396] LustreError: 5633:0:(osd_handler.c:2198:osd_ro()) *** setting lustre-MDT0000 read-only ***
[  120.348338] Turning device dm-0 (0xfc00000) read-only
[  120.510660] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
[  120.672831] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
[  120.914538] Lustre: DEBUG MARKER: /usr/sbin/lctl dl

&amp;lt;ConMan&amp;gt; Console [trevis-39vm8] disconnected from &amp;lt;trevis-39:6007&amp;gt; at 11-25 18:17.

&amp;lt;ConMan&amp;gt; Console [trevis-39vm8] connected to &amp;lt;trevis-39:6007&amp;gt; at 11-25 18:17.
......... ok
&#8230;
trevis-39vm8 login: [  130.156081] random: crng init done
[  133.845539]  session1: session recovery timed out after 120 secs
[  133.846770] scsi 2:0:0:0: rejecting I/O to offline device
[  133.847790] scsi 2:0:0:0: rejecting I/O to offline device
[  133.849029] scsi 2:0:0:0: rejecting I/O to offline device
[  133.870016] FS-Cache: Loaded
[  133.905941] FS-Cache: Netfs &apos;nfs&apos; registered for caching
[  133.917634] Key type dns_resolver registered
[  133.946337] NFS: Registering the id_resolver key type
[  133.947290] Key type id_resolver registered
[  133.948061] Key type id_legacy registered
[ 3805.931636] SysRq : Changing Loglevel
[ 3805.932529] Loglevel set to 8
[ 3806.457206] SysRq : Show State
[ 3806.457932]   task                        PC stack   pid father
[ 3806.459017] systemd         S ffff98503c140000     0     1      0 0x00000000
[ 3806.460287] Call Trace:
[ 3806.460809]  [&amp;lt;ffffffffa4167bc9&amp;gt;] schedule+0x29/0x70
[ 3806.461812]  [&amp;lt;ffffffffa4166dfd&amp;gt;] schedule_hrtimeout_range_clock+0x12d/0x150
[ 3806.463126]  [&amp;lt;ffffffffa3c8e869&amp;gt;] ? ep_scan_ready_list.isra.7+0x1b9/0x1f0
[ 3806.464436]  [&amp;lt;ffffffffa4166e33&amp;gt;] schedule_hrtimeout_range+0x13/0x20
[ 3806.465578]  [&amp;lt;ffffffffa3c8eafe&amp;gt;] ep_poll+0x23e/0x360
[ 3806.466496]  [&amp;lt;ffffffffa3c531f1&amp;gt;] ? do_unlinkat+0xf1/0x2d0
[ 3806.467542]  [&amp;lt;ffffffffa3ad67b0&amp;gt;] ? wake_up_state+0x20/0x20
[ 3806.468515]  [&amp;lt;ffffffffa3c8ffcd&amp;gt;] SyS_epoll_wait+0xed/0x120
[ 3806.469550]  [&amp;lt;ffffffffa4174d15&amp;gt;] ? system_call_after_swapgs+0xa2/0x146
[ 3806.470680]  [&amp;lt;ffffffffa4174ddb&amp;gt;] system_call_fastpath+0x22/0x27
[ 3806.471796]  [&amp;lt;ffffffffa4174d21&amp;gt;] ? system_call_after_swapgs+0xae/0x146
&#8230;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Neither of these console logs shows the start of replay-single test 12.&lt;/p&gt;

&lt;p&gt;When replay-single test 11 fails in this way, we see test 12 hang.&lt;/p&gt;

&lt;p&gt;I&#8217;ve looked back at results for all branches since April and found only one failure that looks the same as this one. Logs are at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/efc9036e-a90a-11e8-80f7-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/efc9036e-a90a-11e8-80f7-52540065bddc&lt;/a&gt;.&lt;/p&gt;
                <environment></environment>
        <key id="54165">LU-11722</key>
            <summary>replay-single test 11 fails with &#8220;Restart of mds1 failed!&#8221;</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                    </labels>
                <created>Fri, 30 Nov 2018 18:21:50 +0000</created>
                <updated>Fri, 30 Nov 2018 18:49:34 +0000</updated>
                                            <version>Lustre 2.12.0</version>
                    <version>Lustre 2.10.6</version>
                    <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00787:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>