<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:15:11 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1275] Lustre 2.1.1 REPLAY_SINGLE test_0a FAIL: Restart of mds failed</title>
                <link>https://jira.whamcloud.com/browse/LU-1275</link>
                <project id="10000" key="LU">Lustre</project>
<description>&lt;p&gt;My acc-sm setup has been used to test 1.8.5, 1.8.6, and 1.8.7 successfully.&lt;br/&gt;
This is the first time I have run acc-sm against 2.1.1.&lt;br/&gt;
SANITY and SANITYN passed, but all tests in REPLAY_SINGLE failed with&lt;br/&gt;
&quot;@@@@@@ FAIL: Restart of mds failed&quot;.&lt;/p&gt;



&lt;p&gt;== test 0a: empty replay == 12:05:12&lt;br/&gt;
Filesystem           1K-blocks      Used Available Use% Mounted on&lt;br/&gt;
service360@o2ib:/lustre&lt;br/&gt;
                       3937056    205112   3531816   6% /mnt/nbp0-1&lt;br/&gt;
Failing mds on node service360&lt;br/&gt;
Stopping /mnt/mds (opts:)&lt;br/&gt;
affected facets: mds&lt;br/&gt;
df pid is 13509&lt;br/&gt;
Failover mds to service360&lt;br/&gt;
12:05:26 (1333134326) waiting for service360 network 900 secs ...&lt;br/&gt;
12:05:26 (1333134326) network interface is UP&lt;br/&gt;
Starting mds: -o errors=panic,acl  /dev/sdb1 /mnt/mds&lt;br/&gt;
service360: mount.lustre: mount /dev/sdb1 at /mnt/mds failed: Invalid argument&lt;br/&gt;
service360: This may have multiple causes.&lt;br/&gt;
service360: Are the mount options correct?&lt;br/&gt;
service360: Check the syslog for more info.&lt;br/&gt;
mount -t lustre  /dev/sdb1 /mnt/mds&lt;br/&gt;
Start of /dev/sdb1 on mds failed 22&lt;br/&gt;
 replay-single test_0a: @@@@@@ FAIL: Restart of mds failed!&lt;/p&gt;


&lt;p&gt;The /var/log/message of the MGS/MDS node showed:&lt;br/&gt;
...&lt;br/&gt;
Mar 30 12:05:10 service360 kernel: Lustre: MGC10.151.26.38@o2ib: Reactivating import&lt;br/&gt;
Mar 30 12:05:10 service360 kernel: LustreError: 11254:0:(llog_lvfs.c:473:llog_lvfs_next_block()) Invalid llog tail at log id 17/2375643311 offset 14432&lt;br/&gt;
Mar 30 12:05:10 service360 kernel: LustreError: 11254:0:(mgs_handler.c:783:mgs_handle()) MGS handle cmd=502 rc=-22&lt;br/&gt;
...&lt;br/&gt;
The replay-single.test_0a.debug_log.service360.log.&amp;#91;12&amp;#93; files are attached.&lt;/p&gt;</description>
                <environment>Server runs CentOS 6.2, OFED 1.5.4.1, Lustre 2.1.1.&lt;br/&gt;
Client runs SLES11 SP1, OFED 1.5.4.1, Lustre 1.8.6.&lt;br/&gt;
MGS/MDS uses the same device. Two OSS&amp;#39;es. Two clients.</environment>
        <key id="13832">LU-1275</key>
            <summary>Lustre 2.1.1 REPLAY_SINGLE test_0a FAIL: Restart of mds failed</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="2">Won&apos;t Fix</resolution>
                                        <assignee username="mdiep">Minh Diep</assignee>
                                    <reporter username="jaylan">Jay Lan</reporter>
                        <labels>
                    </labels>
                <created>Fri, 30 Mar 2012 17:55:54 +0000</created>
                <updated>Wed, 5 Mar 2014 19:15:45 +0000</updated>
                            <resolved>Wed, 5 Mar 2014 19:15:45 +0000</resolved>
                                    <version>Lustre 2.1.1</version>
                    <version>Lustre 1.8.6</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="32983" author="pjones" created="Fri, 30 Mar 2012 18:03:45 +0000"  >&lt;p&gt;Minh&lt;/p&gt;

&lt;p&gt;Could you please help with this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="33385" author="jaylan" created="Tue, 3 Apr 2012 16:30:55 +0000"  >&lt;p&gt;Could you please help with this?&lt;br/&gt;
The same test environment worked fine in 1.8.5 and 1.8.6.&lt;br/&gt;
One single test failure in 1.8.7 (see &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1246&quot; title=&quot;SANITY_QUOTA test_32 failed in cleanup_and_setup_lustre with LOAD_MODULES_REMOTE=true&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1246&quot;&gt;&lt;del&gt;LU-1246&lt;/del&gt;&lt;/a&gt;)&lt;br/&gt;
Many more failures between a 2.1.1 server and a 1.8.7 client, including this one.&lt;br/&gt;
Nothing works at all between a 2.1.1 server and a 2.1.1 client.&lt;/p&gt;

&lt;p&gt;I am going to spend time converting to auster for a 2.1.1 server + 2.1.1 client,&lt;br/&gt;
but I really need help evaluating my environment of 2.1.1 server + 1.8.7 client.&lt;/p&gt;</comment>
                            <comment id="33387" author="mdiep" created="Tue, 3 Apr 2012 17:28:29 +0000"  >&lt;p&gt;ok, looking into this&lt;/p&gt;</comment>
                            <comment id="33388" author="mdiep" created="Tue, 3 Apr 2012 17:31:34 +0000"  >&lt;p&gt;can you show me the config file? or local.sh if you modified it&lt;/p&gt;</comment>
                            <comment id="33390" author="jaylan" created="Tue, 3 Apr 2012 18:03:41 +0000"  >&lt;p&gt;The command used in testing was:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;ACC_SM_ONLY=&quot;REPLAY_SINGLE&quot; NAME=ncli_nas.v3 RCLIENTS=&quot;service332&quot; sh acceptance-small.sh&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;The ncli_nas.v3 will be attached.&lt;/p&gt;</comment>
                            <comment id="33391" author="jaylan" created="Tue, 3 Apr 2012 18:08:06 +0000"  >&lt;p&gt;I accidentally also attached nas.v3.sh. It was a wrapper. The end result was to&lt;br/&gt;
run the command I wrote in the previous comment. The configuration file is ncli_nas.v3.&lt;/p&gt;</comment>
                            <comment id="33392" author="mdiep" created="Tue, 3 Apr 2012 18:14:11 +0000"  >&lt;p&gt;thanks. Did you run this on a client that was running 1.8.6?&lt;/p&gt;</comment>
                            <comment id="33393" author="jaylan" created="Tue, 3 Apr 2012 18:30:27 +0000"  >&lt;p&gt;Yes. It was started from service331, a client. All nodes (mds, 2 oss&apos;es and 2 clients) have the same set of configuration.&lt;/p&gt;</comment>
                            <comment id="33394" author="mdiep" created="Tue, 3 Apr 2012 18:43:21 +0000"  >&lt;p&gt;I don&apos;t have a system to try it out now. could you manually run &quot;mount -t lustre -o errors=panic,acl /dev/sdb1 /mnt/mds&quot; on the mds to see if it works&lt;/p&gt;</comment>
                            <comment id="33396" author="jaylan" created="Tue, 3 Apr 2012 20:00:21 +0000"  >&lt;p&gt;I know for a fact that &quot;mount -t lustre -o errors=panic,acl /dev/sdb1 /mnt/mds&quot; works, because the command has been executed many times.&lt;/p&gt;

&lt;p&gt;However, that gave me a thought. In fact, I ran acceptance-small.sh in a for loop:&lt;/p&gt;

&lt;p&gt;for i in SANITY SANITYN REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE REPLAY_DUAL INSANITY SANITY_QUOTA LNET_SELFTEST MMP; do&lt;br/&gt;
    mkdir $TMP/$i&lt;br/&gt;
    umount /mnt/nbp0-1 /mnt/nbp0-2 1&amp;gt; /dev/null 2&amp;gt;&amp;amp;1&lt;br/&gt;
    echo run $i &amp;gt;$TMP/${i}/${i}.output 2&amp;gt;&amp;amp;1&lt;br/&gt;
    case $i in&lt;br/&gt;
      SANITY|SANITYN|REPLAY_SINGLE|CONF_SANITY|RECOVERY_SMALL|REPLAY_OST_SINGLE|REPLAY_DUAL|INSANITY|LNET_SELFTEST|MMP)&lt;br/&gt;
        ACC_SM_ONLY=&quot;$i&quot; NAME=ncli_nas.v3 RCLIENTS=&quot;service332&quot; sh acceptance-small.sh &amp;gt;&amp;gt;$TMP/${i}/${i}.output 2&amp;gt;&amp;amp;1;;&lt;br/&gt;
      SANITY_QUOTA)&lt;br/&gt;
        ACC_SM_ONLY=&quot;$i&quot; RCLIENTS=&quot;service332&quot; MDSSIZE=4000000 OSTSIZE=4000000 NAME=ncli_nas.v3 sh acceptance-small.sh &amp;gt;&amp;gt;$TMP/${i}/${i}.output 2&amp;gt;&amp;amp;1;;&lt;br/&gt;
      *)&lt;br/&gt;
        echo &quot;Test $i not supported.&quot;;;&lt;br/&gt;
    esac&lt;br/&gt;
done&lt;/p&gt;


&lt;p&gt;So, by the time REPLAY_SINGLE was executed, both SANITY and SANITYN had completed. That means it was not the same as starting from ground zero.&lt;/p&gt;

&lt;p&gt;So, I rebooted all the machines. Ran &quot;mount -t lustre&quot; to make sure it worked.&lt;br/&gt;
Unmounted it. Then ran acceptance-small.sh with just REPLAY_SINGLE, without executing SANITY and SANITYN first. Well, it succeeded!&lt;/p&gt;

&lt;p&gt;Now, this wrapper worked when the Lustre server was 1.8.6 (or 1.8.7). Any suggestion to make it work when the server runs 2.1.1?&lt;/p&gt;</comment>
                            <comment id="33398" author="jaylan" created="Tue, 3 Apr 2012 20:57:36 +0000"  >&lt;p&gt;Since REPLAY_SINGLE can be executed successfully in a clean environment, you can close this ticket. I will figure out a way to work around my problem when testing with 2.x servers. Suggestions are welcome &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="33401" author="mdiep" created="Tue, 3 Apr 2012 21:27:57 +0000"  >&lt;p&gt;I need to reproduce this in the lab and investigate the cause. In the meantime, please try this: add MDSDEV1=/dev/sdb1 to the config file to see if it makes any difference. If you don&apos;t mind reformatting the FS before every test, you could put export REFORMAT=true in the config file.&lt;/p&gt;

&lt;p&gt;I also suggest you explore the auster script, which has an option to send results back to our Maloo results DB.&lt;/p&gt;</comment>
                            <comment id="33499" author="jaylan" created="Wed, 4 Apr 2012 20:18:57 +0000"  >&lt;p&gt;I have this line in my configuration file:&lt;br/&gt;
export REFORMAT=&quot;--reformat&quot;&lt;/p&gt;

&lt;p&gt;Would it have the same effect as &quot;export REFORMAT=true&quot;?&lt;/p&gt;</comment>
                            <comment id="33800" author="mdiep" created="Fri, 6 Apr 2012 11:55:27 +0000"  >&lt;p&gt;yes&lt;/p&gt;</comment>
                            <comment id="33833" author="jaylan" created="Fri, 6 Apr 2012 20:12:32 +0000"  >&lt;p&gt;Attached are two files, cut from /var/log/messages on the MDS server between the beginning and end MARKERs of test 0a.&lt;/p&gt;

&lt;p&gt;The *.FAIL file is from the run that failed, and the *.PASS file is from the run that passed.&lt;/p&gt;</comment>
                            <comment id="33835" author="jaylan" created="Fri, 6 Apr 2012 20:17:10 +0000"  >&lt;p&gt;On second thought, I do not feel comfortable declaring this a test issue (i.e., a problem with the test environment setup). It could also have resulted from the MDS behaving differently in different situations, and thus represent a real problem.&lt;/p&gt;

&lt;p&gt;We do not know enough to say either way.&lt;/p&gt;
</comment>
                            <comment id="78423" author="jfc" created="Wed, 5 Mar 2014 01:17:16 +0000"  >&lt;p&gt;Jay &amp;#8211; is this still an issue of concern to you?&lt;br/&gt;
Is there any further action you&apos;d like us to take?&lt;br/&gt;
I&apos;d like to mark this as resolved &amp;#8211; am I OK to go ahead and do that?&lt;br/&gt;
Thanks,&lt;br/&gt;
~ jfc.&lt;/p&gt;</comment>
                            <comment id="78503" author="jaylan" created="Wed, 5 Mar 2014 18:37:09 +0000"  >&lt;p&gt;Yes, please. No longer a problem. Thanks!&lt;/p&gt;</comment>
                            <comment id="78507" author="jfc" created="Wed, 5 Mar 2014 19:14:06 +0000"  >&lt;p&gt;Thank you &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="78508" author="jfc" created="Wed, 5 Mar 2014 19:15:45 +0000"  >&lt;p&gt;Not clear if this was a test issue &amp;#8211; but time has moved on and it is no longer a problem.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="11053" name="nas.v3.sh" size="2662" author="jaylan" created="Tue, 3 Apr 2012 18:03:58 +0000"/>
                            <attachment id="11052" name="ncli_nas.v3.sh" size="2401" author="jaylan" created="Tue, 3 Apr 2012 18:02:12 +0000"/>
                            <attachment id="11096" name="replay-single.s360.0406.FAIL" size="8383" author="jaylan" created="Fri, 6 Apr 2012 20:12:32 +0000"/>
                            <attachment id="11097" name="replay-single.s360.0406.PASS" size="7329" author="jaylan" created="Fri, 6 Apr 2012 20:12:32 +0000"/>
                            <attachment id="11036" name="replay-single.test_0a.debug_log.service360.log.1" size="8284207" author="jaylan" created="Fri, 30 Mar 2012 17:55:54 +0000"/>
                            <attachment id="11037" name="replay-single.test_0a.debug_log.service360.log.2" size="5682488" author="jaylan" created="Fri, 30 Mar 2012 17:55:54 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvf4n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6096</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>