<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:25:39 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9376] Recovery bug exposed during sanity 103b test</title>
                <link>https://jira.whamcloud.com/browse/LU-9376</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;After the recent merges, I began to see failures during testing in my sanity run for test 103a. With that test I see the following error on the MDS node:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ 2272.188019] LDISKFS-fs (dm-0): Mount option &quot;noacl&quot; will be removed by 3.5
Contact linux-ldiskfs@vger.kernel.org if you think we should keep it.

[ 2272.216119] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: noacl,user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[ 2272.627509] Lustre: *** cfs_fail_loc=15b, val=0***
[ 2272.634026] Lustre: Skipped 123 previous similar messages
[ 2272.641089] LustreError: 18834:0:(llog_cat.c:396:llog_cat_current_log()) lustre-OST0000-osc-MDT0000: next log does not exist!
[ 2272.654143] LustreError: 18834:0:(llog_cat.c:396:llog_cat_current_log()) Skipped 62 previous similar messages
[ 2272.665796] LustreError: 18834:0:(osp_sync.c:1439:osp_sync_init()) lustre-OST0000-osc-MDT0000: can&apos;t initialize llog: rc = -5
[ 2272.679028] LustreError: 18834:0:(obd_config.c:574:class_setup()) setup lustre-OST0000-osc-MDT0000 failed (-5)
[ 2272.690791] LustreError: 18834:0:(obd_config.c:1709:class_config_llog_handler()) MGC10.37.248.196@o2ib1: cfg command failed: rc = -5
[ 2272.706189] Lustre: cmd=cf003 0:lustre-OST0000-osc-MDT0000 1:lustre-OST0000_UUID 2:10.37.248.198@o2ib1

[ 2272.721032] LustreError: 18834:0:(llog.c:616:llog_process_thread()) Local llog found corrupted
[ 2272.744649] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180
[ 2273.451269] Lustre: DEBUG MARKER: ninja34: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 6
[ 2274.271991] Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 1 client reconnects
[ 2274.303786] Lustre: lustre-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
[ 2274.894275] Lustre: Failing over lustre-MDT0000
[ 2275.277776] Lustre: server umount lustre-MDT0000 complete
[ 2277.131077] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[ 2277.430391] Lustre: *** cfs_fail_loc=15b, val=0***
[ 2277.436972] Lustre: Skipped 370 previous similar messages
[ 2277.503475] LustreError: 20275:0:(genops.c:334:class_newdev()) Device lustre-OST0000-osc-MDT0000 already exists at 7, won&apos;t add
[ 2277.518507] LustreError: 20275:0:(obd_config.c:366:class_attach()) Cannot create device lustre-OST0000-osc-MDT0000 of type osp : -17
[ 2277.533986] LustreError: 20275:0:(obd_config.c:1709:class_config_llog_handler()) MGC10.37.248.196@o2ib1: cfg command failed: rc = -17
[ 2277.549656] Lustre: cmd=cf001 0:lustre-OST0000-osc-MDT0000 1:osp 2:lustre-MDT0000-mdtlov_UUID

[ 2277.564109] LustreError: 15c-8: MGC10.37.248.196@o2ib1: The configuration from log &apos;lustre-MDT0000&apos; failed (-17). This may be the result of.
[ 2277.564113] LustreError: 20223:0:(obd_mount_server.c:1351:server_start_targets()) failed to start server lustre-MDT0000: -17
[ 2277.564207] LustreError: 20223:0:(obd_mount_server.c:1840:server_fill_super()) Unable to start targets: -17
[ 2277.564233] Lustre: Failing over lustre-MDT0000
[ 2277.895953] Lustre: server umount lustre-MDT0000 complete
[ 2277.903356] LustreError: 20223:0:(obd_mount.c:1502:lustre_fill_super()) Unable to mount (-17)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
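<!--
Reading the log above: the first setup of lustre-OST0000-osc-MDT0000 fails with
-5 (EIO) because its llog cannot be initialized, and after the failover the
remount then fails with -17 (EEXIST) because the osp device left over from the
failed setup is still attached ("Device lustre-OST0000-osc-MDT0000 already
exists at 7, won't add"). A minimal way to look for such a stale device from a
shell on the MDS (a sketch, assuming lctl is available and the Lustre modules
are still loaded):

    # list all attached obd devices and grep for the leftover osp device
    lctl dl | grep lustre-OST0000-osc-MDT0000
-->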
                <environment>The error occurs on the MDS server, running RHEL 7.3 with ldiskfs, while running sanity test 103b</environment>
        <key id="45624">LU-9376</key>
            <summary>Recovery bug exposed during sanity 103b test</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="emoly.liu">Emoly Liu</assignee>
                                    <reporter username="simmonsja">James A Simmons</reporter>
                        <labels>
                    </labels>
                <created>Thu, 20 Apr 2017 21:42:01 +0000</created>
                <updated>Sat, 26 Aug 2017 13:06:20 +0000</updated>
                            <resolved>Sun, 13 Aug 2017 22:42:26 +0000</resolved>
                                    <version>Lustre 2.10.0</version>
                                    <fixVersion>Lustre 2.11.0</fixVersion>
                                        <due></due>
                            <votes>1</votes>
                                    <watches>7</watches>
                <comments>
                            <comment id="192934" author="simmonsja" created="Thu, 20 Apr 2017 21:45:23 +0000"  >&lt;p&gt;I have attached a lctl dump from the MDS to this ticket from when the bug occurred.&lt;/p&gt;</comment>
                            <comment id="193059" author="adilger" created="Fri, 21 Apr 2017 17:27:39 +0000"  >&lt;p&gt;I see in your log snippet:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ 2277.430391] Lustre: *** cfs_fail_loc=15b, val=0***
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;but this shouldn&apos;t be happening during sanity test_105a.  That is left over from sanity test_60e, which is very strange.  Are you sure these are the right logs from test_103a?  Could you please also include the stdout from the test?&lt;/p&gt;</comment>
                            <comment id="193264" author="simmonsja" created="Mon, 24 Apr 2017 18:57:37 +0000"  >&lt;p&gt;The log I posted is from running sanity.sh from start to finish. I have discovered that the running 103b stand alone will pass. So some left over state is causing this strange failure. The stdout from the MDS for the total sanity &#160;run is posted in the description. This is why you see sanity 63 left overs.&lt;/p&gt;</comment>
                            <comment id="193407" author="pjones" created="Tue, 25 Apr 2017 17:22:49 +0000"  >&lt;p&gt;Emoly&lt;/p&gt;

&lt;p&gt;Could you please look into this issue?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="193725" author="emoly.liu" created="Thu, 27 Apr 2017 03:51:33 +0000"  >&lt;p&gt;James, &lt;br/&gt;
Can you show me your command to run this test? because we have already used reset_fail_loc() to set fail_loc=0 in the end of each test case on all client nodes and active server nodes.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;reset_fail_loc () {
        echo -n &quot;Resetting fail_loc on all nodes...&quot;
        do_nodes $(comma_list $(nodes_list)) &quot;lctl set_param -n fail_loc=0 \
            fail_val=0 2&amp;gt;/dev/null&quot; || true
        echo done.
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
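<!--
For reference, reset_fail_loc() above fans the reset out over every node in
comma_list $(nodes_list). A minimal sketch of verifying that the reset really
reached every node, reusing the same framework helpers (assumes the
test-framework environment has been sourced and initialized):

    # print the current fail_loc on every node; all values should be 0
    do_nodes $(comma_list $(nodes_list)) "lctl get_param -n fail_loc"
-->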
                            <comment id="194235" author="simmonsja" created="Wed, 3 May 2017 05:49:28 +0000"  >&lt;p&gt;I just do a llmount.sh and then run sanity.sh to reproduce this problem. The setup is a single client and the back end ldiskfs with one MGS, one MDS and one OSS server. Only the OSS server has 2 disk. I did git bisect it it and found the problem has existed for a long time. Only reason I didn&apos;t see it before was that I recently changed my test bed configuration, split MGS and MDS into two different servers. I collecting and looking at debug logs. Will have more info shortly.&lt;/p&gt;</comment>
                            <comment id="195715" author="adilger" created="Fri, 12 May 2017 18:03:09 +0000"  >&lt;p&gt;On a related note, &lt;tt&gt;Mount option &quot;noacl&quot; will be removed by 3.5&lt;/tt&gt;, and we are already at kernel 4.12 so this &quot;noacl&quot; test_103b should probably just be removed completely, and the mention of the &lt;tt&gt;noacl&lt;/tt&gt; option removed from &lt;tt&gt;mount.lustre.8&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;James, one thought about why you are seeing this spurious &lt;tt&gt;cfs_fail_loc=15b&lt;/tt&gt;: is your MGS on a separate node, or just on a separate device from the MDS?  I&apos;m wondering if for some reason one of the nodes is not being caught by the &lt;tt&gt;reset_fail_loc&lt;/tt&gt; call to &lt;tt&gt;nodes=&quot;$nodes $(facets_nodes $(get_facets))&quot;&lt;/tt&gt;.&lt;/p&gt;</comment>
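<!--
A quick way to test the hypothesis above, i.e. whether the separate MGS node is
missing from the set of nodes that reset_fail_loc() touches (a sketch reusing
the framework helpers named in the comment, assuming the test-framework
environment is initialized):

    # compare the node list used by reset_fail_loc with the full facet list;
    # if the MGS host appears only in the second list, it is never reset
    echo "nodes_list:  $(comma_list $(nodes_list))"
    echo "facet nodes: $(facets_nodes $(get_facets))"
-->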
                            <comment id="195784" author="gerrit" created="Mon, 15 May 2017 06:57:58 +0000"  >&lt;p&gt;Emoly Liu (emoly.liu@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/27109&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/27109&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9376&quot; title=&quot;Recovery bug exposed during sanity 103b test&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9376&quot;&gt;&lt;del&gt;LU-9376&lt;/del&gt;&lt;/a&gt; tests: remove sanity.sh test_103b&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7b3ebd654d07b5a7cf9d2951a75d80fc9967fb30&lt;/p&gt;</comment>
                            <comment id="195853" author="simmonsja" created="Mon, 15 May 2017 16:41:16 +0000"  >&lt;p&gt;Yes my MGS is a separate node from the MDS. &lt;/p&gt;</comment>
                            <comment id="204565" author="simmonsja" created="Sat, 5 Aug 2017 19:00:28 +0000"  >&lt;p&gt;The patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9725&quot; title=&quot;Mount commands don&amp;#39;t return for targets in LFS with DNE and 3 MDTs &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9725&quot;&gt;&lt;del&gt;LU-9725&lt;/del&gt;&lt;/a&gt; resolves this bug. A patch still exist that removes this obsolete test so don&apos;t close this ticket once &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9725&quot; title=&quot;Mount commands don&amp;#39;t return for targets in LFS with DNE and 3 MDTs &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9725&quot;&gt;&lt;del&gt;LU-9725&lt;/del&gt;&lt;/a&gt; lands.&lt;/p&gt;</comment>
                            <comment id="205260" author="gerrit" created="Sun, 13 Aug 2017 17:17:44 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/27109/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/27109/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9376&quot; title=&quot;Recovery bug exposed during sanity 103b test&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9376&quot;&gt;&lt;del&gt;LU-9376&lt;/del&gt;&lt;/a&gt; tests: remove sanity.sh test_103b&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 30faa618bbc8775595bf25803d06410fe0e67fd6&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="46982">LU-9725</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="26392" name="dump.log.gz" size="860407" author="simmonsja" created="Thu, 20 Apr 2017 21:44:50 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzas7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>