<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:48:59 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5153] LustreError: 14404:0:(fld_index.c:176:fld_index_create()) ASSERTION( mutex_is_locked(&amp;fld-&gt;lsf_lock) ) failed:</title>
                <link>https://jira.whamcloud.com/browse/LU-5153</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;When trying to start our large 2.5 DNE test bed with 2.6, we hit the following assertion:&lt;br/&gt;
Jun  6 09:12:35 galaxy-esf-mds004 kernel: LustreError: 14404:0:(fld_index.c:176:fld_index_create()) ASSERTION( mutex_is_locked(&amp;amp;fld-&amp;gt;lsf_lock) ) failed:&lt;br/&gt;
Jun  6 09:12:35 galaxy-esf-mds004 kernel: LustreError: 14404:0:(fld_index.c:176:fld_index_create()) LBUG&lt;br/&gt;
Jun  6 09:12:35 galaxy-esf-mds004 kernel: Pid: 14404, comm: llog_process_th&lt;br/&gt;
Jun  6 09:12:35 galaxy-esf-mds004 kernel:&lt;br/&gt;
Jun  6 09:12:35 galaxy-esf-mds004 kernel: Call Trace:&lt;br/&gt;
Jun  6 09:12:35 galaxy-esf-mds004 kernel: [&amp;lt;ffffffffa0a8b895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]&lt;br/&gt;
Jun  6 09:12:35 galaxy-esf-mds004 kernel: [&amp;lt;ffffffffa0a8be97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]&lt;br/&gt;
Jun  6 09:12:35 galaxy-esf-mds004 kernel: [&amp;lt;ffffffffa104adf3&amp;gt;] fld_index_create+0x5a3/0x750 [fld]&lt;br/&gt;
Jun  6 09:12:35 galaxy-esf-mds004 kernel: [&amp;lt;ffffffffa1370f2e&amp;gt;] ? osd_trans_start+0x21e/0x660 [osd_ldiskfs]&lt;br/&gt;
Jun  6 09:12:35 galaxy-esf-mds004 kernel: [&amp;lt;ffffffffa104b7f1&amp;gt;] fld_insert_entry+0x291/0x380 [fld]&lt;br/&gt;
Jun  6 09:12:35 galaxy-esf-mds004 kernel: [&amp;lt;ffffffffa104976a&amp;gt;] fld_update_from_controller+0x27a/0x540 [fld]&lt;br/&gt;
Jun  6 09:12:35 galaxy-esf-mds004 kernel: [&amp;lt;ffffffffa1485496&amp;gt;] mdt_register_lwp_callback+0x76/0x2d0 [mdt]&lt;br/&gt;
Jun  6 09:12:35 galaxy-esf-mds004 kernel: [&amp;lt;ffffffffa0c4943f&amp;gt;] lustre_lwp_connect+0x83f/0xc90 [obdclass]&lt;/p&gt;


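&lt;p&gt;Looking at the assertion and at the other call chains into this function, I see that the mutex in question (lsf_lock) is usually taken by the caller around calls to fld_insert_entry(). The chain in the trace above, where fld_update_from_controller() calls fld_insert_entry(), reaches fld_index_create() without holding it.&lt;/p&gt;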
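&lt;p&gt;A minimal user-space sketch of the locking convention described above, written in Python purely as a model (it is not Lustre code). threading.Lock stands in for the kernel mutex: like mutex_is_locked(), Lock.locked() only reports whether some thread holds the lock, not which one. The class FldServer and the callers update_without_lock() and update_with_lock() are hypothetical; the failing caller mirrors the trace, where fld_update_from_controller() reaches fld_index_create() without lsf_lock held.&lt;/p&gt;

```python
import threading


class FldServer:
    """Hypothetical user-space model of server-side FLD state.

    Not Lustre code: lsf_lock stands in for the kernel mutex named in the
    assertion, and `index` stands in for the on-disk FLDB index.
    """

    def __init__(self):
        self.lsf_lock = threading.Lock()
        self.index = {}

    def fld_index_create(self, seq, mdt):
        # Models the failed assertion. Like the kernel's mutex_is_locked(),
        # Lock.locked() only says whether *someone* holds the lock.
        if not self.lsf_lock.locked():
            raise AssertionError("mutex_is_locked(lsf_lock) failed")
        self.index[seq] = mdt

    def fld_insert_entry(self, seq, mdt):
        # In this model the helper relies on its caller for locking.
        self.fld_index_create(seq, mdt)


def update_without_lock(fld, entries):
    # Hypothetical caller mirroring the chain in the trace: no lock taken,
    # so the first insert trips the assertion.
    for seq, mdt in entries:
        fld.fld_insert_entry(seq, mdt)


def update_with_lock(fld, entries):
    # Hypothetical caller following the usual convention: the caller holds
    # lsf_lock around each fld_insert_entry() call.
    for seq, mdt in entries:
        with fld.lsf_lock:
            fld.fld_insert_entry(seq, mdt)
```

&lt;p&gt;The sketch only illustrates the invariant the assertion enforces; it does not claim to show how the patch for this ticket restores it.&lt;/p&gt;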

&lt;p&gt;The problematic call chain was introduced by this commit:&lt;br/&gt;
commit 519a65ddc04673022124f421e4809f8a87f790d7&lt;br/&gt;
Author: wang di &amp;lt;di.wang@intel.com&amp;gt;&lt;br/&gt;
Date:   Tue Oct 8 02:13:27 2013 -0700&lt;/p&gt;

&lt;p&gt;    &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4076&quot; title=&quot;Create local FLDB for each non0-MDT, so it does not need to send RPC when checking remote FID.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4076&quot;&gt;&lt;del&gt;LU-4076&lt;/del&gt;&lt;/a&gt; fld: add local fldb to each target&lt;/p&gt;

&lt;p&gt;    1. Add local FLDB to each MDT, so OSD/OUT can check whether&lt;br/&gt;
    FID is remote by looking up local FLDB, i.e. no need send RPC&lt;br/&gt;
    to MDT0.&lt;/p&gt;

&lt;p&gt;    2. OSD will only do local lookup when checking remote FID.&lt;/p&gt;

&lt;p&gt;    3. During upgrade, MDTn(n != 0) needs to retrieve its fldb&lt;br/&gt;
    entries from controller(MDT0) and insert into the local&lt;br/&gt;
    FLDB.&lt;/p&gt;

&lt;p&gt;    4. MDT should also use LWP(instead of OSP) to communicate&lt;br/&gt;
    with sequence controller (MDT0).&lt;/p&gt;

&lt;p&gt;    Signed-off-by: wang di &amp;lt;di.wang@intel.com&amp;gt;&lt;br/&gt;
    Change-Id: I788a543aeb7305dfbad3cc41b586f9337f227119&lt;br/&gt;
    Reviewed-on: &lt;a href=&quot;http://review.whamcloud.com/7884&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/7884&lt;/a&gt;&lt;br/&gt;
    Reviewed-by: John L. Hammond &amp;lt;john.hammond@intel.com&amp;gt;&lt;br/&gt;
    Tested-by: Jenkins&lt;br/&gt;
    Reviewed-by: Andreas Dilger &amp;lt;andreas.dilger@intel.com&amp;gt;&lt;br/&gt;
    Tested-by: Maloo &amp;lt;hpdd-maloo@intel.com&amp;gt;&lt;br/&gt;
    Reviewed-by: Oleg Drokin &amp;lt;oleg.drokin@intel.com&amp;gt;&lt;/p&gt;
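&lt;p&gt;Items 1 to 3 of the commit message above describe a per-MDT local FLDB that lets a target decide locally whether a FID is remote, and that non-zero MDTs populate from the controller (MDT0) during upgrade. The following Python sketch is a hypothetical model of that lookup, not Lustre code. It assumes, as a simplification, that the FLDB maps non-overlapping, half-open sequence ranges [start, end) to the index of the owning MDT; the class LocalFldb and its methods are invented for illustration.&lt;/p&gt;

```python
import bisect


class LocalFldb:
    """Hypothetical model of a per-MDT local FLDB (not Lustre code).

    Assumption: entries map non-overlapping, half-open sequence ranges
    [start, end) to the index of the MDT that owns them.
    """

    def __init__(self, local_index):
        self.local_index = local_index
        self.starts = []  # sorted range starts, kept parallel to self.ranges
        self.ranges = []  # (start, end, mdt_index) tuples

    def insert(self, start, end, mdt_index):
        # e.g. entries copied from the controller (MDT0) during upgrade
        pos = bisect.bisect_left(self.starts, start)
        self.starts.insert(pos, start)
        self.ranges.insert(pos, (start, end, mdt_index))

    def lookup(self, seq):
        # Purely local search: no RPC to MDT0 is needed.
        # Find the last range whose start is not greater than seq.
        pos = bisect.bisect_right(self.starts, seq) - 1
        if pos >= 0:
            start, end, mdt_index = self.ranges[pos]
            if end > seq:
                return mdt_index
        return None  # sequence not present in the local FLDB

    def is_remote(self, seq):
        owner = self.lookup(seq)
        if owner is None:
            raise KeyError(f"sequence {seq:#x} not in local FLDB")
        return owner != self.local_index
```

&lt;p&gt;Keeping the range starts in a separate sorted list lets bisect find the candidate range in O(log n) with no network round trip, which is the benefit item 1 describes; the real on-disk index structure is not modelled here.&lt;/p&gt;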


&lt;p&gt;I will generate a patch.&lt;/p&gt;</description>
                <environment>Large DNE system on CentOS, upgrading from 2.5 with remote directories to 2.6/master.  Occurred on some MDSes when trying to start the MDTs.</environment>
        <key id="25050">LU-5153</key>
            <summary>LustreError: 14404:0:(fld_index.c:176:fld_index_create()) ASSERTION( mutex_is_locked(&amp;fld-&gt;lsf_lock) ) failed:</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="di.wang">Di Wang</assignee>
                                    <reporter username="paf">Patrick Farrell</reporter>
                        <labels>
                            <label>dne</label>
                            <label>dne2</label>
                            <label>patch</label>
                    </labels>
                <created>Fri, 6 Jun 2014 14:33:12 +0000</created>
                <updated>Mon, 16 Jun 2014 14:29:22 +0000</updated>
                            <resolved>Mon, 16 Jun 2014 14:29:21 +0000</resolved>
                                    <version>Lustre 2.6.0</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                    <comments>
                            <comment id="86015" author="paf" created="Fri, 6 Jun 2014 14:44:23 +0000"  >&lt;p&gt;Patch here:&lt;br/&gt;
  &lt;a href=&quot;http://review.whamcloud.com/10631&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/10631&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="86023" author="paf" created="Fri, 6 Jun 2014 15:56:59 +0000"  >&lt;p&gt;Further investigation shows this happens when upgrading 2.5 DNE systems to 2.6. The LBUG occurs on every non-primary MDS. (This matches the code: the failing path is only taken on the non-primary MDSes, which retrieve their FLDB entries from the controller, MDT0, during upgrade.)&lt;/p&gt;</comment>
                            <comment id="86046" author="pjones" created="Fri, 6 Jun 2014 19:56:23 +0000"  >&lt;p&gt;Di&lt;/p&gt;

&lt;p&gt;Could you please review this patch?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="86675" author="jlevi" created="Mon, 16 Jun 2014 14:29:22 +0000"  >&lt;p&gt;Patch landed to Master. Please reopen ticket if more work is needed.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwnvj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>14218</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>