<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:38:33 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3975] Race loading ldiskfs with parallel mounts</title>
                <link>https://jira.whamcloud.com/browse/LU-3975</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Parallel calls to mount targets when ldiskfs is not already loaded can lead to hitting a race in the kernel when it attempts to load the module, which can result in the second mount failing. This race is not unique to ldiskfs and can affect any module that does not protect itself with some sort of locking mechanism.&lt;/p&gt;

&lt;p&gt;This bug was fixed in kernel-3.7.0, and is documented here: &lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=771285&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugzilla.redhat.com/show_bug.cgi?id=771285&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Which links to this fix here: &lt;a href=&quot;http://thread.gmane.org/gmane.linux.kernel/1358707/focus=1358709&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://thread.gmane.org/gmane.linux.kernel/1358707/focus=1358709&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have confirmed that this fix has not been backported to the 2.6.32 kernel yet. We have opened a bug with RedHat regarding the issue here: &lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=1009704&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugzilla.redhat.com/show_bug.cgi?id=1009704&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This can cause parallel calls to mkfs.lustre to fail as well, as the mounts in ldiskfs_write_ldd can hit this race if ldiskfs is not already loaded.&lt;/p&gt;

&lt;p&gt;I think there are two outstanding questions here:&lt;/p&gt;

&lt;p&gt;(1) Do we want to try to do the backport ourselves and not wait on RedHat?&lt;br/&gt;
(2) Is it safe to explicitly &quot;modprobe ldiskfs&quot; prior to calling mkfs.lustre to protect ourselves against the race? Or could loading the module explicitly cause some other issues with Lustre?&lt;/p&gt;</description>
                <environment></environment>
        <key id="21040">LU-3975</key>
            <summary>Race loading ldiskfs with parallel mounts</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="hongchao.zhang">Hongchao Zhang</assignee>
                                    <reporter username="kelsey">Kelsey Prantis</reporter>
                        <labels>
                    </labels>
                <created>Thu, 19 Sep 2013 21:21:27 +0000</created>
                <updated>Mon, 23 Dec 2013 21:35:59 +0000</updated>
                            <resolved>Wed, 27 Nov 2013 19:13:38 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                            <comments>
                            <comment id="67062" author="adilger" created="Thu, 19 Sep 2013 21:46:25 +0000"  >&lt;p&gt;The ldiskfs module is standalone, so there shouldn&apos;t be any problems loading it explicitly.  It would be good to load it in mount.lustre and mkfs.lustre so that this problem is fixed for all Lustre users and we don&apos;t have to debug it again.  There is also no problem with having IML load ldiskfs and patching mkfs.lustre and mount.lustre to do the same.&lt;/p&gt;

&lt;p&gt;It would also be possible to patch the kernel, since we already patch the RHEL kernel for other reasons, though we are trying to eliminate the kernel patches.  This would be better from the POV that the patch would naturally disappear when the fix is backported (unlike the workarounds), but has the problem that it only works for the specific kernels that are patched (i.e. we&apos;d need to patch SLES11 in addition to RHEL6) and would likely be more effort to catch the various Lustre kernels in use.&lt;/p&gt;

&lt;p&gt;Summary: any/all of the above fixes/workarounds are acceptable.&lt;/p&gt;
</comment>
                            <comment id="67079" author="brian" created="Fri, 20 Sep 2013 00:16:27 +0000"  >&lt;p&gt;From my understanding of the problem, modprobing the module doesn&apos;t really eliminate the race but rather just mitigates it by allowing a sufficient amount of time to go by so that everyone that will want the module will have it loaded by the time they get to the point of mounting something.  So in order for &lt;tt&gt;mkfs.lustre&lt;/tt&gt; to benefit from this work-around, it should issue the &lt;tt&gt;modprobe&lt;/tt&gt; before starting the &lt;tt&gt;mke2fs&lt;/tt&gt; to give as much time as possible before its &lt;tt&gt;mount(2)&lt;/tt&gt; will want it to be there.&lt;/p&gt;</comment>
                            <comment id="67232" author="apittman" created="Mon, 23 Sep 2013 13:45:49 +0000"  >&lt;p&gt;This looks like a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1279&quot; title=&quot;failure trying to mount two targets at the same time after boot&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1279&quot;&gt;&lt;del&gt;LU-1279&lt;/del&gt;&lt;/a&gt;.  We are able to avoid this by pre-loading modules in the correct order, although that approach isn&apos;t without its problems; in particular, the module list that we need to load is very long and Lustre-version-specific. We&apos;ve also found &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3948&quot; title=&quot;Unable to load lustre module when the filesystem is running.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3948&quot;&gt;&lt;del&gt;LU-3948&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="70470" author="pjones" created="Fri, 1 Nov 2013 12:55:48 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Could you please confirm whether this is a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1279&quot; title=&quot;failure trying to mount two targets at the same time after boot&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1279&quot;&gt;&lt;del&gt;LU-1279&lt;/del&gt;&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="71135" author="paf" created="Fri, 8 Nov 2013 17:16:45 +0000"  >&lt;p&gt;Hongchao,&lt;/p&gt;

&lt;p&gt;Trying to port this kernel patch back to CentOS 6.4.  It doesn&apos;t land cleanly at all; in fact, it seems to depend on a fairly significant rewrite of module.c.&lt;/p&gt;

&lt;p&gt;For example, the second component of this patch shows module_mutex being locked (in load_module), but in 2.6.32, that mutex isn&apos;t referenced in load_module at all, as best I can tell.&lt;/p&gt;

&lt;p&gt;There are other differences, such as the absence of a &apos;free_arch_cleanup&apos; label in load_module, and the code that returns EEXIST appears to have been relocated in load_module as well.&lt;/p&gt;

&lt;p&gt;I&apos;m trying to get the kernel git repo checked out so I can track down more of the patch history here, but I&apos;m getting 404s trying to check out the kernel.org Linux repo...  We&apos;ll see.&lt;/p&gt;

&lt;p&gt;Kelsey, has there been any update on the RedHat bug?  It&apos;d be great if they&apos;d do the porting.&lt;/p&gt;

&lt;p&gt;Here&apos;s the problematic part of the patch I was referring to:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;+again:
 	mutex_lock(&amp;amp;module_mutex);
-	&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (find_module(mod-&amp;gt;name)) {
+	&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; ((old = find_module(mod-&amp;gt;name)) != NULL) {
+		&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (old-&amp;gt;state == MODULE_STATE_COMING) {
+			&lt;span class=&quot;code-comment&quot;&gt;/* Wait in &lt;span class=&quot;code-keyword&quot;&gt;case&lt;/span&gt; it fails to load. */&lt;/span&gt;
+			mutex_unlock(&amp;amp;module_mutex);
+			err = wait_event_interruptible(module_wq,
+					       finished_loading(mod-&amp;gt;name));
+			&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (err)
+				&lt;span class=&quot;code-keyword&quot;&gt;goto&lt;/span&gt; free_arch_cleanup;
+			&lt;span class=&quot;code-keyword&quot;&gt;goto&lt;/span&gt; again;
+		}
 		err = -EEXIST;
 		&lt;span class=&quot;code-keyword&quot;&gt;goto&lt;/span&gt; unlock;
 	}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="72425" author="adilger" created="Wed, 27 Nov 2013 19:13:38 +0000"  >&lt;p&gt;Closing this as a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1279&quot; title=&quot;failure trying to mount two targets at the same time after boot&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1279&quot;&gt;&lt;del&gt;LU-1279&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="13847">LU-1279</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="13847">LU-1279</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw38n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10609</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>