<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:15:29 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1307] Clients having issues mounting Lustre</title>
                <link>https://jira.whamcloud.com/browse/LU-1307</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A customer reports that some clients have difficulty mounting Lustre filesystems right after a reboot of the system. Running lustre_rmmod and then mount -at lustre seems to clear up the problem.&lt;/p&gt;

&lt;p&gt;[root@dtn1 ~]# mount -at lustre&lt;br/&gt;
mount.lustre: mount 10.174.80.40@o2ib2:10.174.80.41@o2ib2:/scratch1 at /scratch1 failed: No such device&lt;br/&gt;
Are the lustre modules loaded?&lt;br/&gt;
Check /etc/modprobe.conf and /proc/filesystems&lt;br/&gt;
Note &apos;alias lustre llite&apos; should be removed from modprobe.conf&lt;br/&gt;
mount.lustre: mount 10.174.80.42@o2ib2:10.174.80.43@o2ib2:/scratch2 at /scratch2 failed: No such device&lt;br/&gt;
Are the lustre modules loaded?&lt;br/&gt;
Check /etc/modprobe.conf and /proc/filesystems&lt;br/&gt;
Note &apos;alias lustre llite&apos; should be removed from modprobe.conf&lt;br/&gt;
[root@dtn1 ~]# lustre_rmmod&lt;br/&gt;
[root@dtn1 ~]# mount -at lustre&lt;br/&gt;
[root@dtn1 ~]# df -h&lt;br/&gt;
Filesystem            Size  Used Avail Use% Mounted on&lt;br/&gt;
/dev/mapper/vg_dtn1-lv_root&lt;br/&gt;
                       50G   17G   31G  36% /&lt;br/&gt;
tmpfs                  24G     0   24G   0% /dev/shm&lt;br/&gt;
/dev/sda1             485M   52M  408M  12% /boot&lt;br/&gt;
10.174.80.40@o2ib2:10.174.80.41@o2ib2:/scratch1&lt;br/&gt;
                      2.5P  288T  2.2P  12% /scratch1&lt;br/&gt;
10.174.80.42@o2ib2:10.174.80.43@o2ib2:/scratch2&lt;br/&gt;
                      3.1P  427T  2.7P  14% /scratch2&lt;/p&gt;

&lt;p&gt;/etc/fstab:&lt;br/&gt;
...&lt;br/&gt;
10.174.80.40@o2ib2:10.174.80.41@o2ib2:/scratch1 /scratch1 lustre defaults,flock 0 0&lt;br/&gt;
10.174.80.42@o2ib2:10.174.80.43@o2ib2:/scratch2 /scratch2 lustre defaults,flock 0 0&lt;/p&gt;
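
&lt;p&gt;For reference, a sketch of the same two entries with the standard &quot;_netdev&quot; mount option added. mount(8) documents _netdev as marking a filesystem that requires network access, so the boot scripts should not try to mount it until networking has been enabled. Whether that ordering also waits for the o2ib2 (IPoIB ib0) interface on these RHEL 6 clients is an assumption to verify, not something this ticket establishes.&lt;/p&gt;

```
10.174.80.40@o2ib2:10.174.80.41@o2ib2:/scratch1 /scratch1 lustre defaults,flock,_netdev 0 0
10.174.80.42@o2ib2:10.174.80.43@o2ib2:/scratch2 /scratch2 lustre defaults,flock,_netdev 0 0
```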

&lt;p&gt;[root@dtn1 ~]# cat /etc/modprobe.d/lustre.conf&lt;br/&gt;
# Lustre module configuration file&lt;br/&gt;
options lnet networks=&quot;o2ib2(ib0)&quot;&lt;/p&gt;


&lt;p&gt;Also, I have attached /var/log/messages, which shows the recent boot and the Lustre errors that were reported.&lt;/p&gt;

&lt;p&gt;You can see in the log that I ran mount -at lustre at Apr 11 13:14:20.&lt;br/&gt;
Then I ran lustre_rmmod and mount -at lustre and it worked.&lt;/p&gt;

&lt;p&gt;The customer is asking why this is happening and I do not have an explanation.&lt;br/&gt;
I encountered similar issues on other clients after a reboot of the entire system.&lt;/p&gt;</description>
                <environment>Servers:  CentOS 5.5&lt;br/&gt;
Clients: RHEL 6.0</environment>
        <key id="13943">LU-1307</key>
            <summary>Clients having issues mounting Lustre</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="doug">Doug Oucharek</assignee>
                                    <reporter username="dnelson@ddn.com">Dennis Nelson</reporter>
                        <labels>
                    </labels>
                <created>Wed, 11 Apr 2012 09:45:38 +0000</created>
                <updated>Mon, 29 May 2017 04:08:58 +0000</updated>
                            <resolved>Mon, 29 May 2017 04:08:58 +0000</resolved>
                                    <version>Lustre 1.8.6</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="34525" author="cliffw" created="Wed, 11 Apr 2012 10:36:49 +0000"  >&lt;p&gt;Are you certain the servers have finished recovery after the reboot? Please examine the client system log, there should be LustreErrors there which may provide more information&lt;/p&gt;</comment>
                            <comment id="34527" author="dnelson@ddn.com" created="Wed, 11 Apr 2012 10:46:25 +0000"  >&lt;p&gt;Yes, I am sure that recovery was complete. The servers were booted yesterday and have been back in production for over 12 hours. I attached the messages file to the original post; it contains the Lustre errors.&lt;/p&gt;

&lt;p&gt;I believe the system may be attempting to mount the filesystems before the IB network is functioning, which leaves the client in an error state that it cannot recover from without unloading the modules. Is that possible? Shouldn&apos;t a new mount request try to contact the servers again, instead of failing just because an earlier attempt failed?&lt;/p&gt;</comment>
                            <comment id="38938" author="cliffw" created="Wed, 16 May 2012 14:19:14 +0000"  >&lt;p&gt;You should use the _netdev option so that the Lustre client does not attempt to mount before the network has started. The explanation is simple: you are trying to mount a network file system before you have a live network. I am not sure why you would need to unload the modules; that should not be necessary. Simply waiting for the network to be up should be enough.&lt;/p&gt;</comment>
                            <comment id="39915" author="pjones" created="Mon, 4 Jun 2012 06:00:11 +0000"  >&lt;p&gt;Dennis&lt;/p&gt;

&lt;p&gt;Any further questions or can we close this ticket?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="39922" author="ndauchy" created="Mon, 4 Jun 2012 10:44:38 +0000"  >&lt;p&gt;IMHO this is still a bug. Yes, the _netdev option can help. However, the Lustre client &lt;b&gt;should&lt;/b&gt; gracefully handle problems when it tries to mount before the IB network is fully up, and a remount &lt;b&gt;should&lt;/b&gt; be sufficient. The need for lustre_rmmod is not intuitive to system administrators. It can even be problematic if the client has another (active) Lustre mount, which makes it impossible to unload the Lustre modules.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Nathan&lt;/p&gt;</comment>
                            <comment id="39953" author="adilger" created="Mon, 4 Jun 2012 16:53:15 +0000"  >&lt;p&gt;Some notes here:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;I agree with Cliff that using _netdev (which is what I use at home) can avoid this problem in most cases, but it isn&apos;t entirely clear whether IB network startup is treated the same as Ethernet, i.e. whether the &quot;mount _netdev filesystems&quot; step will be delayed until after IB is up.&lt;/li&gt;
	&lt;li&gt;I agree with Nathan that this is still a problem, since there can be other network problems that result in this &quot;sticky&quot; error (even with TCP), and it should be addressed.&lt;/li&gt;
	&lt;li&gt;The &quot;mount.lustre&quot; command accepts a &quot;retry=N&quot; mount option that makes the client repeat the mount up to N times on failure, but this may not be enough in this case, and it can potentially hang the rest of the startup process.&lt;/li&gt;
&lt;/ul&gt;
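
&lt;p&gt;One possible site-side workaround, sketched here only as an illustration: gate the Lustre mount on the IPoIB interface reporting &quot;up&quot; in sysfs. The helper name wait_for_iface, the one-second poll interval, and the idea that ib0 being &quot;up&quot; means the o2ib network is usable are all assumptions, not Lustre behavior, and this does not address the underlying &quot;sticky&quot; error itself.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch only: poll a sysfs operstate file until the interface reports
# "up", so that a Lustre mount is attempted only after the link is up.
# The function name and the 1-second poll interval are hypothetical
# choices for illustration; they are not part of Lustre or mount.lustre.

# wait_for_iface STATEFILE TRIES
#   STATEFILE  e.g. /sys/class/net/ib0/operstate
#   TRIES      maximum number of polls, sleeping one second after
#              each unsuccessful poll
# Returns 0 as soon as STATEFILE reads "up", 1 if it never does.
# POSIX sh has no "local", so these variables are global.
wait_for_iface() {
    statefile=$1
    tries=$2
    i=0
    while [ "$i" -lt "$tries" ]; do
        state=
        # A missing or unreadable file counts as "not up yet".
        if [ -r "$statefile" ]; then
            state=$(cat "$statefile")
        fi
        if [ "$state" = "up" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

&lt;p&gt;For example, a boot script could call wait_for_iface /sys/class/net/ib0/operstate 60 and run mount -at lustre only if it returns 0.&lt;/p&gt;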


&lt;p&gt;I believe the root of the problem lies in the ptlrpc module: it starts the network connections when it is loaded, and may not retry establishing those connections if the network device was unavailable at that time.&lt;/p&gt;</comment>
                            <comment id="197386" author="adilger" created="Mon, 29 May 2017 04:08:58 +0000"  >&lt;p&gt;Close old ticket.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="11133" name="messages" size="89762" author="dnelson@ddn.com" created="Wed, 11 Apr 2012 09:45:38 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 7 Jun 2012 09:45:38 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw0cn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10136</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 11 Apr 2012 09:45:38 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>