<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:34:53 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3550] Stale file handle on mount when mounting Lustre 2.4 via NFS</title>
                <link>https://jira.whamcloud.com/browse/LU-3550</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;When attempting to mount an NFS-exported Lustre file system, the mount operation reports &apos;stale file handle&apos; and fails to complete. This happens with 2.4 servers and a 2.4 client. It does NOT happen with a 2.4 client and 2.2 servers.&lt;/p&gt;

&lt;p&gt;Investigation of the NFS traffic between the NFS client and the NFS server (the Lustre client) shows the NFS client requesting the file handle for the mount and then receiving a file handle back from the server. After some further exchanges, the client sends that same file handle back as part of an info request, and the server responds with a stale file handle error.&lt;/p&gt;

&lt;p&gt;This is happening on both CentOS 6.4 and SLES11SP2 clients.&lt;/p&gt;

&lt;p&gt;I&apos;m attaching a series of logs of this issue.&lt;br/&gt;
Here&apos;s what those logs contain:&lt;br/&gt;
Lustre MDS (2.4): full DK logs&lt;br/&gt;
Lustre client (2.4)/NFS server (the source of the NFS export): full DK logs and /var/log/messages with nfsd debug set to full (0x7FFF)&lt;br/&gt;
NFS client: /var/log/messages with nfs debug set to 1, and a tcpdump of all traffic&lt;/p&gt;

&lt;p&gt;For analyzing the tcpdump (if you need it; I suspect the NFS debug logs will make it unnecessary), the IP addresses are:&lt;br/&gt;
NFS Server: 172.29.53.155&lt;br/&gt;
NFS Client: 172.29.53.160&lt;/p&gt;

&lt;p&gt;The /var/log/messages logs are not trimmed, sorry. Look for the last debug markers from Lustre in those files and you can line them up with the rest of the logs.&lt;/p&gt;</description>
                <environment></environment>
        <key id="19660">LU-3550</key>
            <summary>Stale file handle on mount when mounting Lustre 2.4 via NFS</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="yong.fan">nasf</assignee>
                                    <reporter username="paf">Patrick Farrell</reporter>
                        <labels>
                    </labels>
                <created>Tue, 2 Jul 2013 17:48:31 +0000</created>
                <updated>Wed, 20 Nov 2013 09:26:05 +0000</updated>
                            <resolved>Mon, 18 Nov 2013 01:30:22 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.5.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="61922" author="paf" created="Mon, 8 Jul 2013 23:03:15 +0000"  >&lt;p&gt;The underlying issue is that NFS on Linux does not currently support 64-bit root inodes.&lt;/p&gt;

&lt;p&gt;This means that Lustre, with a 2.4 MDS, cannot be exported over NFS.&lt;/p&gt;

&lt;p&gt;After working this out by reading the NFS code, I discovered Andreas has a patch going upstream to the kernel for this:&lt;br/&gt;
&lt;a href=&quot;http://www.spinics.net/lists/linux-nfs/msg37384.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://www.spinics.net/lists/linux-nfs/msg37384.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Is this documented somewhere as a known regression?&lt;br/&gt;
Is there any plan to resolve this separately for Lustre clients?&lt;/p&gt;</comment>
                            <comment id="61925" author="paf" created="Mon, 8 Jul 2013 23:31:05 +0000"  >&lt;p&gt;Correction: a closer look at the patch from Andreas suggests it addresses a related issue, but not exactly the one we&apos;re facing.&lt;/p&gt;

&lt;p&gt;The issue I&apos;m looking at comes up in mk_fsid, called from fh_compose, which is called from exp_rootfh:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; inline void mk_fsid(&lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; vers, u32 *fsidv, dev_t dev, ino_t ino,
                           u32 fsid, unsigned &lt;span class=&quot;code-object&quot;&gt;char&lt;/span&gt; *uuid)
{
        u32 *up;
        &lt;span class=&quot;code-keyword&quot;&gt;switch&lt;/span&gt;(vers) {
        &lt;span class=&quot;code-keyword&quot;&gt;case&lt;/span&gt; FSID_DEV:
                fsidv[0] = htonl((MAJOR(dev)&amp;lt;&amp;lt;16) |
                                 MINOR(dev));
                fsidv[1] = ino_t_to_u32(ino);
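                &lt;span class=&quot;code-keyword&quot;&gt;break&lt;/span&gt;;
        &lt;span class=&quot;code-comment&quot;&gt;/* ... remaining fsid types elided ... */&lt;/span&gt;
        }
}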
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Here the inode number is truncated to 32 bits by ino_t_to_u32(). That truncated value is what goes out on the wire to the client, even though Lustre has 64-bit inodes.&lt;/p&gt;

&lt;p&gt;I will have to look more closely at Andreas&apos;s patch and the issue it&apos;s resolving, as well as the code I noted above, to understand fully what&apos;s going on.&lt;/p&gt;</comment>
                            <comment id="61954" author="paf" created="Tue, 9 Jul 2013 17:49:28 +0000"  >&lt;p&gt;Andreas&apos;s patch addresses a problem parsing 64-bit inode numbers in nfs-utils, so it isn&apos;t involved here.&lt;/p&gt;

&lt;p&gt;The problem is this:&lt;br/&gt;
In NFS, for mount requests, the root of the file system is identified only by the FSID. The FSID is defined as a 64-bit integer type in NFSv3 (see &lt;a href=&quot;http://tools.ietf.org/html/rfc1813&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://tools.ietf.org/html/rfc1813&lt;/a&gt; for the NFSv3 RFC), but in the default case Linux builds it from two 32-bit integers, one of which is the inode number. For NFSv4, it is two 64-bit types (&lt;a href=&quot;http://www.ietf.org/rfc/rfc3530.txt&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://www.ietf.org/rfc/rfc3530.txt&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Linux also has an FSID type that holds a full 64-bit inode number:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-keyword&quot;&gt;case&lt;/span&gt; FSID_UUID16_INUM:
                &lt;span class=&quot;code-comment&quot;&gt;/* 8 &lt;span class=&quot;code-object&quot;&gt;byte&lt;/span&gt; inode and 16 &lt;span class=&quot;code-object&quot;&gt;byte&lt;/span&gt; fsid */&lt;/span&gt;
                *(u64*)fsidv = (u64)ino;
                memcpy(fsidv+2, uuid, 16);
                &lt;span class=&quot;code-keyword&quot;&gt;break&lt;/span&gt;;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The vers value passed to mk_fsid() is the fsid type chosen from the export options and from whether or not the export is a root export.&lt;br/&gt;
Here&apos;s the relevant code, from set_version_and_fsid_type:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (exp-&amp;gt;ex_flags &amp;amp; NFSEXP_FSID) {
                fsid_type = FSID_NUM;
        } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (exp-&amp;gt;ex_uuid) {
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (fhp-&amp;gt;fh_maxsize &amp;gt;= 64) {
                        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (is_root_export(exp))
                                fsid_type = FSID_UUID16;
                        &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt;
                                fsid_type = FSID_UUID16_INUM;
                } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; {
                        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (is_root_export(exp))
                                fsid_type = FSID_UUID8;
                        &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt;
                                fsid_type = FSID_UUID4_INUM;
                }
        } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!old_valid_dev(exp_sb(exp)-&amp;gt;s_dev))
                &lt;span class=&quot;code-comment&quot;&gt;/* &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; newer device numbers, we must use a newer fsid format */&lt;/span&gt;
                fsid_type = FSID_ENCODE_DEV;
        &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt;
                fsid_type = FSID_DEV;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I can&apos;t work out how to set the export option in question (ex_uuid).&lt;br/&gt;
The code that parses the export options (svc_export_parse) seems to expect the character sequence &quot;uuid&quot;, but when I specify -o uuid=&amp;#91;some uuid&amp;#93;, I get an error.&lt;br/&gt;
I haven&apos;t found a way to actually set this flag.&lt;/p&gt;

&lt;p&gt;On the other hand, with -o fsid= on the export I can specify either an integer or a UUID.&lt;br/&gt;
Either one allows me to mount the NFS export and perform normal operations on it with NFSv4 or NFSv3 (at least on the client side; I will be testing with an NFSv3-only server today).&lt;/p&gt;

&lt;p&gt;Presumably this hits the following case in set_version_and_fsid_type:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;} &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (exp-&amp;gt;ex_flags &amp;amp; NFSEXP_FSID) {
                fsid_type = FSID_NUM;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In any case, this appears to be a workaround. Longer term, if we don&apos;t want to have to specify -o fsid=, the NFS code in the kernel would need to change to support 64-bit inodes in FSIDs.&lt;/p&gt;</comment>
                            <comment id="62154" author="paf" created="Thu, 11 Jul 2013 20:19:43 +0000"  >&lt;p&gt;I&apos;ve discussed this internally at Cray with an NFS expert.&lt;/p&gt;

&lt;p&gt;He agrees that this workaround (using the -o fsid= option to exportfs when exporting Lustre over NFS) is the appropriate solution, since the only alternative is a fairly invasive patch to the NFS code in the Linux kernel. In light of that, WC may want to update the documentation for exporting Lustre over NFS, but no code changes are necessary.&lt;/p&gt;</comment>
                            <comment id="62178" author="yong.fan" created="Fri, 12 Jul 2013 12:58:20 +0000"  >&lt;p&gt;There are two issues here:&lt;/p&gt;

&lt;p&gt;1) Originally, Lustre did not return an FSID to nfs-utils via statfs(). This has been resolved by the patch &lt;a href=&quot;http://review.whamcloud.com/6493&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6493&lt;/a&gt;, which has already landed on master (Lustre 2.5).&lt;/p&gt;

&lt;p&gt;2) nfs-utils converts the 64-bit inode number into 32 bits, losing the information needed to locate the right root. This can be resolved by the following patch:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c
index 517aa62..a7212e7 100644
--- a/utils/mountd/cache.c
+++ b/utils/mountd/cache.c
@@ -388,7 +388,7 @@ struct parsed_fsid {
        &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; fsidtype;
        /* We could use a union &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt;, but it would be more
         * complicated; why bother? */
-       unsigned &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; inode;
+       uint64_t inode;
        unsigned &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; minor;
        unsigned &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; major;
        unsigned &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; fsidnum;
-- 
1.7.1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you have a chance, please test the two patches above together to verify the fix. Thanks!&lt;/p&gt;</comment>
                            <comment id="62189" author="paf" created="Fri, 12 Jul 2013 14:54:17 +0000"  >&lt;p&gt;nasf,&lt;/p&gt;

&lt;p&gt;I&apos;ve been trying to build nfs-utils 1.2.3 (the default in CentOS 6.4), without patches just to verify that I can, and I am stuck in dependency hell: the build does not find various installed packages. Some searching shows that nfs-utils has since been patched to clean up many unnecessary dependencies, including the ones I&apos;m dealing with.&lt;br/&gt;
(&lt;a href=&quot;http://www.spinics.net/lists/linux-nfs/msg26388.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://www.spinics.net/lists/linux-nfs/msg26388.html&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;However, as I understand it, the kernel nfsd /proc interface has changed since the CentOS 6.4 and SLES11SP2 kernels, so I can&apos;t just grab the latest nfs-utils and expect it to work.&lt;/p&gt;

&lt;p&gt;Do you have a particular version you recommend building, or any tips on this?&lt;/p&gt;

&lt;p&gt;I may be able to apply that linking patch by itself without problems, and will try that next, but I thought I&apos;d ask you as well.&lt;/p&gt;

&lt;p&gt;Patrick&lt;/p&gt;
</comment>
                            <comment id="62246" author="yong.fan" created="Sat, 13 Jul 2013 14:34:17 +0000"  >&lt;p&gt;The patch above is for the latest nfs-utils. For nfs-utils 1.2.3, use the following one instead:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;343,344c343,344
&amp;lt; 	unsigned &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; inode=0;
&amp;lt; 	unsigned &lt;span class=&quot;code-object&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;long&lt;/span&gt; inode64;
---
&amp;gt; 	uint64_t inode=0;
&amp;gt; 	uint64_t inode64;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="62247" author="paf" created="Sun, 14 Jul 2013 00:25:45 +0000"  >&lt;p&gt;nasf,&lt;/p&gt;

&lt;p&gt;Has WC tested the latest nfs-utils with CentOS 6.4?  I thought I saw a proc interface change between the CentOS 6.4 kernel and the kernels targeted by 1.2.8, but I could be wrong about that.&lt;/p&gt;

&lt;p&gt;Patrick&lt;/p&gt;
</comment>
                            <comment id="62249" author="yong.fan" created="Sun, 14 Jul 2013 01:42:14 +0000"  >&lt;p&gt;Hi Patrick,&lt;/p&gt;

&lt;p&gt;I downloaded the nfs-utils-1.2.3 source, then patched, compiled, and tested it on RHEL6 (2.6.32-358.6.1.el6). I did not look into the proc changes.&lt;/p&gt;</comment>
                            <comment id="69758" author="paf" created="Thu, 24 Oct 2013 15:00:20 +0000"  >&lt;p&gt;I&apos;m not sure what the long-term plan is for this bug. The fundamental limitation isn&apos;t in Lustre, and we have an acceptable workaround: setting the FSID manually.&lt;/p&gt;

&lt;p&gt;Is further work planned on the Intel side, or should this bug be closed? Cray is getting along fine with the workaround.&lt;/p&gt;</comment>
                            <comment id="71741" author="yong.fan" created="Mon, 18 Nov 2013 01:28:42 +0000"  >&lt;p&gt;We have submitted a related patch to the kernel maintainer and hope the issue can be fixed at its root. On the Intel side, we cannot do more than wait for a response. If things are working for you, we can close this ticket and reopen it in the future if needed.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="17757">LU-2904</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="21250">LU-4057</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="13103" name="NFS Logs.tar" size="932352" author="paf" created="Tue, 2 Jul 2013 17:48:52 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvujr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8928</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>