<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:57:29 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12997] getcwd() returns ENOENT on RHEL7</title>
                <link>https://jira.whamcloud.com/browse/LU-12997</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have been seeing getcwd() return ENOENT on directories that do, in fact, always exist. We can reliably reproduce this problem with the attached test-getcwd.c code on Lustre Server 2.12.2 and Lustre Client 2.12.3 on RHEL7.7, as well as with many other combinations of Lustre and RHEL7 versions.&lt;/p&gt;

&lt;p&gt;We see reports in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9735&quot; title=&quot;Sles12Sp2 and 2.9 getcwd() sometimes fails&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9735&quot;&gt;&lt;del&gt;LU-9735&lt;/del&gt;&lt;/a&gt; about RHEL7 clients getting ENOENT from getcwd(), but it is not clear to me whether a solution is in the works. We are also not sure whether this is a Lustre problem, an RHEL kernel problem, or both.&lt;/p&gt;

&lt;p&gt;The LD_PRELOAD workaround from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9735&quot; title=&quot;Sles12Sp2 and 2.9 getcwd() sometimes fails&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9735&quot;&gt;&lt;del&gt;LU-9735&lt;/del&gt;&lt;/a&gt; is working for us, but we would like to know whether a proper fix is pending. Is there anything we can do to help?&lt;/p&gt;</description>
                <environment>RHEL7.7 as well as other minor versions of RHEL7 on x86_64.</environment>
        <key id="57438">LU-12997</key>
            <summary>getcwd() returns ENOENT on RHEL7</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="6">Not a Bug</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="krowe">K. Scott Rowe</reporter>
                        <labels>
                    </labels>
                <created>Thu, 21 Nov 2019 22:55:17 +0000</created>
                <updated>Mon, 13 Jul 2020 20:47:28 +0000</updated>
                            <resolved>Mon, 13 Jul 2020 20:47:28 +0000</resolved>
                                    <version>Lustre 2.12.2</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="258666" author="simmonsja" created="Fri, 22 Nov 2019 00:32:01 +0000"  >&lt;p&gt;This is a race in ll_splice_alias() due to the use of d_move() when the inode is a directory. There are fixes for this, but they change the way Lustre handles its dcache, which causes other kinds of breakage. I have tried some ideas to fix this, but it is a work in progress. Basically, the problem is that for $MOUNT/.lustre/fid/$fid_for_mount, the dentry for $fid_for_mount is the same as the dentry for $MOUNT, which creates a circular loop that crashes the node. I might be able to handle this special case using d_real(), which is available in newer kernels.&lt;/p&gt;</comment>
                            <comment id="258753" author="neilb" created="Mon, 25 Nov 2019 00:34:12 +0000"  >&lt;p&gt;This sounds like the bug fixed upstream by commit 61647823aa92 (&quot;VFS: close race between getcwd() and d_move()&quot;), which was fixed in v4.16.&lt;/p&gt;

&lt;p&gt;For Lustre supporting older kernels, the best approach is probably to copy d_drop() from a newer kernel into libcfs and use that instead of the exported d_drop().&lt;/p&gt;</comment>
                            <comment id="264795" author="simmonsja" created="Fri, 6 Mar 2020 16:20:14 +0000"  >&lt;p&gt;I submitted a Red Hat Bugzilla report:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=1811124&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugzilla.redhat.com/show_bug.cgi?id=1811124&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="264824" author="krowe" created="Fri, 6 Mar 2020 22:21:55 +0000"  >&lt;p&gt;Thanks for continuing to work on this. I have some good news: we upgraded our servers from Lustre-2.5.5 to Lustre-2.10.8 on Feb. 20, 2020, and the problem has been greatly reduced.&lt;/p&gt;

&lt;p&gt;The test program (test-getcwd.c) still fails when run on our Lustre filesystem, but since we upgraded our Lustre servers, the number of times users have run into this getcwd() problem with other code has dropped to almost zero.&lt;/p&gt;</comment>
                            <comment id="265094" author="simmonsja" created="Wed, 11 Mar 2020 12:48:59 +0000"  >&lt;p&gt;I was told by Red Hat that a fix has landed in RHEL7.8.&lt;/p&gt;</comment>
                            <comment id="266803" author="krowe" created="Fri, 3 Apr 2020 18:53:27 +0000"  >&lt;p&gt;We installed a machine with RHEL-7.8 using kernel 3.10.0-1127.el7.x86_64. Its kernel changelog does include a fix for a getcwd() race:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ rpm -qi --changelog kernel|grep getcwd
- [fs] vfs: close race between getcwd() and d_move() (Miklos Szeredi) [1631631]&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;However, &lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=1631631&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;1631631&lt;/a&gt; is a different bug ID from &lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=1811124&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;1811124&lt;/a&gt;, which James A Simmons reported. I can still reproduce the problem on our Lustre-2.10.8 filesystem using the 2.12.4 client on RHEL-7.8:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ ./test-getcwd /lustre/aoc/sciops/krowe/tmp
test-getcwd: test-getcwd.c:44: main: Assertion `rc == 0&apos; failed.
Aborted
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="267550" author="simmonsja" created="Tue, 14 Apr 2020 13:13:36 +0000"  >&lt;p&gt;RHEL7.8 contains a fix, so this can be closed. If people encounter this issue, please move to RHEL7.8.&lt;/p&gt;</comment>
                            <comment id="267556" author="krowe" created="Tue, 14 Apr 2020 14:18:28 +0000"  >&lt;p&gt;Perhaps I wasn&apos;t clear enough.&lt;/p&gt;

&lt;p&gt;This is still a problem.&lt;/p&gt;

&lt;p&gt;I can still reproduce this error with RHEL7.8 using the attached test-getcwd.c program.&lt;/p&gt;</comment>
                            <comment id="267567" author="simmonsja" created="Tue, 14 Apr 2020 14:32:33 +0000"  >&lt;p&gt;Sigh. Red Hat claimed this was fixed. It is going to take some pushing to get them to resolve this, and I don&apos;t have the power to do that myself. Someone with more influence at Red Hat will have to discuss a fix with them.&lt;/p&gt;</comment>
                            <comment id="267568" author="simmonsja" created="Tue, 14 Apr 2020 14:38:14 +0000"  >&lt;p&gt;Peter, can you take over this issue? You seem to have a better relationship with Red Hat for getting this resolved.&lt;/p&gt;</comment>
                            <comment id="267569" author="krowe" created="Tue, 14 Apr 2020 14:38:34 +0000"  >&lt;p&gt;Do you have the ability to test this on an RHEL7.8 host? It would be good to have a second data point. I suppose it is possible that I am seeing this issue on our RHEL7.8 host for some other reason that I can&apos;t think of.&lt;/p&gt;</comment>
                            <comment id="270589" author="krowe" created="Tue, 19 May 2020 20:01:21 +0000"  >&lt;p&gt;The kernel on my test RHEL-7.8 machine was just upgraded. It is now running 3.10.0-1127.8.2.el7.x86_64, and I no longer get getcwd() failures:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ ./test-getcwd /lustre/aoc/sciops/krowe/tmp
getcwd succeeded&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I don&apos;t understand why this failed with kernel 3.10.0-1127.el7.x86_64 but works now. Assuming it continues to work after further kernel updates, I would say this problem may be fixed. Again, if you have the ability to check this yourself, please do; my environment may be customized in unusual ways.&lt;/p&gt;</comment>
                            <comment id="275286" author="krowe" created="Mon, 13 Jul 2020 20:45:15 +0000"  >&lt;p&gt;I have since tested a later kernel, 3.10.0-1127.13.1.el7.x86_64, and it also works. So I think the solution is to upgrade to at least kernel 3.10.0-1127.8.2.el7.x86_64.&lt;/p&gt;

&lt;p&gt;This ticket can be closed. Thanks for your help.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="47785">LU-9868</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="33906" name="test-getcwd.c" size="1289" author="krowe" created="Thu, 21 Nov 2019 22:53:54 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00psf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>