<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:24:10 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9208] getcwd() sometimes fails</title>
                <link>https://jira.whamcloud.com/browse/LU-9208</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have an I/O test called Miranda, written in Fortran (attached). It intermittently fails when running on Lustre because getcwd() returns NULL with errno set to ENOENT. This behavior is similar to what was reported in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-645&quot; title=&quot;getcwd fails&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-645&quot;&gt;&lt;del&gt;LU-645&lt;/del&gt;&lt;/a&gt;. We&apos;ve reproduced the problem on a single node in a Lustre 2.8 client/server environment as well as on Lustre 2.5. The problem typically occurs within an hour or two when Miranda runs continuously in a loop. A typical invocation looks like:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;cd /p/lquake/some_lustre_dir
srun -N 1 -n 36 -pplustre28 /path/to/miranda_io 100
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A failing run prints something like:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;forrtl: severe (121): Cannot access current working directory for unit 10739, file &quot;Unknown&quot;
Image              PC                Routine            Line        Source             
miranda_io         000000000040FEE9  Unknown               Unknown  Unknown
miranda_io         000000000041C992  Unknown               Unknown  Unknown
miranda_io         000000000040A4F1  Unknown               Unknown  Unknown
miranda_io         0000000000408F9E  Unknown               Unknown  Unknown
libc.so.6          00002AAAABBAEB35  Unknown               Unknown  Unknown
miranda_io         0000000000408EA9  Unknown               Unknown  Unknown
srun: error: opal93: task 735: Exited with exit code 121
srun: First task exited 30s ago
srun: tasks 0-734,736-1259: running
srun: task 735: exited abnormally
srun: Terminating job step 1166343.0
slurmd[opal38]: *** STEP 1166343.0 KILLED AT 2017-03-02T08:52:13 WITH SIGNAL 9 ***
slurmd[opal40]: *** STEP 1166343.0 KILLED AT 2017-03-02T08:52:13 WITH SIGNAL 9 ***
slurmd[opal35]: *** STEP 1166343.0 KILLED AT 2017-03-02T08:52:13 WITH SIGNAL 9 ***
slurmd[opal36]: *** STEP 1166343.0 KILLED AT 2017-03-02T08:52:13 WITH SIGNAL 9 ***
slurmd[opal41]: *** STEP 1166343.0 KILLED AT 2017-03-02T08:52:13 WITH SIGNAL 9 ***
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I haven&apos;t collected debug logs yet but the bug shouldn&apos;t be hard to reproduce in a test environment.&lt;/p&gt;</description>
                <environment></environment>
        <key id="44710">LU-9208</key>
            <summary>getcwd() sometimes fails</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="nedbass">Ned Bass</reporter>
                        <labels>
                    </labels>
                <created>Mon, 13 Mar 2017 22:20:52 +0000</created>
                <updated>Thu, 5 Apr 2018 19:07:01 +0000</updated>
                            <resolved>Thu, 5 Apr 2018 19:07:01 +0000</resolved>
                                    <version>Lustre 2.5.3</version>
                    <version>Lustre 2.8.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="188355" author="bobijam" created="Wed, 15 Mar 2017 02:37:31 +0000"  >&lt;p&gt;Would you please grab some debug logs and dmesg output when you hit it again?&lt;/p&gt;</comment>
                            <comment id="188523" author="nedbass" created="Wed, 15 Mar 2017 21:57:14 +0000"  >&lt;p&gt;I attached a -1 debug log from the client when it hit this bug.&lt;/p&gt;</comment>
                            <comment id="188526" author="nedbass" created="Wed, 15 Mar 2017 22:07:23 +0000"  >&lt;p&gt;No Lustre messages appear in dmesg at the time.&lt;/p&gt;</comment>
                            <comment id="189071" author="bobijam" created="Tue, 21 Mar 2017 07:36:27 +0000"  >&lt;p&gt;Which system call fails in this case, and with what error? I see a file ioctl command (0x5401, TCGETS) failing on miranda.log in the log.&lt;/p&gt;</comment>
                            <comment id="189161" author="nedbass" created="Tue, 21 Mar 2017 20:53:20 +0000"  >&lt;p&gt;The getcwd() system call returns ENOENT.&lt;/p&gt;</comment>
                            <comment id="189162" author="nedbass" created="Tue, 21 Mar 2017 20:57:16 +0000"  >&lt;p&gt;I was appending miranda_io output to miranda.log. I suspect the ioctl failure is unrelated to the getcwd() problem.&lt;/p&gt;</comment>
                            <comment id="189549" author="bobijam" created="Fri, 24 Mar 2017 09:20:33 +0000"  >&lt;p&gt;I didn&apos;t see miranda.log in the attachment in this ticket.&lt;/p&gt;</comment>
                            <comment id="189593" author="nedbass" created="Fri, 24 Mar 2017 16:53:54 +0000"  >&lt;p&gt;There&apos;s nothing useful in miranda.log other than the error message that I included in the description.&lt;/p&gt;</comment>
                            <comment id="189595" author="nedbass" created="Fri, 24 Mar 2017 16:56:50 +0000"  >&lt;p&gt;Our testing team just reported that they may have reproduced this bug on an NFS filesystem. So this may not be a Lustre-specific bug. I&apos;ll close the ticket for now, and will re-open it if we get any further evidence that it&apos;s a Lustre problem.&lt;/p&gt;

&lt;p&gt;Thanks for your time on this issue.&lt;/p&gt;</comment>
                            <comment id="198962" author="nedbass" created="Mon, 12 Jun 2017 22:34:29 +0000"  >&lt;p&gt;Reopening to allow attaching a file.&lt;/p&gt;</comment>
                            <comment id="198969" author="nedbass" created="Mon, 12 Jun 2017 23:44:08 +0000"  >&lt;p&gt;I attached a test case Red Hat provided for a similar issue seen on NFS. They are interested to know if we can reproduce this on Lustre and NFS. LLNL doesn&apos;t have cycles to spend on this right now, so I am uploading it here in case others want to take a stab at it. I&apos;d be happy to feed any results back to Red Hat.&lt;/p&gt;</comment>
                            <comment id="200800" author="mhanafi" created="Fri, 30 Jun 2017 19:24:07 +0000"  >&lt;p&gt;We recently upgraded to SLES 12 SP2 and Lustre 2.9 clients and have started to see this issue.&lt;/p&gt;</comment>
                            <comment id="207102" author="adilger" created="Thu, 31 Aug 2017 18:08:50 +0000"  >&lt;p&gt;Ned, are you also running Intel MPI when this problem is hit?  According to &lt;a href=&quot;https://jira.hpdd.intel.com/browse/LU-9735?focusedCommentId=206925&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;the comment&lt;/a&gt; in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9735&quot; title=&quot;Sles12Sp2 and 2.9 getcwd() sometimes fails&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9735&quot;&gt;&lt;del&gt;LU-9735&lt;/del&gt;&lt;/a&gt;, this problem only appears to be relevant to Intel MPI.&lt;/p&gt;</comment>
                            <comment id="207106" author="nedbass" created="Thu, 31 Aug 2017 18:57:26 +0000"  >&lt;blockquote&gt;&lt;p&gt;Ned, are you also running Intel MPI when this problem is hit? According to the comment in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9735&quot; title=&quot;Sles12Sp2 and 2.9 getcwd() sometimes fails&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9735&quot;&gt;&lt;del&gt;LU-9735&lt;/del&gt;&lt;/a&gt;, this problem only appears to be relevant to Intel MPI.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;No. I can reproduce it on a single node with a non-MPI program. I wrote a simple C program to simulate what Miranda is doing and ran a bunch of copies of it in the background. Eventually getcwd() fails with errno=ENOENT. I&apos;ll dig up the program and attach it here.&lt;/p&gt;</comment>
                            <comment id="225235" author="adilger" created="Thu, 5 Apr 2018 19:07:01 +0000"  >&lt;p&gt;It appears that the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9735&quot; title=&quot;Sles12Sp2 and 2.9 getcwd() sometimes fails&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9735&quot;&gt;&lt;del&gt;LU-9735&lt;/del&gt;&lt;/a&gt; fix is relevant to all kernels after 2.6.37.  That patch was landed for 2.11 and 2.10.3.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="47106">LU-9735</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="26976" name="LU-9208-Red-Hat-test-case.tar.gz" size="2631" author="nedbass" created="Mon, 12 Jun 2017 22:35:38 +0000"/>
                            <attachment id="25888" name="LU-9208-lctk_dk-1489609467.log.gz" size="30087498" author="nedbass" created="Wed, 15 Mar 2017 21:56:29 +0000"/>
                            <attachment id="25835" name="miranda_io.F" size="10741" author="nedbass" created="Mon, 13 Mar 2017 22:20:49 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzz6tb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>