<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:23:51 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2276] is open() idempotent in regards to being restarted after a signal interrupts it?</title>
                <link>https://jira.whamcloud.com/browse/LU-2276</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;On our EL6 jenkins builder, we do all of the build work on a Lustre 2.1.3 system.&lt;/p&gt;

&lt;p&gt;Occasionally and sporadically we will see the following from a git checkout command:&lt;/p&gt;

&lt;p&gt;error: git checkout-index: unable to create file foo (File exists)&lt;/p&gt;

&lt;p&gt;From a basic grep through the source, the core of the error message appears to come from write_entry() in entry.c:&lt;/p&gt;

&lt;pre&gt;
		fd = open_output_fd(path, ce, to_tempfile);
		if (fd &amp;lt; 0) {
			free(new);
			return error(&quot;unable to create file %s (%s)&quot;,
				path, strerror(errno));
		}
&lt;/pre&gt;

&lt;p&gt;So looking into open_output_fd() there is a call to create_file() which does:&lt;/p&gt;

&lt;p&gt;	return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);&lt;/p&gt;

&lt;p&gt;I am able to prevent the problem from happening with 100% success by simply giving the git checkout a &quot;-q&quot; argument to prevent it from emitting progress reports.  This would seem to indicate that the problem likely revolves around the fact that the progress reporting uses SIGALRM.&lt;/p&gt;

&lt;p&gt;The open() call uses O_CREAT | O_EXCL, and the progress reporting fires SIGALRM frequently with SA_RESTART set. It therefore seems reasonable to suspect that SIGALRM interrupts open() after the file has been created but before the call completes. When the progress reporting handler returns, open() is restarted automatically (because of SA_RESTART) and fails because the file now exists and O_CREAT | O_EXCL forbids opening an existing file.&lt;/p&gt;

&lt;p&gt;Does this seem like a reasonable hypothesis?&lt;/p&gt;

&lt;p&gt;If it does, where does the problem lie here?  Is it that SA_RESTART should not be used since it&apos;s not safe with open() and O_CREAT | O_EXCL (and every system call caller should be handling EINTR) or should the open() be idempotent so that it can be restarted automatically with SA_RESTART?  If open() is not required to be idempotent and this failure is legal and expected, a citation would be useful in getting the git folks to fix their code.&lt;/p&gt;
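<![CDATA[<p>To make the "caller handles EINTR" alternative concrete, here is a minimal sketch (not git's code). It assumes the SIGALRM handler is installed without SA_RESTART, so an interrupted open() returns -1 with errno set to EINTR. The helper name open_excl_track_eintr and its interrupted out-parameter are hypothetical. The helper retries after EINTR but records that an interruption happened, so that a later EEXIST can be recognised as possibly self-inflicted instead of being reported as a plain "File exists".</p>

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: exclusive create that retries on EINTR and reports
 * whether any attempt was interrupted. Assumes signal handlers were
 * installed WITHOUT SA_RESTART, so the kernel does not restart open().
 *
 * Returns a file descriptor, or -1 with errno set by the last open().
 */
static int open_excl_track_eintr(const char *path, mode_t mode, int *interrupted)
{
    int fd;

    assert(path != NULL && interrupted != NULL);
    *interrupted = 0;

    for (;;) {
        fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
        if (fd >= 0 || errno != EINTR)
            return fd;
        /* Interrupted by a signal: remember it, then try again. */
        *interrupted = 1;
    }
}
```

<p>If the call returns -1 with errno == EEXIST and *interrupted is non-zero, the existing file may be left over from the interrupted attempt. Whether that can actually happen on a given filesystem is exactly the open question here.</p>]]>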

&lt;p&gt;If open() is not required to be idempotent, its use with O_CREAT | O_EXCL and SA_RESTART seems fatally flawed, and I&apos;d like to be able to point that out to the git maintainers. As above, though, it would be useful to be able to present some proof that idempotency is not required and that they need to handle EINTR themselves rather than relying on SA_RESTART.&lt;/p&gt;</description>
                <environment>EL6</environment>
        <key id="16567">LU-2276</key>
            <summary>is open() idempotent in regards to being restarted after a signal interrupts it?</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="brian">Brian Murrell</reporter>
                        <labels>
                    </labels>
                <created>Mon, 5 Nov 2012 11:27:19 +0000</created>
                <updated>Thu, 9 Jan 2020 06:22:37 +0000</updated>
                            <resolved>Thu, 9 Jan 2020 06:22:37 +0000</resolved>
                                    <version>Lustre 2.1.3</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="47525" author="rhenwood" created="Wed, 7 Nov 2012 09:20:19 +0000"  >&lt;p&gt;I have received a report of similar sounding behaviour of git against 1.8. It would be great if a reproducer was available.&lt;/p&gt;</comment>
                            <comment id="47526" author="brian" created="Wed, 7 Nov 2012 09:35:52 +0000"  >&lt;p&gt;A reproducer would probably not be terribly difficult to write.  I would start with a loop that opens files with &lt;tt&gt;O_CREAT | O_EXCL&lt;/tt&gt;, perhaps with a pause between opens, while a series of &lt;tt&gt;SIGALRM&lt;/tt&gt;s goes off at the same time, again possibly with some pause between them.&lt;/p&gt;

&lt;p&gt;You could either set the &lt;tt&gt;SA_RESTART&lt;/tt&gt; flag and wait for the &lt;tt&gt;open()&lt;/tt&gt; to fail, or not set the &lt;tt&gt;SA_RESTART&lt;/tt&gt; flag and test each &lt;tt&gt;open()&lt;/tt&gt; for failure with &lt;tt&gt;errno == EINTR&lt;/tt&gt;.  It would be neat to examine the filesystem state when the &lt;tt&gt;EINTR&lt;/tt&gt; happens to see whether the namespace entry from the &lt;tt&gt;open()&lt;/tt&gt; has been created, which would cause another call to &lt;tt&gt;open()&lt;/tt&gt; with the same name to fail with &lt;tt&gt;EEXIST&lt;/tt&gt;; that is essentially what is happening with &lt;tt&gt;SA_RESTART&lt;/tt&gt; set.&lt;/p&gt;
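<![CDATA[<p>A minimal sketch of such a reproducer might look like the following. The function names (start_alarms, stop_alarms, create_loop), the file naming scheme and the timer interval are all hypothetical choices for illustration, and this has not been run against Lustre. It arms a repeating SIGALRM with setitimer(), optionally with SA_RESTART, and then creates files with O_CREAT | O_EXCL, counting any open() that fails with EEXIST or EINTR and recording whether the name exists at that point.</p>

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Empty handler: the signal only needs to interrupt system calls. */
static void on_alarm(int signo) { (void)signo; }

/* Install the SIGALRM handler and arm a repeating real-time timer.
 * use_restart != 0 sets SA_RESTART; 0 lets open() fail with EINTR. */
static int start_alarms(int use_restart, long interval_usec)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = use_restart ? SA_RESTART : 0;
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    tv.it_interval.tv_sec = interval_usec / 1000000;
    tv.it_interval.tv_usec = interval_usec % 1000000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Disarm the timer (an all-zero it_value stops it). */
static int stop_alarms(void)
{
    struct itimerval tv;

    memset(&tv, 0, sizeof(tv));
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Create and remove `count` uniquely named files under `dir`.
 * Returns how many open() calls failed with EEXIST or EINTR,
 * or -1 on any other error. */
static int create_loop(const char *dir, int count)
{
    char path[4096];
    int i, suspicious = 0;

    assert(dir != NULL);
    for (i = 0; i < count; i++) {
        int fd, saved_errno, exists;

        snprintf(path, sizeof(path), "%s/repro.%d.%d", dir, (int)getpid(), i);
        fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd < 0) {
            saved_errno = errno;          /* access() below may change errno */
            if (saved_errno != EEXIST && saved_errno != EINTR)
                return -1;
            exists = (access(path, F_OK) == 0);
            fprintf(stderr, "%s: %s (name exists: %s)\n", path,
                    strerror(saved_errno), exists ? "yes" : "no");
            suspicious++;
        } else {
            close(fd);
        }
        unlink(path);                     /* clean up; errors are ignored */
    }
    return suspicious;
}
```

<p>Calling start_alarms(1, 1000) and then create_loop() on a directory in a Lustre mount would exercise the SA_RESTART case; passing 0 as the first argument exercises the EINTR case instead. A non-zero result from create_loop() in a directory that nothing else writes to would support the hypothesis.</p>]]>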

&lt;p&gt;I suppose some understanding of how signal handling and restartable system calls work would be a prerequisite to writing such a reproducer though.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="16864">LU-2440</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvbov:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5437</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>