<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:35:08 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3581] Recurrence of LU-3020: Lustre returns EINTR during writes when SA_RESTART is set </title>
                <link>https://jira.whamcloud.com/browse/LU-3581</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This is the same issue as described in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3020&quot; title=&quot;Lustre returns EINTR during writes when SA_RESTART is set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3020&quot;&gt;&lt;del&gt;LU-3020&lt;/del&gt;&lt;/a&gt;, where EINTR is returned instead of ERESTARTSYS during writes.  This issue is caught by the same reproducer as for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3020&quot; title=&quot;Lustre returns EINTR during writes when SA_RESTART is set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3020&quot;&gt;&lt;del&gt;LU-3020&lt;/del&gt;&lt;/a&gt;, but the cause is different.&lt;/p&gt;

&lt;p&gt;As I did not hit this issue while testing the fix for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3020&quot; title=&quot;Lustre returns EINTR during writes when SA_RESTART is set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3020&quot;&gt;&lt;del&gt;LU-3020&lt;/del&gt;&lt;/a&gt;, I suspect it was introduced by a subsequent patch.  We are seeing this on the 2.4 release branch.&lt;/p&gt;

&lt;p&gt;This issue is easy to hit without debugging enabled, and very hard to hit with debugging enabled.&lt;/p&gt;

&lt;p&gt;Here is the relevant portion of the trace logs:&lt;br/&gt;
&amp;#8212;&lt;br/&gt;
00000008:00000001:4.0:1372452012.457494:0:13003:0:(osc_cache.c:2206:osc_queue_async_io()) Process entered&lt;br/&gt;
00000008:00000001:4.0:1372452012.457495:0:13003:0:(osc_cache.c:543:osc_extent_release()) Process entered&lt;br/&gt;
00000008:00000001:4.0:1372452012.457496:0:13003:0:(osc_cache.c:240:osc_extent_sanity_check0()) Process leaving via out (rc=0 : 0 : 0x0)&lt;br/&gt;
00000008:00000001:4.0:1372452012.457498:0:13003:0:(osc_cache.c:1616:osc_makes_rpc()) Process entered&lt;br/&gt;
00000008:00000001:4.0:1372452012.457499:0:13003:0:(osc_cache.c:1662:osc_makes_rpc()) Process leaving (rc=0 : 0 : 0)&lt;br/&gt;
00000008:00000001:4.0:1372452012.457500:0:13003:0:(osc_cache.c:1616:osc_makes_rpc()) Process entered&lt;br/&gt;
00000008:00000001:4.0:1372452012.457501:0:13003:0:(osc_cache.c:1652:osc_makes_rpc()) Process leaving (rc=0 : 0 : 0)&lt;br/&gt;
00000008:00000001:4.0:1372452012.457502:0:13003:0:(osc_cache.c:575:osc_extent_release()) Process leaving (rc=0 : 0 : 0)&lt;br/&gt;
00000008:00000001:4.0:1372452012.457503:0:13003:0:(osc_cache.c:1506:osc_enter_cache()) Process entered&lt;br/&gt;
00000100:00000001:1.0F:1372452012.457511:0:5940:0:(ptlrpcd.c:293:ptlrpcd_check()) Process entered&lt;br/&gt;
00000008:00000001:4.0:1372452012.457512:0:13003:0:(osc_cache.c:1549:osc_enter_cache()) Process leaving via out (rc=18446744073709551612 : -4 : 0xfffffffffffffffc)&lt;br/&gt;
00000100:00000001:0.0F:1372452012.457512:0:5941:0:(ptlrpcd.c:293:ptlrpcd_check()) Process entered&lt;br/&gt;
00000100:00000001:1.0:1372452012.457513:0:5940:0:(client.c:1486:ptlrpc_check_set()) Process entered&lt;br/&gt;
00000100:00000001:1.0:1372452012.457513:0:5940:0:(client.c:1561:ptlrpc_check_set()) Process leaving via interpret (rc=0 : 0 : 0x0)&lt;br/&gt;
00000008:00000001:4.0:1372452012.457514:0:13003:0:(osc_cache.c:1564:osc_enter_cache()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)&lt;br/&gt;
00000100:00000001:0.0:1372452012.457514:0:5941:0:(ptlrpcd.c:395:ptlrpcd_check()) Process leaving (rc=0 : 0 : 0)&lt;br/&gt;
00000008:00000001:4.0:1372452012.457515:0:13003:0:(osc_cache.c:2352:osc_queue_async_io()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)&lt;br/&gt;
&amp;#8212;&lt;/p&gt;

&lt;p&gt;This is hit during writes, specifically during ll_commit_write.  I will be attaching the full log.&lt;/p&gt;

&lt;p&gt;This is happening due to a signal arriving during the following l_wait_event call, in osc_enter_cache:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;CDEBUG(D_CACHE, &quot;%s: sleeping for cache space @ %p for %p\n&quot;,
       cli-&amp;gt;cl_import-&amp;gt;imp_obd-&amp;gt;obd_name, &amp;amp;ocw, oap);

rc = l_wait_event(ocw.ocw_waitq, ocw_granted(cli, &amp;amp;ocw), &amp;amp;lwi);

client_obd_list_lock(&amp;amp;cli-&amp;gt;cl_loi_list_lock);

/* l_wait_event is interrupted by signal */
if (rc &amp;lt; 0) {
        cfs_list_del_init(&amp;amp;ocw.ocw_entry);
        GOTO(out, rc);
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I will attach full trace logs.  Search for -4 in the log to find the EINTR.&lt;/p&gt;

&lt;p&gt;The question is: Is it safe to return ERESTARTSYS here, instead of EINTR?  &lt;/p&gt;

&lt;p&gt;More generally, Lustre&apos;s default behavior in l_wait_event is to return EINTR.  Should we consider changing this to ERESTARTSYS and making EINTR the exceptional case?  (This may be a terrible idea - I&apos;m just floating it out of curiosity.)&lt;/p&gt;</description>
                <environment></environment>
        <key id="19797">LU-3581</key>
            <summary>Recurrence of LU-3020: Lustre returns EINTR during writes when SA_RESTART is set </summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="pjones">Peter Jones</assignee>
                                    <reporter username="paf">Patrick Farrell</reporter>
                        <labels>
                            <label>mn4</label>
                            <label>yuc2</label>
                    </labels>
                <created>Fri, 12 Jul 2013 21:25:37 +0000</created>
                <updated>Mon, 23 Dec 2013 06:10:43 +0000</updated>
                            <resolved>Tue, 3 Sep 2013 13:36:22 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.5.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="62310" author="paf" created="Mon, 15 Jul 2013 18:10:31 +0000"  >&lt;p&gt;Attached is the reproducer for this.  This is a slightly improved version of the reproducer for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3020&quot; title=&quot;Lustre returns EINTR during writes when SA_RESTART is set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3020&quot;&gt;&lt;del&gt;LU-3020&lt;/del&gt;&lt;/a&gt;, which randomizes the timing of the operations slightly.&lt;/p&gt;

&lt;p&gt;The shell script just helps gather debug information.&lt;/p&gt;

&lt;p&gt;With debug set to 0, the reproducer hits the problem most of the time.  With debugging set to +trace, it is very difficult to hit and may need to be looped for an hour or more.&lt;/p&gt;</comment>
                            <comment id="62325" author="pjones" created="Mon, 15 Jul 2013 20:05:20 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="62341" author="niu" created="Tue, 16 Jul 2013 02:03:26 +0000"  >&lt;p&gt;Yes, it seems we should return -ERESTARTSYS in this case too (interrupted by a signal in osc_enter_cache()). I think we should return -ERESTARTSYS only in the read/write path, not in l_wait_event(). Patrick, would you post a patch for this? Thanks. &lt;/p&gt;</comment>
                            <comment id="62369" author="paf" created="Tue, 16 Jul 2013 14:16:33 +0000"  >&lt;p&gt;Niu,&lt;/p&gt;

&lt;p&gt;I&apos;ll put the patch up shortly.  You said in the read/write path, so would you like it changed in osc_enter_cache?  Or somewhere else in the path?&lt;/p&gt;

&lt;p&gt;This is what I was thinking:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt; &lt;span class=&quot;code-comment&quot;&gt;/* l_wait_event is interrupted by signal */&lt;/span&gt;
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rc &amp;lt; 0) {
                        &lt;span class=&quot;code-comment&quot;&gt;/* Ensures restartability - LU-3581 */&lt;/span&gt;
                        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rc == -EINTR)
                                rc = -ERESTARTSYS;
                        cfs_list_del_init(&amp;amp;ocw.ocw_entry);
                        GOTO(out, rc);
                }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="62370" author="niu" created="Tue, 16 Jul 2013 14:23:41 +0000"  >&lt;p&gt;Patrick, yes, that&apos;s exactly what I meant.&lt;/p&gt;</comment>
                            <comment id="62371" author="paf" created="Tue, 16 Jul 2013 14:27:04 +0000"  >&lt;p&gt;I&apos;ve put a patch with that in it up here:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/7002&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/7002&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I&apos;ll be testing whether it fixes my reproducer soon and will report back.&lt;/p&gt;</comment>
                            <comment id="62375" author="paf" created="Tue, 16 Jul 2013 15:31:15 +0000"  >&lt;p&gt;Testing with the reproducer indicates this change resolves the issue.&lt;/p&gt;

&lt;p&gt;Thank you, niu.&lt;/p&gt;</comment>
                            <comment id="62901" author="paf" created="Wed, 24 Jul 2013 15:19:54 +0000"  >&lt;p&gt;Niu,&lt;/p&gt;

&lt;p&gt;I&apos;ve added you as a reviewer; could you also re-start the testing process?  The test failure doesn&apos;t appear to be related to the patch.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
paf&lt;/p&gt;</comment>
                            <comment id="65596" author="pjones" created="Tue, 3 Sep 2013 13:36:22 +0000"  >&lt;p&gt;Landed for 2.5&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="18064">LU-3020</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="13158" name="eintrlog.tail" size="1149718" author="paf" created="Fri, 12 Jul 2013 21:25:37 +0000"/>
                            <attachment id="13169" name="new_eintr_test.c" size="4288" author="paf" created="Mon, 15 Jul 2013 18:10:31 +0000"/>
                            <attachment id="13170" name="test.sh" size="367" author="paf" created="Mon, 15 Jul 2013 18:10:31 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvvan:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9074</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>