<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:18:03 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8494] IOR Aborts with EINTR in 2.7 but not 2.5</title>
                <link>https://jira.whamcloud.com/browse/LU-8494</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Patrick Farrell did the analysis on this one. He&apos;ll be submitting a fix for this issue. The following is his analysis:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;From log 33 (Lustre 2.7):&lt;br/&gt;
00010000:00010000:16.0:1470249261.958417:0:8202:0:(ldlm_resource.c:1244:ldlm_resource_add_lock()) ### About to add this lock:&lt;br/&gt;
 ns: husk-OST0004-osc-ffff88082a5d7800 lock: ffff8807cc3d13c0/0xe5c877bd577e5377 lrc: 4/1,0 mode: --/PR res: &amp;#91;0x3409618:0x0:0x0&amp;#93;.0 rrc: 14 type: EXT &amp;#91;6710886400-&amp;gt;6715080703&amp;#93; (req 6710886400-&amp;gt;6715080703) flags: 0x10000000020000 nid: local remote: 0x7df4b36cbdd565b2 expref: -99 pid: 8202 timeout: 0 lvb_type: 1&lt;br/&gt;
00010000:00000001:16.0:1470249261.958420:0:8202:0:(ldlm_lock.c:1859:ldlm_lock_enqueue()) Process leaving via out (rc=0 : 0 : 0x0)&lt;br/&gt;
00010000:00000001:16.0:1470249261.958421:0:8202:0:(ldlm_request.c:253:ldlm_completion_ast()) Process entered&lt;br/&gt;
00010000:00010000:16.0:1470249261.958421:0:8202:0:(ldlm_request.c:266:ldlm_completion_ast()) ### client-side enqueue returned a blocked lock, sleeping ns: husk-OST0004-osc-ffff88082a5d7800 lock: ffff8807cc3d13c0/0xe5c877bd577e5377 lrc: 4/1,0 mode: --/PR res: &amp;#91;0x3409618:0x0:0x0&amp;#93;.0 rrc: 14 type: EXT &amp;#91;6710886400-&amp;gt;6715080703&amp;#93; (req 6710886400-&amp;gt;6715080703) flags: 0x20000 nid: local remote: 0x7df4b36cbdd565b2 expref: -99 pid: 8202 timeout: 0 lvb_type: 1&lt;br/&gt;
00010000:00010000:16.0:1470249261.958424:0:8202:0:(ldlm_request.c:283:ldlm_completion_ast()) ### waiting indefinitely because of NO_TIMEOUT ns: husk-OST0004-osc-ffff88082a5d7800 lock: ffff8807cc3d13c0/0xe5c877bd577e5377 lrc: 4/1,0 mode: --/PR res: &amp;#91;0x3409618:0x0:0x0&amp;#93;.0 rrc: 14 type: EXT &amp;#91;6710886400-&amp;gt;6715080703&amp;#93; (req 6710886400-&amp;gt;6715080703) flags: 0x20000 nid: local remote: 0x7df4b36cbdd565b2 expref: -99 pid: 8202 timeout: 0 lvb_type: 1&lt;/p&gt;

&lt;p&gt;And, much later:&lt;br/&gt;
00010000:00010000:16.0:1470249264.458654:0:8202:0:(ldlm_request.c:310:ldlm_completion_ast()) ### client-side enqueue waking up: failed (-4) ns: husk-OST0004-osc-ffff88082a5d7800 lock: ffff8807cc3d13c0/0xe5c877bd577e5377 lrc: 4/1,0 mode: --/PR res: &amp;#91;0x3409618:0x0:0x0&amp;#93;.0 rrc: 9 type: EXT &amp;#91;6710886400-&amp;gt;6715080703&amp;#93; (req 6710886400-&amp;gt;6715080703) flags: 0x20000 nid: local remote: 0x7df4b36cbdd565b2 expref: -99 pid: 8202 timeout: 0 lvb_type: 1&lt;br/&gt;
00010000:00000001:16.0:1470249264.458659:0:8202:0:(ldlm_request.c:311:ldlm_completion_ast()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)&lt;/p&gt;

&lt;p&gt;Note that only one user process is involved in this case.  In several other examples, the logs contain the ldlm_completion_ast messages only for the wakeup with -4, not for going to sleep.  (The sleep messages are simply missing from the logs.)&lt;/p&gt;

&lt;p&gt;But in those cases, all of the processes wake up at the same time.  I think we&apos;re sending a signal and waking them up, which is expected behavior.&lt;/p&gt;

&lt;p&gt;And now I see why it doesn&apos;t happen on Lustre 2.5.  Lustre 2.5 uses ldlm_completion_ast_async for OST locks like this.  ldlm_completion_ast_async doesn&apos;t sleep (it&apos;s called from the ptlrpcd thread anyway).&lt;/p&gt;

&lt;p&gt;Instead, the userspace threads in Lustre 2.5 are sleeping in cl_lock_state_wait.  That code returns -ERESTARTSYS when interrupted.&lt;/p&gt;

&lt;p&gt;Here it is being interrupted, from the 2.5 logs:&lt;br/&gt;
32.dklog:00000020:00010000:18.0:1470253848.163773:0:8262:0:(cl_lock.c:151:cl_lock_trace0()) state wait lock: ffff880834494c40@(3 ffff88082fe5a3c0 1 1 0 1 1 0)(ffff880830a0e680/0/1) at cl_lock_state_wait():968&lt;br/&gt;
32.dklog:00000020:00000001:19.0:1470253851.931837:0:8264:0:(cl_lock.c:999:cl_lock_state_wait()) Process leaving (rc=18446744073709551104 : -512 : fffffffffffffe00)&lt;br/&gt;
32.dklog:00000020:00000001:3.0:1470253851.931837:0:8263:0:(cl_lock.c:999:cl_lock_state_wait()) Process leaving (rc=18446744073709551104 : -512 : fffffffffffffe00)&lt;br/&gt;
32.dklog:00000020:00000001:18.0:1470253851.931840:0:8262:0:(cl_lock.c:999:cl_lock_state_wait()) Process leaving (rc=18446744073709551104 : -512 : fffffffffffffe00)&lt;br/&gt;
32.dklog:00000020:00000001:17.0:1470253851.931843:0:8260:0:(cl_lock.c:999:cl_lock_state_wait()) Process leaving (rc=18446744073709551104 : -512 : fffffffffffffe00)&lt;br/&gt;
32.dklog:00000020:00000001:16.0:1470253851.931855:0:8258:0:(cl_lock.c:999:cl_lock_state_wait()) Process leaving (rc=18446744073709551104 : -512 : fffffffffffffe00)&lt;br/&gt;
32.dklog:00000020:00000001:18.0:1470253853.940209:0:8262:0:(cl_lock.c:962:cl_lock_state_wait()) Process entered&lt;/p&gt;


&lt;p&gt;Getting back to Lustre 2.7 and ldlm_completion_ast:&lt;br/&gt;
So...  We could return -ERESTARTSYS here, like 2.5 does from the cl_lock_state_wait code?&lt;/p&gt;

&lt;p&gt;Considering other choices: There is no timeout here.  So we can&apos;t do a non-interruptible wait.&lt;/p&gt;

&lt;p&gt;So we probably need to follow the model from osc_enter_cache (look at the l_wait_event there), and translate -EINTR from l_wait_event here into -ERESTARTSYS.&lt;/p&gt;&lt;/blockquote&gt;
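
&lt;p&gt;As a rough illustration (not code from Lustre or from the patch), the rc triples logged above can be decoded by reinterpreting the unsigned 64-bit value as signed: 18446744073709551612 is -4 (-EINTR) and 18446744073709551104 is -512 (-ERESTARTSYS). The Python sketch below does that and models the proposed translation. The helper names to_signed64 and translate_wait_rc are hypothetical, and ERESTARTSYS = 512 is assumed from the Linux kernel&apos;s include/linux/errno.h, since that value is not exposed to userspace.&lt;/p&gt;

```python
import errno

# Assumption: kernel-internal value from include/linux/errno.h. It is not
# exposed to userspace, so Python's errno module does not define it.
ERESTARTSYS = 512


def to_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a signed one."""
    if value >= 2**63:
        return value - 2**64
    return value


def translate_wait_rc(rc):
    """Hypothetical helper: map -EINTR from a wait to -ERESTARTSYS."""
    if rc == -errno.EINTR:
        return -ERESTARTSYS
    return rc


rc_27 = to_signed64(18446744073709551612)  # ldlm_completion_ast, 2.7 log
rc_25 = to_signed64(18446744073709551104)  # cl_lock_state_wait, 2.5 log
print(rc_27, rc_25, translate_wait_rc(rc_27))
```

&lt;p&gt;This only models the return-code mapping in userspace. It does not reproduce the kernel&apos;s system call restart handling, which is what makes returning -ERESTARTSYS instead of -EINTR matter to the application.&lt;/p&gt;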

&lt;p&gt;If anyone is interested in seeing the logs referenced by Patrick, I can attach them to the ticket.&lt;/p&gt;</description>
                <environment></environment>
        <key id="38766">LU-8494</key>
            <summary>IOR Aborts with EINTR in 2.7 but not 2.5</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="paf">Patrick Farrell</assignee>
                                    <reporter username="hornc">Chris Horn</reporter>
                        <labels>
                    </labels>
                <created>Wed, 10 Aug 2016 18:17:30 +0000</created>
                <updated>Sun, 6 Feb 2022 17:09:57 +0000</updated>
                            <resolved>Sun, 6 Feb 2022 17:09:57 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="161467" author="gerrit" created="Wed, 10 Aug 2016 18:50:33 +0000"  >&lt;p&gt;Patrick Farrell (paf@cray.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/21863&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/21863&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8494&quot; title=&quot;IOR Aborts with EINTR in 2.7 but not 2.5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8494&quot;&gt;&lt;del&gt;LU-8494&lt;/del&gt;&lt;/a&gt; ldlm: Return -ERESTARTSYS from wait&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 565e4c46d7e00d344de22ffb7277236bf624031d&lt;/p&gt;</comment>
                            <comment id="161468" author="paf" created="Wed, 10 Aug 2016 18:51:17 +0000"  >&lt;p&gt;I&apos;ve tested the above patch with a reproducer for this problem.&lt;/p&gt;</comment>
                            <comment id="323678" author="spitzcor" created="Mon, 24 Jan 2022 13:53:13 +0000"  >&lt;p&gt;I think it is OK to resolve this and abandon &lt;a href=&quot;https://review.whamcloud.com/#/c/21863/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/21863/&lt;/a&gt;.  Is it?&lt;/p&gt;</comment>
                            <comment id="325388" author="paf0186" created="Sun, 6 Feb 2022 17:09:36 +0000"  >&lt;p&gt;I agree, Cory - Nothing&apos;s been reported here in an age and beyond. I&apos;m not sure it&apos;s fixed but there have been a bunch of changes in this area, so I&apos;m comfortable abandoning it and picking it back up if it turns out to be needed.&lt;/p&gt;</comment>
                            <comment id="325389" author="paf0186" created="Sun, 6 Feb 2022 17:09:57 +0000"  >&lt;p&gt;Doesn&apos;t seem to be a problem in newer versions.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="45497">LU-9340</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="18064">LU-3020</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzykdb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>