<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:13:35 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7981] double read of lli_trunc_sem in ll_page_mkwrite and vvp_io_fault_start leads to deadlock</title>
                <link>https://jira.whamcloud.com/browse/LU-7981</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;After applying the patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7927&quot; title=&quot;Deadlock between ll_setattr() and ll_file_write()-&amp;gt;ll_fsync()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7927&quot;&gt;&lt;del&gt;LU-7927&lt;/del&gt;&lt;/a&gt; to our code, another deadlock was exposed.  It does not look like this was CAUSED by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7927&quot; title=&quot;Deadlock between ll_setattr() and ll_file_write()-&amp;gt;ll_fsync()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7927&quot;&gt;&lt;del&gt;LU-7927&lt;/del&gt;&lt;/a&gt;; it just seems the timing change caused by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7927&quot; title=&quot;Deadlock between ll_setattr() and ll_file_write()-&amp;gt;ll_fsync()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7927&quot;&gt;&lt;del&gt;LU-7927&lt;/del&gt;&lt;/a&gt; allowed this bug to be observed.  (Or possibly this code was deadlocking there first; it&apos;s hard to say precisely.)&lt;/p&gt;

&lt;p&gt;The lli_trunc_sem is taken in &apos;read&apos; mode in both ll_page_mkwrite and vvp_io_fault_start.  This can lead to a deadlock with another thread that requests the semaphore in write mode between the two read acquisitions.&lt;/p&gt;

&lt;p&gt;&amp;#8212;&lt;br/&gt;
The issue is a double down_read on lli_trunc_sem:&lt;/p&gt;

&lt;p&gt;PID: 35117  TASK: ffff8807c26e9680  CPU: 6   COMMAND: &quot;fsx-linux-aio&quot;&lt;br/&gt;
  #0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c29f7ac0&amp;#93;&lt;/span&gt; schedule at ffffffff8149cf35&lt;br/&gt;
  #1 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c29f7b40&amp;#93;&lt;/span&gt; rwsem_down_read_failed at ffffffff8149ed25&lt;br/&gt;
  #2 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c29f7b90&amp;#93;&lt;/span&gt; call_rwsem_down_read_failed at ffffffff81271f64&lt;br/&gt;
  #3 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c29f7be8&amp;#93;&lt;/span&gt; vvp_io_fault_start at ffffffffa08f2526 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
  #4 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c29f7c58&amp;#93;&lt;/span&gt; cl_io_start at ffffffffa0522115 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
  #5 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c29f7c80&amp;#93;&lt;/span&gt; cl_io_loop at ffffffffa0525705 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
  #6 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c29f7cb0&amp;#93;&lt;/span&gt; ll_page_mkwrite at ffffffffa08d2a2a &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
  #7 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c29f7d30&amp;#93;&lt;/span&gt; __do_fault at ffffffff81148c70&lt;br/&gt;
  #8 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c29f7db8&amp;#93;&lt;/span&gt; handle_mm_fault at ffffffff8114c2cf&lt;br/&gt;
  #9 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c29f7e40&amp;#93;&lt;/span&gt; __do_page_fault at ffffffff814a3420&lt;br/&gt;
#10 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c29f7f40&amp;#93;&lt;/span&gt; do_page_fault at ffffffff814a37de&lt;br/&gt;
#11 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c29f7f50&amp;#93;&lt;/span&gt; page_fault at ffffffff8149ff62&lt;br/&gt;
     RIP: 000000002002551b  RSP: 00007fffffff64c8  RFLAGS: 00010212&lt;/p&gt;


&lt;p&gt;The down_read is done in ll_page_mkwrite, then again in vvp_io_fault_start.&lt;/p&gt;

&lt;p&gt;This is a problem because a waiting writer takes priority over any&lt;br/&gt;
future readers.  Here&apos;s an example of one:&lt;br/&gt;
PID: 35131  TASK: ffff8807c4ecf1c0  CPU: 13  COMMAND: &quot;fsx-linux-aio&quot;&lt;br/&gt;
  #0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c3555b58&amp;#93;&lt;/span&gt; schedule at ffffffff8149cf35&lt;br/&gt;
  #1 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c3555bd8&amp;#93;&lt;/span&gt; rwsem_down_write_failed at ffffffff8149ef45&lt;br/&gt;
  #2 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c3555c50&amp;#93;&lt;/span&gt; call_rwsem_down_write_failed at ffffffff81271f93&lt;br/&gt;
  #3 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c3555ca0&amp;#93;&lt;/span&gt; vvp_io_setattr_start at ffffffffa08f0cea &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
  #4 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c3555ce0&amp;#93;&lt;/span&gt; cl_io_start at ffffffffa0522115 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
  #5 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c3555d08&amp;#93;&lt;/span&gt; cl_io_loop at ffffffffa0525705 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
  #6 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c3555d38&amp;#93;&lt;/span&gt; cl_setattr_ost at ffffffffa08eb250 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
  #7 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c3555d80&amp;#93;&lt;/span&gt; ll_setattr_raw at ffffffffa08be009 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
  #8 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c3555e68&amp;#93;&lt;/span&gt; ll_setattr at ffffffffa08be313 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
  #9 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c3555e78&amp;#93;&lt;/span&gt; notify_change at ffffffff8119d401&lt;br/&gt;
#10 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c3555eb8&amp;#93;&lt;/span&gt; do_truncate at ffffffff8118066d&lt;br/&gt;
#11 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c3555f28&amp;#93;&lt;/span&gt; do_sys_ftruncate.constprop.20 at ffffffff811809bb&lt;br/&gt;
#12 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c3555f70&amp;#93;&lt;/span&gt; sys_ftruncate at ffffffff81180a4e&lt;br/&gt;
#13 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff8807c3555f80&amp;#93;&lt;/span&gt; system_call_fastpath at ffffffff814a7db2&lt;br/&gt;
     RIP: 0000000020152867  RSP: 00007fffffff6678  RFLAGS: 00010246&lt;br/&gt;
     RAX: 000000000000004d  RBX: ffffffff814a7db2  RCX: 0000010000081000&lt;br/&gt;
     RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000005&lt;br/&gt;
     RBP: 00007fffffff6670   R8: 0000000000000000   R9: 0000000000000000&lt;br/&gt;
     R10: 0000000000000000  R11: 0000000000000246  R12: ffffffff81180a4e&lt;br/&gt;
     R13: ffff8807c3555f78  R14: 0000000000000000  R15: 00000000201028b0&lt;br/&gt;
     ORIG_RAX: 000000000000004d  CS: 0033  SS: 002b&lt;/p&gt;

&lt;p&gt;Just to make clear, here&apos;s the sequence of events:&lt;br/&gt;
Thread 1 (pid 35117 above): down_read() &amp;lt;-- SUCCEEDS&lt;br/&gt;
Thread 2 (pid 35131 above): down_write() &amp;lt;-- FAILS, starts waiting&lt;br/&gt;
Thread 1: down_read() &lt;span class=&quot;error&quot;&gt;&amp;#91;again&amp;#93;&lt;/span&gt; &amp;lt;-- FAILS, stuck behind thread 2 (which is&lt;br/&gt;
stuck behind thread 1)&lt;/p&gt;</description>
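The three-step sequence in the description can be sketched with a toy writer-preferring reader-writer semaphore. This is a hypothetical Python model (`WriterPreferringRWSem` is invented for illustration, not kernel rwsem or Lustre code), but it reproduces the property that matters: once a writer is queued, new readers wait behind it.

```python
import threading
import time

# Hypothetical model of a writer-preferring rw-semaphore, mimicking the
# kernel rwsem behavior described above: a queued writer blocks all
# NEW readers, even one that already holds the lock for read.
class WriterPreferringRWSem:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._waiting_writers = 0

    def down_read(self, timeout=None):
        with self._cond:
            # New readers must wait while a writer holds OR waits.
            ok = self._cond.wait_for(
                lambda: not self._writer and self._waiting_writers == 0,
                timeout=timeout)
            if ok:
                self._readers += 1
            return ok

    def up_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def down_write(self):
        with self._cond:
            self._waiting_writers += 1
            self._cond.wait_for(
                lambda: not self._writer and self._readers == 0)
            self._waiting_writers -= 1
            self._writer = True

    def up_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

sem = WriterPreferringRWSem()

# Thread 1 (pid 35117 above): first down_read in ll_page_mkwrite -> SUCCEEDS.
assert sem.down_read()

# Thread 2 (pid 35131 above): the ftruncate path calls down_write and
# blocks behind thread 1's read lock.
writer_done = []
writer = threading.Thread(
    target=lambda: (sem.down_write(), writer_done.append(True)))
writer.start()
while sem._waiting_writers == 0:     # wait until the writer is queued
    time.sleep(0.01)

# Thread 1 again: second down_read in vvp_io_fault_start.  It queues
# behind the waiting writer, which is waiting on thread 1's first read
# lock: deadlock.  A timeout stands in for the real hang.
second_read = sem.down_read(timeout=0.5)
print("second down_read acquired:", second_read)
# prints: second down_read acquired: False

# Dropping the first read lock (which the stuck thread 1 can never do
# in the real deadlock) lets the queued writer proceed.
sem.up_read()
writer.join()
sem.up_write()
```

This also shows why the landed fix ("take trunc_sem only at vvp layer") works: with a single read acquisition per I/O, there is no second down_read to queue behind the waiting writer.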
                <environment></environment>
        <key id="35798">LU-7981</key>
            <summary>double read of lli_trunc_sem in ll_page_mkwrite and vvp_io_fault_start leads to deadlock</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="paf">Patrick Farrell</reporter>
                        <labels>
                    </labels>
                <created>Mon, 4 Apr 2016 18:13:28 +0000</created>
                <updated>Tue, 3 May 2016 17:57:55 +0000</updated>
                            <resolved>Tue, 3 May 2016 17:57:55 +0000</resolved>
                                                    <fixVersion>Lustre 2.9.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="147751" author="gerrit" created="Mon, 4 Apr 2016 18:35:03 +0000"  >&lt;p&gt;Patrick Farrell (paf@cray.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/19315&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/19315&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7981&quot; title=&quot;double read of lli_trunc_sem in ll_page_mkwrite and vvp_io_fault_start leads to deadlock&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7981&quot;&gt;&lt;del&gt;LU-7981&lt;/del&gt;&lt;/a&gt; llite: fix double read of lli_trunc_sem&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 74e14456f64cbe4840da90f4713add01a8a461ed&lt;/p&gt;</comment>
                            <comment id="150780" author="gerrit" created="Mon, 2 May 2016 23:57:56 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/19315/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/19315/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7981&quot; title=&quot;double read of lli_trunc_sem in ll_page_mkwrite and vvp_io_fault_start leads to deadlock&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7981&quot;&gt;&lt;del&gt;LU-7981&lt;/del&gt;&lt;/a&gt; llite: take trunc_sem only at vvp layer&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 8d795c416da2e10f56b12ead8fe1b8b2b15b7dc9&lt;/p&gt;</comment>
                            <comment id="150866" author="jgmitter" created="Tue, 3 May 2016 17:57:55 +0000"  >&lt;p&gt;Landed to master for 2.9.0&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzy6q7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>