<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:45:10 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4710] Deadlock on lli_trunc_sem in ll_setattr_raw()</title>
                <link>https://jira.whamcloud.com/browse/LU-4710</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Several application processes hang while trying to acquire a write lock on ll_inode_info.lli_trunc_sem in ll_setattr_raw(). Each process appears to be deadlocked on itself: ll_file_io_generic(), earlier in the same call stack, already holds a read lock on the semaphore, so the write lock requested in ll_setattr_raw() can never be granted.&lt;/p&gt;

&lt;p&gt;This bug was introduced by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3321&quot; title=&quot;2.x single thread/process throughput degraded from 1.8&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3321&quot;&gt;&lt;del&gt;LU-3321&lt;/del&gt;&lt;/a&gt;, review.whamcloud.com/7893.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;&amp;gt; crash&amp;gt; bt
&amp;gt; PID: 10475  TASK: ffff880837ae67f0  CPU: 0   COMMAND: &quot;nsystst&quot;
&amp;gt;  #0 [ffff88083cf05698] schedule at ffffffff8144947f
&amp;gt;  #1 [ffff88083cf057f0] rwsem_down_failed_common at ffffffff8144b6d5
&amp;gt;  #2 [ffff88083cf05860] rwsem_down_write_failed at ffffffff8144b783
&amp;gt;  #3 [ffff88083cf05870] call_rwsem_down_write_failed at ffffffff81219c43
&amp;gt;  #4 [ffff88083cf058d0] ll_setattr_raw at ffffffffa07ed590 [lustre]
&amp;gt;  #5 [ffff88083cf059b0] ll_setattr at ffffffffa07ee557 [lustre]
&amp;gt;  #6 [ffff88083cf059c0] notify_change at ffffffff8116e1f0
&amp;gt;  #7 [ffff88083cf05a30] file_remove_suid at ffffffff810fa3e1
&amp;gt;  #8 [ffff88083cf05ab0] __generic_file_aio_write at ffffffff810fcd29
&amp;gt;  #9 [ffff88083cf05b60] generic_file_aio_write at ffffffff810fcfc9
&amp;gt; #10 [ffff88083cf05ba0] vvp_io_write_start at ffffffffa0825cb0 [lustre]
&amp;gt; #11 [ffff88083cf05c00] cl_io_start at ffffffffa0365682 [obdclass]
&amp;gt; #12 [ffff88083cf05c30] cl_io_loop at ffffffffa0369204 [obdclass]
&amp;gt; #13 [ffff88083cf05c60] ll_file_io_generic at ffffffffa07c3062 [lustre]
&amp;gt; #14 [ffff88083cf05ce0] ll_file_aio_write at ffffffffa07c355e [lustre]
&amp;gt; #15 [ffff88083cf05d30] do_sync_readv_writev at ffffffff811539cb
&amp;gt; #16 [ffff88083cf05e40] do_readv_writev at ffffffff811548d4
&amp;gt; #17 [ffff88083cf05f30] vfs_writev at ffffffff81154a28
&amp;gt; #18 [ffff88083cf05f40] sys_writev at ffffffff81154b65
&amp;gt; #19 [ffff88083cf05f80] system_call_fastpath at ffffffff8145376b

&amp;gt; crash&amp;gt; files | egrep &quot;PID|husk1&quot;
&amp;gt; PID: 10475  TASK: ffff880837ae67f0  CPU: 0   COMMAND: &quot;nsystst&quot;
&amp;gt;   3 ffff880835e43bc0 ffff8808000206c0 ffff880837e05178 REG  /dsl/lus/husk1/ostest.vers/CL_nsystst03.2672/nsys_base.2

lli_trunc_sem info:

&amp;gt; crash&amp;gt; eval 0xffff880837e05178 - 248 | grep hex
&amp;gt; hexadecimal: ffff880837e05080  
&amp;gt; crash&amp;gt; ll_inode_info ffff880837e05080 | grep -A 15 trunc_sem
&amp;gt;       f_trunc_sem = {
&amp;gt;         count = -4294967295, = 0xffffffff00000001
&amp;gt;         wait_lock = {
&amp;gt;           {
&amp;gt;             rlock = {
&amp;gt;               raw_lock = {
&amp;gt;                 slock = 2313
&amp;gt;               }
&amp;gt;             }
&amp;gt;           }
&amp;gt;         }, 
&amp;gt;         wait_list = {
&amp;gt;           next = 0xffff88083cf057f8, 
&amp;gt;           prev = 0xffff88083cf057f8
&amp;gt;         }
&amp;gt;       }, 
&amp;gt; crash&amp;gt; semaphore_waiter 0xffff88083cf057f8
&amp;gt; struct semaphore_waiter {
&amp;gt;   list = {
&amp;gt;     next = 0xffff880837e05440, 
&amp;gt;     prev = 0xffff880837e05440
&amp;gt;   }, 
&amp;gt;   task = 0xffff880837ae67f0, 
&amp;gt;   up = 2
&amp;gt; }
&amp;gt; crash&amp;gt; ps | grep ffff880837ae67f0
&amp;gt;   10475      1    0 ffff880837ae67f0  UN   0.0  131484   5112  nsystst
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3321&quot; title=&quot;2.x single thread/process throughput degraded from 1.8&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3321&quot;&gt;&lt;del&gt;LU-3321&lt;/del&gt;&lt;/a&gt;/7893 changed ll_file_io_generic() to always take a read lock on lli_trunc_sem in the IO_NORMAL case. Previously the semaphore was acquired only in the read path, where ll_setattr() is never called, so the write path could not conflict with ll_setattr_raw().&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;From lustre/llite/file.c:ll_file_io_generic:
&amp;gt;                 case IO_NORMAL:
&amp;gt;                         cio-&amp;gt;cui_iov = args-&amp;gt;u.normal.via_iov;
&amp;gt;                         cio-&amp;gt;cui_nrsegs = args-&amp;gt;u.normal.via_nrsegs;
&amp;gt;                         cio-&amp;gt;cui_tot_nrsegs = cio-&amp;gt;cui_nrsegs;
&amp;gt;                         cio-&amp;gt;cui_iocb = args-&amp;gt;u.normal.via_iocb;
&amp;gt;                          if ((iot == CIT_WRITE) &amp;amp;&amp;amp;
&amp;gt;                              !(cio-&amp;gt;cui_fd-&amp;gt;fd_flags &amp;amp; LL_FILE_GROUP_LOCKED)) {
&amp;gt;                                 if (mutex_lock_interruptible(&amp;amp;lli-&amp;gt;
&amp;gt; -                                                               lli_write_mutex))
&amp;gt; -                                        GOTO(out, result = -ERESTARTSYS);
&amp;gt; -                                write_mutex_locked = 1;
&amp;gt; -                        } else if (iot == CIT_READ) {
&amp;gt; -                               down_read(&amp;amp;lli-&amp;gt;lli_trunc_sem);
&amp;gt; -                        }
&amp;gt; +                                                       lli_write_mutex))
&amp;gt; +                                       GOTO(out, result = -ERESTARTSYS);
&amp;gt; +                               write_mutex_locked = 1;
&amp;gt; +                       }
&amp;gt; +                       down_read(&amp;amp;lli-&amp;gt;lli_trunc_sem);
&amp;gt;                          break;
&amp;gt;                  case IO_SENDFILE:
&amp;gt;                          vio-&amp;gt;u.sendfile.cui_actor = args-&amp;gt;u.sendfile.via_actor;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
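
&lt;p&gt;The self-deadlock can be modelled outside the kernel. The sketch below is not Lustre code. TruncSem is a hypothetical, non-reentrant read-write lock written for this illustration, and file_io_generic() and setattr_raw() are simplified stand-ins for the functions in the backtrace. The timeout on down_write() is an assumption added so the demonstration terminates; the kernel rwsem has no timeout, which is why the real task stays in the UN state.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import threading


class TruncSem:
    """Toy non-reentrant read-write lock standing in for lli_trunc_sem."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def down_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def up_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def down_write(self, timeout):
        """Return True if the write lock was granted before the timeout."""
        with self._cond:
            granted = self._cond.wait_for(
                lambda: self._readers == 0 and not self._writer, timeout)
            if granted:
                self._writer = True
            return granted

    def up_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def setattr_raw(sem):
    # Stand-in for ll_setattr_raw(): it needs the semaphore exclusively.
    # The timeout is illustrative only; the kernel would block forever.
    granted = sem.down_write(timeout=0.2)
    if granted:
        sem.up_write()
    return granted


def file_io_generic(sem, take_read_lock):
    # Stand-in for the write path of ll_file_io_generic().
    # take_read_lock=False models the old write path,
    # take_read_lock=True models the write path after the change.
    if take_read_lock:
        sem.down_read()
    try:
        # The write path reaches setattr through file_remove_suid().
        return setattr_raw(sem)
    finally:
        if take_read_lock:
            sem.up_read()


if __name__ == "__main__":
    print("old write path, write lock granted:",
          file_io_generic(TruncSem(), False))
    print("new write path, write lock granted:",
          file_io_generic(TruncSem(), True))
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this model the write lock is granted when the caller holds no read lock and is refused when the same thread already holds one, which matches the state seen in the dump: a single waiter on the semaphore whose task is the lock holder itself.&lt;/p&gt;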
</description>
                <environment>The bug occurred during IOStress testing with code from master on SLES11 SP3. I assume it affects 2.6 because &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3321&quot; title=&quot;2.x single thread/process throughput degraded from 1.8&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3321&quot;&gt;&lt;strike&gt;LU-3321&lt;/strike&gt;&lt;/a&gt; landed in that version.</environment>
        <key id="23458">LU-4710</key>
            <summary>Deadlock on lli_trunc_sem in ll_setattr_raw()</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="amk">Ann Koehler</reporter>
                        <labels>
                    </labels>
                <created>Tue, 4 Mar 2014 19:53:00 +0000</created>
                <updated>Wed, 5 Mar 2014 01:36:01 +0000</updated>
                            <resolved>Wed, 5 Mar 2014 01:36:01 +0000</resolved>
                                    <version>Lustre 2.6.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="78383" author="amk" created="Tue, 4 Mar 2014 20:09:29 +0000"  >&lt;p&gt;Dump uploaded to ftp.whamcloud.com:/uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4710&quot; title=&quot;Deadlock on lli_trunc_sem in ll_setattr_raw()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4710&quot;&gt;&lt;del&gt;LU-4710&lt;/del&gt;&lt;/a&gt;/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4710&quot; title=&quot;Deadlock on lli_trunc_sem in ll_setattr_raw()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4710&quot;&gt;&lt;del&gt;LU-4710&lt;/del&gt;&lt;/a&gt;_lli_trunc_sem_hang.tgz&lt;br/&gt;
I used the dump from node c0-0cs8n0 for my analysis.&lt;/p&gt;</comment>
                            <comment id="78430" author="bobijam" created="Wed, 5 Mar 2014 01:36:01 +0000"  >&lt;p&gt;dup of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4627&quot; title=&quot;Client deadlock on ll_setattr_raw&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4627&quot;&gt;&lt;del&gt;LU-4627&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="23142">LU-4627</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwgqf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12947</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>