<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:16:25 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1414] ll_delete_inode() LBUG</title>
                <link>https://jira.whamcloud.com/browse/LU-1414</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We hit the following LBUG on the client at our customer site.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;May 7 01:42:31 alps1-36 kernel: [5827658.348638] LustreError: 23918:0:(llite_lib.c:1382:ll_delete_inode())
ASSERTION(inode-&amp;gt;i_data.nrpages == 0) failed: inode=139398948/2615229393(ffff8818d28e6d50) nrpages=1199638,
see http://jira.whamcloud.com/browse/LU-118
May 7 01:42:31 alps1-36 kernel: [5827658.348688] LustreError: 23918:0:(llite_lib.c:1382:ll_delete_inode()) LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As far as I can see from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-118&quot; title=&quot;clear_inode: BUG_ON(inode-&amp;gt;i_data.nrpages)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-118&quot;&gt;&lt;del&gt;LU-118&lt;/del&gt;&lt;/a&gt;, this was fixed and the patches landed in lustre-1.8.6, but we hit it again.&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-267&quot; title=&quot;Kernel race could cause LU-118, clear_inode: BUG_ON(inode-&amp;gt;i_data.nrpages)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-267&quot;&gt;&lt;del&gt;LU-267&lt;/del&gt;&lt;/a&gt; seems to be a related ticket. Is the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-118&quot; title=&quot;clear_inode: BUG_ON(inode-&amp;gt;i_data.nrpages)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-118&quot;&gt;&lt;del&gt;LU-118&lt;/del&gt;&lt;/a&gt; patch only a workaround? Do we need to fix the kernel to completely solve this issue?&lt;/p&gt;</description>
                <environment>SLES11SP1 - Client&lt;br/&gt;
CentOS5.6 - Servers</environment>
        <key id="14456">LU-1414</key>
            <summary>ll_delete_inode() LBUG</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="ihara">Shuichi Ihara</reporter>
                        <labels>
                    </labels>
                <created>Wed, 16 May 2012 09:41:55 +0000</created>
                <updated>Mon, 29 May 2017 03:44:37 +0000</updated>
                            <resolved>Mon, 29 May 2017 03:44:37 +0000</resolved>
                                    <version>Lustre 1.8.6</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="38910" author="pjones" created="Wed, 16 May 2012 09:52:00 +0000"  >&lt;p&gt;Jinshan&lt;/p&gt;

&lt;p&gt;Does this seem to be related to the issues previously handled under LU-118?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="38930" author="jay" created="Wed, 16 May 2012 13:06:29 +0000"  >&lt;p&gt;This is definitely not &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-118&quot; title=&quot;clear_inode: BUG_ON(inode-&amp;gt;i_data.nrpages)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-118&quot;&gt;&lt;del&gt;LU-118&lt;/del&gt;&lt;/a&gt;, where a race occurred so only 1 page was left after truncate. Here there were 1199638 pages.&lt;/p&gt;

&lt;p&gt;Can I have a backtrace and the recent error messages from the Lustre log?&lt;/p&gt;</comment>
                            <comment id="39006" author="ihara" created="Thu, 17 May 2012 13:29:18 +0000"  >&lt;p&gt;The client&apos;s /var/log/messages is attached.&lt;/p&gt;

&lt;p&gt;Will get /tmp/lustre-log.1336326151.23918.&lt;/p&gt;</comment>
                            <comment id="47831" author="aboyko" created="Thu, 15 Nov 2012 02:57:56 +0000"  >&lt;p&gt;We saw the same issue. My point of view and analysis are below.&lt;br/&gt;
1) LASSERT at ll_delete_inode(): after truncate we have a non-zero -&amp;gt;nrpages for the address_space.&lt;br/&gt;
2) The file for this inode is already closed.&lt;br/&gt;
3) The issue happened with file-per-process access, so there is no race between client threads (from psfio.c).&lt;br/&gt;
4) nrpages decreased between the LASSERT and the crash (for example, crash &#8470;3 has 381 and then 0), so someone is truncating pages.&lt;br/&gt;
5) The page that still exists in the address_space after truncate and the LASSERT looks valid to Lustre.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;kmem -g 200000000001058 
FLAGS: 200000000001058
  PAGE-FLAG       BIT  VALUE
  PG_referenced     3  0000008
  PG_uptodate       4  0000010
  PG_lru            6  0000040
  PG_private       12  0001000
  PG_slob_free     12  0001000

struct ll_async_page {
  llap_magic = 98764321, 
  llap_write_queued = 0, 
  llap_defer_uptodate = 0, 
  llap_origin = 3, 
  llap_ra_used = 0, 
  llap_ignore_quota = 0, 
  llap_reserved = 0, 
  llap_pglist_cpu = 0, 
  llap_cookie = 0xffff880106be60e8, 
  llap_page = 0xffffea0004f58908, 
  llap_pending_write = {
    next = 0x0, 
    prev = 0x0
  }, 
  llap_pglist_item = {
    next = 0xffff880106be7e28, 
    prev = 0xffff880106be6218
  }, 
  llap_checksum = 0, 
  llap_lockh_granted = {
    cookie = 0
  }
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;No dirty, no lock, no writeback.&lt;br/&gt;
llap_origin = 3 (LLAP_ORIGIN_COMMIT_WRITE) - a simple write.&lt;/p&gt;

&lt;p&gt;Root cause:&lt;/p&gt;

&lt;p&gt;The first thread, which hit the LASSERT, is ldlm_bl_61 - trace:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 30477:0:(llite_lib.c:1384:ll_delete_inode()) ASSERTION(inode-&amp;gt;i_data.nrpages == 0) failed: inode=326383344/3842751679(ffff88011f44c210) nrpages=2583694, see http://jira.whamcloud.com/browse/LU-118
LustreError: 30477:0:(llite_lib.c:1384:ll_delete_inode()) LBUG
Pid: 30477, comm: ldlm_bl_61
Call Trace:
 [&amp;lt;ffffffff810072e9&amp;gt;] try_stack_unwind+0x149/0x190
 [&amp;lt;ffffffff81005ca0&amp;gt;] dump_trace+0x90/0x300
 [&amp;lt;ffffffffa01349b2&amp;gt;] libcfs_debug_dumpstack+0x52/0x80 [libcfs]
 [&amp;lt;ffffffffa0134f21&amp;gt;] lbug_with_loc+0x71/0xe0 [libcfs]
 [&amp;lt;ffffffffa04c32ef&amp;gt;] ll_delete_inode+0x14f/0x150 [lustre]
 [&amp;lt;ffffffff8111ac23&amp;gt;] generic_delete_inode+0xb3/0x180
 [&amp;lt;ffffffff8111ad4d&amp;gt;] generic_drop_inode+0x5d/0x80
 [&amp;lt;ffffffff811197cd&amp;gt;] iput+0x5d/0x70
 [&amp;lt;ffffffffa04e7d46&amp;gt;] ll_mdc_blocking_ast+0x1b6/0x5a0 [lustre]
 [&amp;lt;ffffffffa026758d&amp;gt;] ldlm_cancel_callback+0x5d/0xe0 [ptlrpc]
 [&amp;lt;ffffffffa027ef0c&amp;gt;] ldlm_cli_cancel_local+0x6c/0x340 [ptlrpc]
 [&amp;lt;ffffffffa0283b3e&amp;gt;] ldlm_cli_cancel+0x5e/0x3f0 [ptlrpc]
 [&amp;lt;ffffffffa04e7d87&amp;gt;] ll_mdc_blocking_ast+0x1f7/0x5a0 [lustre]
 [&amp;lt;ffffffffa028763c&amp;gt;] ldlm_handle_bl_callback+0xbc/0x270 [ptlrpc]
 [&amp;lt;ffffffffa0287981&amp;gt;] ldlm_bl_thread_main+0x191/0x3e0 [ptlrpc]
 [&amp;lt;ffffffff810035ba&amp;gt;] child_rip+0xa/0x20
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The second is ldlm_bl_67 - trace:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;PID: 31380  TASK: ffff88069c4c90c0  CPU: 28  COMMAND: &quot;ldlm_bl_67&quot;
 #0 [ffff8807f5055ad0] schedule at ffffffff812bb7f5
 #1 [ffff8807f5055ba8] __cond_resched at ffffffff8103ea65
 #2 [ffff8807f5055bc8] __cond_resched_lock at ffffffff8103eab7
 #3 [ffff8807f5055be8] cache_remove_lock at ffffffffa03c9eae [osc]
 #4 [ffff8807f5055c48] osc_extent_blocking_cb at ffffffffa03b3606 [osc]
 #5 [ffff8807f5055cb8] ldlm_cancel_callback at ffffffffa026758d [ptlrpc]
 #6 [ffff8807f5055cd8] ldlm_cli_cancel_local at ffffffffa027ef0c [ptlrpc]
 #7 [ffff8807f5055d38] ldlm_cli_cancel at ffffffffa0283b3e [ptlrpc]
 #8 [ffff8807f5055db8] osc_extent_blocking_cb at ffffffffa03b34cf [osc]
 #9 [ffff8807f5055e28] ldlm_handle_bl_callback at ffffffffa028763c [ptlrpc]
#10 [ffff8807f5055e78] ldlm_bl_thread_main at ffffffffa0287981 [ptlrpc]
#11 [ffff8807f5055f48] kernel_thread at ffffffff810035ba
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From the stack of ldlm_bl_67 I got the ldlm_lock pointer:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;bt -f
....
#1 [ffff8807f5055ba8] __cond_resched at ffffffff8103ea65
    ffff8807f5055bb0: ffffffffffffff10 ffff88010456b568    &amp;lt;-- this is spinlock_t l_extents_list_lock;
    ffff8807f5055bc0: ffff8807f5055be0 ffffffff8103eab7 
 #2 [ffff8807f5055bc8] __cond_resched_lock at ffffffff8103eab7
    ffff8807f5055bd0: ffff88061718e640 ffff88061718e640 
    ffff8807f5055be0: ffff8807f5055c40 ffffffffa03c9eae 
 #3 [ffff8807f5055be8] cache_remove_lock at ffffffffa03c9eae [osc]
    ffff8807f5055bf0: ffff880815d647c0 ffff88010456b400 
    ffff8807f5055c00: ffff880815d647d8 ffff88010456b570 
    ffff8807f5055c10: ffff88010456b568 ffff88011f44c210
.... 
crash&amp;gt; p/x (0xffff88010456b568 -360)
$18 = 0xffff88010456b400
crash&amp;gt; ldlm_lock 0xffff88010456b400
struct ldlm_lock {
  l_handle = {
    h_link = {
      next = 0xffffc90010719050, 
      prev = 0xffffc90010719050
    }, 
    h_cookie = 17652665684890613763, 
    h_addref = 0xffffffffa0262210 &amp;lt;lock_handle_addref&amp;gt;, 
    h_lock = {
      raw_lock = {
        slock = 1542
      }
    },  
...
  l_lvb_len = 40, 
  l_lvb_data = 0xffff88012d525e40, 
  l_lvb_swabber = 0xffffffffa02b1c20, 
  l_ast_data = 0xffff88011f44c210,    &amp;lt;--- this is inode
  l_extents_list_lock = {
    raw_lock = {
      slock = 6425
    }
  },  
  .....
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So we can see that the inode from the ldlm_lock is the same as the LASSERT inode ffff88011f44c210.&lt;br/&gt;
Now, how the race happened:&lt;br/&gt;
1) ll_mdc_blocking_ast()&amp;gt;iput()&amp;gt;...&amp;gt;ll_delete_inode()&amp;gt;truncate_inode_pages()&lt;br/&gt;
2) osc_extent_blocking_cb()&amp;gt;...&amp;gt;ll_page_removal_cb()&amp;gt;ll_truncate_complete_page()&amp;gt;truncate_complete_page()&lt;/p&gt;</comment>
                            <comment id="197371" author="adilger" created="Mon, 29 May 2017 03:44:37 +0000"  >&lt;p&gt;Close old ticket.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="11393" name="alps1-36.log-1" size="18252" author="ihara" created="Thu, 17 May 2012 13:29:18 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 15 Nov 2012 09:41:55 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw14f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10263</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 16 May 2012 09:41:55 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>