<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:16:49 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8355] VFS: Busy inodes after unmount of md0 ... causes kernel panic or at least memory leak</title>
                <link>https://jira.whamcloud.com/browse/LU-8355</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;If we try to start an MDT on a read-only device, or on a device with the journal disabled, the following message is printed:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;VFS: Busy inodes after unmount of md0. Self-destruct in 5 seconds.  Have a nice day... &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In this case the reference count (i_count) of one inode gets an extra increment that is never dropped. This can later cause a kernel panic, or at least a memory leak.&lt;br/&gt;
According to my investigation, this inode is s_buddy_cache (created in ldiskfs_mb_init_backend).&lt;br/&gt;
I used a kprobe (a handler_pre attached to __iget) and found that the extra increment comes from fsnotify_unmount_inodes:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;kprobe handler_pre __iget: inode ffff880079a09528 i_ino 21328 i_count 2 i_state 8
Pid: 12211, comm: mount.lustre Tainted: G        W  ---------------    2.6.32-431.17.1.x1.6.39.x86_64 #1
Call Trace:
 &amp;lt;#DB&amp;gt;  [&amp;lt;ffffffffa0552071&amp;gt;] ? handler_pre+0x41/0x44 [kprobe_example]
 [&amp;lt;ffffffff8152b455&amp;gt;] ? kprobe_exceptions_notify+0x3d5/0x430
 [&amp;lt;ffffffff8152b6c5&amp;gt;] ? notifier_call_chain+0x55/0x80
 [&amp;lt;ffffffff8152b72a&amp;gt;] ? atomic_notifier_call_chain+0x1a/0x20
 [&amp;lt;ffffffff810a12be&amp;gt;] ? notify_die+0x2e/0x30
 [&amp;lt;ffffffff81528ff5&amp;gt;] ? do_int3+0x35/0xb0
 [&amp;lt;ffffffff815288c3&amp;gt;] ? int3+0x33/0x40
 [&amp;lt;ffffffff811a4f01&amp;gt;] ? __iget+0x1/0x70
 &amp;lt;&amp;lt;EOE&amp;gt;&amp;gt;  [&amp;lt;ffffffff811ccc7f&amp;gt;] ? fsnotify_unmount_inodes+0x10f/0x120
 [&amp;lt;ffffffff811a620b&amp;gt;] ? invalidate_inodes+0x5b/0x190
 [&amp;lt;ffffffff811c5a14&amp;gt;] ? __sync_blockdev+0x24/0x50
 [&amp;lt;ffffffff8118b34c&amp;gt;] ? generic_shutdown_super+0x4c/0xe0
 [&amp;lt;ffffffff8118b411&amp;gt;] ? kill_block_super+0x31/0x50
 [&amp;lt;ffffffffa058f686&amp;gt;] ? ldiskfs_kill_block_super+0x16/0x60 [ldiskfs]
 [&amp;lt;ffffffff8118bbe7&amp;gt;] ? deactivate_super+0x57/0x80
 [&amp;lt;ffffffff811aabef&amp;gt;] ? mntput_no_expire+0xbf/0x110
 [&amp;lt;ffffffffa09f6515&amp;gt;] ? osd_mount+0x6f5/0xcb0 [osd_ldiskfs]
 [&amp;lt;ffffffffa09f93ff&amp;gt;] ? osd_device_alloc+0x4cf/0x970 [osd_ldiskfs]
 [&amp;lt;ffffffffa11fb33f&amp;gt;] ? obd_setup+0x1bf/0x290 [obdclass]
 [&amp;lt;ffffffffa11fb618&amp;gt;] ? class_setup+0x208/0x870 [obdclass]
 [&amp;lt;ffffffffa1203edc&amp;gt;] ? class_process_config+0xc6c/0x1ad0 [obdclass]
 [&amp;lt;ffffffffa10d54d8&amp;gt;] ? libcfs_log_return+0x28/0x40 [libcfs]
 [&amp;lt;ffffffffa1208dc2&amp;gt;] ? lustre_cfg_new+0x312/0x690 [obdclass]
 [&amp;lt;ffffffffa1209298&amp;gt;] ? do_lcfg+0x158/0x440 [obdclass]
 [&amp;lt;ffffffffa1209614&amp;gt;] ? lustre_start_simple+0x94/0x200 [obdclass]
 [&amp;lt;ffffffffa124503d&amp;gt;] ? server_fill_super+0x97d/0x1a7c [obdclass]
 [&amp;lt;ffffffffa10d54d8&amp;gt;] ? libcfs_log_return+0x28/0x40 [libcfs]
 [&amp;lt;ffffffffa120f0b0&amp;gt;] ? lustre_fill_super+0x1d0/0x5b0 [obdclass]
 [&amp;lt;ffffffffa120eee0&amp;gt;] ? lustre_fill_super+0x0/0x5b0 [obdclass]
 [&amp;lt;ffffffff8118c2af&amp;gt;] ? get_sb_nodev+0x5f/0xa0
 [&amp;lt;ffffffffa1206dd5&amp;gt;] ? lustre_get_sb+0x25/0x30 [obdclass]
 [&amp;lt;ffffffff8118b90b&amp;gt;] ? vfs_kern_mount+0x7b/0x1b0
 [&amp;lt;ffffffff8118bab2&amp;gt;] ? do_kern_mount+0x52/0x130
 [&amp;lt;ffffffff811aca8b&amp;gt;] ? do_mount+0x2fb/0x930
 [&amp;lt;ffffffff811ad150&amp;gt;] ? sys_mount+0x90/0xe0
 [&amp;lt;ffffffff8100b072&amp;gt;] ? system_call_fastpath+0x16/0x1b&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Reading /proc/slabinfo after unloading ldiskfs may cause a kernel panic because ldiskfs_inode_cache was not cleaned up correctly (it still has busy inodes).&lt;br/&gt;
Both fsnotify_unmount_inodes and inotify_unmount_inodes mishandle inodes in the I_NEW state.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;               /* In case the dropping of a reference would nuke next_i. */                                           
                if ((&amp;amp;next_i-&amp;gt;i_sb_list != list) &amp;amp;&amp;amp;
                    atomic_read(&amp;amp;next_i-&amp;gt;i_count) &amp;amp;&amp;amp;
                    !(next_i-&amp;gt;i_state &amp;amp; (I_CLEAR | I_FREEING | I_WILL_FREE))) {                                
                        __iget(next_i);
                        need_iput = next_i;                                                                            
                }&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We call __iget on next_i, but on the next loop iteration the matching iput is skipped because the inode is in the I_NEW state:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;               /*
                 * We cannot __iget() an inode in state I_CLEAR, I_FREEING,
                 * I_WILL_FREE, or I_NEW which is fine because by that point                                           
                 * the inode cannot have any associated watches.                                                       
                 */
                if (inode-&amp;gt;i_state &amp;amp; (I_CLEAR|I_FREEING|I_WILL_FREE|I_NEW))                                            
                        continue;       &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
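<![CDATA[<p>To make the leak easier to see, here is a minimal userspace sketch of the reference handling in the loop quoted above. It is a toy model under stated assumptions, not kernel code: struct toy_inode, toy_unmount_walk, the extra_skip parameter and the numeric flag values are made up for this illustration, i_count is a plain int instead of atomic_t, and the s_inodes list is modelled as an array.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel inode state bits; the values are illustrative. */
enum { I_NEW = 1, I_WILL_FREE = 2, I_FREEING = 4, I_CLEAR = 8 };

struct toy_inode {
    int i_count;        /* plain int here; atomic_t in the kernel */
    unsigned i_state;
};

/*
 * Userspace model of the pin/unpin logic in the quoted loop.
 * extra_skip is OR-ed into the mask that guards the next_i pin:
 * 0 models the original check, I_NEW models the proposed change.
 * Returns the number of references still held when the walk ends.
 */
int toy_unmount_walk(struct toy_inode *v, size_t n, unsigned extra_skip)
{
    const unsigned dying = I_CLEAR | I_FREEING | I_WILL_FREE;
    struct toy_inode *need_iput = NULL;
    int held = 0;

    assert(v != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct toy_inode *inode = &v[i];
        struct toy_inode *next_i = (i + 1 < n) ? &v[i + 1] : NULL;
        struct toy_inode *need_iput_tmp;

        /* Skipped inodes never reach the unpin steps below. */
        if (inode->i_state & (dying | I_NEW))
            continue;
        if (inode->i_count == 0)
            continue;

        need_iput_tmp = need_iput;
        need_iput = NULL;

        /* Pin the current inode unless the previous pass already did. */
        if (inode != need_iput_tmp) {
            inode->i_count++;
            held++;
        } else {
            need_iput_tmp = NULL;
        }

        /* Pin next_i so that dropping references cannot free it. */
        if (next_i && next_i->i_count &&
            !(next_i->i_state & (dying | extra_skip))) {
            next_i->i_count++;
            held++;
            need_iput = next_i;
        }

        /* Drop the pin carried over from an earlier iteration. */
        if (need_iput_tmp) {
            need_iput_tmp->i_count--;
            held--;
        }

        /* Drop the pin on the current inode. */
        inode->i_count--;
        held--;
    }
    return held;
}
```

<p>In this model, a two-entry array whose last entry has I_NEW set gives a return value of 1 with extra_skip = 0 (the last entry keeps i_count 2) and 0 with extra_skip = I_NEW. If an ordinary inode follows the I_NEW one, the carried reference is dropped on the following iteration and the result is 0 either way, which matches the observation that only an I_NEW inode at the end of the list leaks.</p>]]>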
&lt;p&gt;The problem was hit on kernel 2.6.32-431.17.1, but as far as I can see other kernels have this bug as well.&lt;/p&gt;</description>
                <environment></environment>
        <key id="37918">LU-8355</key>
            <summary>VFS: Busy inodes after unmount of md0 ... causes kernel panic or at least memory leak</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="scherementsev">Sergey Cheremencev</reporter>
                        <labels>
                    </labels>
                <created>Thu, 30 Jun 2016 12:12:21 +0000</created>
                <updated>Fri, 1 Jul 2016 17:38:03 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="157378" author="sergey" created="Thu, 30 Jun 2016 12:14:01 +0000"  >&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;delete extra iget in fsnotify_unmount_inodes

Don&apos;t take an extra reference on next_i in
fsnotify_unmount_inodes if the inode is in the I_NEW state.
The next loop iteration skips such an inode because of I_NEW,
so iput(need_iput_tmp) is not called on that iteration.
If the I_NEW inode is the last one in the s_inodes list, the
extra increment is never decremented.
The problem occurred when mounting a Lustre FS (server side)
on a RO device (in such a case the mount should fail).
The mount fails after creating 4 inodes. One of them is
s_buddy_cache, which is in the I_NEW state. This inode is not
freed after generic_shutdown_super, which causes the message:
VFS: Busy inodes after unmount of md0. Self-destruct ...
In this case ldiskfs_inode_cache cannot be cleared, and
&quot;cat /proc/slabinfo&quot; after unloading the ldiskfs module may
cause a kernel panic.

--- fs/notify/inotify/inotify.c	2013-07-30 04:16:44.000000000 +0400
+++ inotify.c	2014-12-16 17:36:23.000000000 +0400
@@ -404,7 +404,7 @@
 		if ((&amp;amp;next_i-&amp;gt;i_sb_list != list) &amp;amp;&amp;amp;
 				atomic_read(&amp;amp;next_i-&amp;gt;i_count) &amp;amp;&amp;amp;
 				!(next_i-&amp;gt;i_state &amp;amp; (I_CLEAR | I_FREEING |
-					I_WILL_FREE))) {
+					I_WILL_FREE | I_NEW))) {
 			__iget(next_i);
 			need_iput = next_i;
 		}
--- fs/notify/inode_mark.c	2013-07-30 04:16:42.000000000 +0400
+++ inode_mark.c	2014-12-16 17:36:40.000000000 +0400
@@ -398,7 +398,7 @@
 		/* In case the dropping of a reference would nuke next_i. */
 		if ((&amp;amp;next_i-&amp;gt;i_sb_list != list) &amp;amp;&amp;amp;
 		    atomic_read(&amp;amp;next_i-&amp;gt;i_count) &amp;amp;&amp;amp;
-		    !(next_i-&amp;gt;i_state &amp;amp; (I_CLEAR | I_FREEING | I_WILL_FREE))) {
+		    !(next_i-&amp;gt;i_state &amp;amp; (I_CLEAR | I_FREEING | I_WILL_FREE | I_NEW))) {
 			__iget(next_i);
 			need_iput = next_i;
 		}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="157560" author="cheneva1" created="Fri, 1 Jul 2016 17:38:03 +0000"  >&lt;p&gt;Per the triage call, this seems to be a corner case; decreasing the priority from &quot;Major&quot; to &quot;Minor&quot;.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzygan:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>