<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:03:08 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-39] ASSERTION(atomic_read(&amp;client_stat-&gt;nid_exp_ref_count) == 0) failed: count 1</title>
                <link>https://jira.whamcloud.com/browse/LU-39</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A sysadmin was shutting down an MDS node cleanly in preparation for scheduled upgrades.  During the umount of the MGS device, we hit the following assertion:&lt;/p&gt;

&lt;p&gt;LustreError ... (lprocfs_status.c:1060:lprocfs_free_client_stats())&lt;br/&gt;
ASSERTION(atomic_read(&amp;amp;client_stat-&amp;gt;nid_exp_ref_count) == 0) failed: count 1&lt;/p&gt;

&lt;p&gt;And the stack trace was:&lt;/p&gt;

&lt;p&gt;:obdclass:lprocfs_free_client_stats&lt;br/&gt;
:obdclass:lprocfs_free_per_client_stats&lt;br/&gt;
:mgs:lproc_mgs_cleanup&lt;br/&gt;
:mgs:mgs_cleanup&lt;br/&gt;
:obdclass:class_decref&lt;br/&gt;
:obdclass:class_export_destroy&lt;br/&gt;
:obdclass:obd_zombie_impexp_cull&lt;br/&gt;
:obdclass:obd_zombie_impexp_thread&lt;/p&gt;

&lt;p&gt;We have seen this same assertion from OSTs as well.  Some investigation was done in &lt;a href=&quot;https://bugzilla.lustre.org/show_bug.cgi?id=23499&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;bug 23499&lt;/a&gt;, but there is not yet a solution.&lt;/p&gt;</description>
                <environment>lustre 1.8.3.0-6chaos</environment>
        <key id="10165">LU-39</key>
            <summary>ASSERTION(atomic_read(&amp;client_stat-&gt;nid_exp_ref_count) == 0) failed: count 1</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="morrone">Christopher Morrone</reporter>
                        <labels>
                    </labels>
                <created>Tue, 11 Jan 2011 11:24:42 +0000</created>
                <updated>Fri, 25 May 2012 02:42:37 +0000</updated>
                            <resolved>Wed, 23 Mar 2011 00:42:48 +0000</resolved>
                                    <version>Lustre 1.8.6</version>
                                    <fixVersion>Lustre 2.1.0</fixVersion>
                    <fixVersion>Lustre 1.8.6</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="10418" author="dferber" created="Tue, 11 Jan 2011 15:27:01 +0000"  >&lt;p&gt;Bobijam is taking this bug. &lt;/p&gt;</comment>
                            <comment id="10425" author="bobijam" created="Thu, 13 Jan 2011 08:58:25 +0000"  >&lt;p&gt;There is a race window between server obd cleanup and its handling&lt;br/&gt;
of client connections.&lt;/p&gt;

&lt;p&gt;In class_cleanup(), before the server obd&apos;s stopping flag is set, a new&lt;br/&gt;
client connection request can be handled in target_handle_connect().&lt;br/&gt;
After the target obd (server obd) is found, and before its refcount is&lt;br/&gt;
increased, class_cleanup() can proceed and run through&lt;br/&gt;
class_decref(), which finds that the server obd&apos;s refcount has dropped to 0.&lt;br/&gt;
At this point, target_handle_connect() can continue, increase the target obd&lt;br/&gt;
(server obd) refcount, and create the client export.&lt;/p&gt;

&lt;p&gt;So the race leaves a server obd that is supposed to have 0 references with an export on it,&lt;br/&gt;
and thus the server obd&apos;s obd_nid_stats contains a client nid stat with a&lt;br/&gt;
non-zero refcount.&lt;/p&gt;</comment>
                            <comment id="10426" author="bobijam" created="Thu, 13 Jan 2011 09:00:00 +0000"  >&lt;p&gt;patch is tracked at &lt;a href=&quot;http://review.whamcloud.com/211&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/211&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="10455" author="bobijam" created="Fri, 21 Jan 2011 00:11:17 +0000"  >&lt;p&gt;As Niu mentioned in Gerrit review &quot;The nid_stats_hash will be destroyed before the obd refcount drops to zero (see class_cleanup()), so the target_handle_reconnect() will not able to add any elements into the nid_stats_hash. I&apos;m afraid this crash is caused by other defects. Maybe in the lprocfs_exp_setup(), it handles the (old_stat != new_stat) case incorrectly?&quot;&lt;/p&gt;

&lt;p&gt;Yes, the patch I posted solves a different bug, one that would not get as far as this assertion failure.&lt;/p&gt;

&lt;p&gt;I checked the lprocfs_exp_setup() code, and it handles the (old_stat != new_stat) case correctly, so there should be another race elsewhere that leaves this extra nid_exp_ref_count at the cleanup phase. (In the obd cleanup phase, obd_nid_stats_hash is destroyed in class_cleanup(), and mgs_cleanup() will not be called until the obd&apos;s refcount drops to 0, which means all client exports have been dereferenced, since class_export_destroy() decreases the obd&apos;s refcount.)&lt;/p&gt;</comment>
                            <comment id="10474" author="bobijam" created="Mon, 24 Jan 2011 06:41:34 +0000"  >&lt;p&gt;Christopher,&lt;/p&gt;

&lt;p&gt;Do some of the clients use multiple NIDs to connect to the servers?&lt;/p&gt;

&lt;p&gt;If a client is configured with multiple NIDs, then when the connected NID encounters a problem, the client reconnects to the server with a different NID, and lprocfs_exp_setup() fails to release the old NID&apos;s stats refcount.&lt;/p&gt;</comment>
                            <comment id="10477" author="marc@llnl.gov" created="Mon, 24 Jan 2011 14:06:03 +0000"  >&lt;p&gt;The OSS nodes and the MDS have multiple NIDs, and the clients only have a single NID that they use to talk to the servers.  Could this be caused by the MDS losing connection with the OSS nodes, and reconnecting using a different NID? The MDS would be similar to a client in this case.&lt;/p&gt;</comment>
                            <comment id="10478" author="bobijam" created="Mon, 24 Jan 2011 17:30:58 +0000"  >&lt;p&gt;yes, that could be the case.&lt;/p&gt;</comment>
                            <comment id="10491" author="liang" created="Tue, 25 Jan 2011 17:50:46 +0000"  >&lt;p&gt;bobi, I think we might want this patch for 2.x as well; I just took a quick look and found the issue is also present in 2.x.&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;</comment>
                            <comment id="10492" author="bobijam" created="Tue, 25 Jan 2011 19:11:03 +0000"  >&lt;p&gt;HEAD version patch.&lt;/p&gt;</comment>
                            <comment id="11275" author="hudson" created="Mon, 21 Mar 2011 20:13:48 +0000"  >&lt;p&gt;Integrated in &lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;http://build.whamcloud.com/images/16x16/blue.png&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt; &lt;a href=&quot;http://build.whamcloud.com/job/reviews-centos5/529/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;reviews-centos5 #529&lt;/a&gt;&lt;br/&gt;
     &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-39&quot; title=&quot;ASSERTION(atomic_read(&amp;amp;client_stat-&amp;gt;nid_exp_ref_count) == 0) failed: count 1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-39&quot;&gt;&lt;del&gt;LU-39&lt;/del&gt;&lt;/a&gt; ASSERTION(atomic_read(&amp;amp;client_stat-&amp;gt;nid_exp_ref_count) == 0)&lt;/p&gt;

&lt;p&gt;Bobi Jam : &lt;a href=&quot;http://git.whamcloud.com/gitweb/?p=fs/lustre-release.git&amp;amp;a=commit&amp;amp;h=8efdff9aeb1933f9b1e7320ff48ad84983e4daa3&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;8efdff9aeb1933f9b1e7320ff48ad84983e4daa3&lt;/a&gt;&lt;br/&gt;
Files : &lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;lustre/mgs/mgs_fs.c&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_fs.c&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_internal.h&lt;/li&gt;
	&lt;li&gt;lustre/include/lprocfs_status.h&lt;/li&gt;
	&lt;li&gt;lustre/mgs/mgs_handler.c&lt;/li&gt;
	&lt;li&gt;lustre/obdfilter/filter.c&lt;/li&gt;
	&lt;li&gt;lustre/obdclass/lprocfs_status.c&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_handler.c&lt;/li&gt;
	&lt;li&gt;lustre/mgs/mgs_internal.h&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="11292" author="hudson" created="Tue, 22 Mar 2011 21:46:50 +0000"  >&lt;p&gt;Integrated in &lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;http://build.whamcloud.com/images/16x16/red.png&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt; &lt;a href=&quot;http://build.whamcloud.com/job/lustre-master-centos5/159/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;lustre-master-centos5 #159&lt;/a&gt;&lt;br/&gt;
     &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-39&quot; title=&quot;ASSERTION(atomic_read(&amp;amp;client_stat-&amp;gt;nid_exp_ref_count) == 0) failed: count 1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-39&quot;&gt;&lt;del&gt;LU-39&lt;/del&gt;&lt;/a&gt; ASSERTION(atomic_read(&amp;amp;client_stat-&amp;gt;nid_exp_ref_count) == 0)&lt;/p&gt;

&lt;p&gt;Oleg Drokin : &lt;a href=&quot;http://git.whamcloud.com/gitweb/?p=fs/lustre-release.git&amp;amp;a=commit&amp;amp;h=2a6045403fbd46bb6501df907f0321f5401924ba&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;2a6045403fbd46bb6501df907f0321f5401924ba&lt;/a&gt;&lt;br/&gt;
Files : &lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;lustre/mdt/mdt_fs.c&lt;/li&gt;
	&lt;li&gt;lustre/mgs/mgs_internal.h&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_internal.h&lt;/li&gt;
	&lt;li&gt;lustre/include/lprocfs_status.h&lt;/li&gt;
	&lt;li&gt;lustre/mgs/mgs_fs.c&lt;/li&gt;
	&lt;li&gt;lustre/mgs/mgs_handler.c&lt;/li&gt;
	&lt;li&gt;lustre/obdfilter/filter.c&lt;/li&gt;
	&lt;li&gt;lustre/obdclass/lprocfs_status.c&lt;/li&gt;
	&lt;li&gt;lustre/mdt/mdt_handler.c&lt;/li&gt;
&lt;/ul&gt;
</comment>
                    </comments>
                    <attachments>
                            <attachment id="10100" name="lu-39v2-master.patch" size="10736" author="bobijam" created="Tue, 25 Jan 2011 19:11:03 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                    <customfield id="customfield_10020" key="com.atlassian.jira.plugin.system.customfieldtypes:float">
                        <customfieldname>Bugzilla ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>23499.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzv9mf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5093</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>