<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:26:30 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2591] race between mount/umount and lov_notify</title>
                <link>https://jira.whamcloud.com/browse/LU-2591</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I&apos;ve found a race between mount/umount and lov_notify().&lt;br/&gt;
It is caused by the following sequence of events:&lt;/p&gt;

&lt;p&gt;1) mount/umount: mount runs and fails to communicate with some OSTs. The import objects for those OSTs are then registered on the pinger list.&lt;br/&gt;
2) pinger      : the pinger succeeds in communicating with one OST.&lt;br/&gt;
3) ptlrpc_rcv  : ptlrpc_rcv calls lov_set_osc_active() and takes a reference on lov_refcount.&lt;br/&gt;
4) mount/umount: for some reason (possibly the mount fails, or umount runs), ll_put_super() is called.&lt;br/&gt;
5) mount/umount: lov_disconnect() calls lov_del_target(), which sets target-&amp;gt;ltd_reap on every target so that the OSCs will actually be disconnected.&lt;br/&gt;
6) ptlrpc_rcv  : ptlrpc_rcv drops the lov_refcount reference it took in step (3). If lov_refcount reaches 0 here, ptlrpc_rcv itself tries to disconnect all the OSCs in __lov_del_target().&lt;br/&gt;
7) ptlrpc_rcv  : some OSC imports are still in the LUSTRE_IMP_CONNECTING state, so the ptlrpc_rcv thread has to wait in ptlrpc_disconnect_import() for those imports to move to a non-recovery state.&lt;/p&gt;

&lt;p&gt;The ptlrpc_rcv thread is now waiting for the import state to change to a non-recovery state, but ptlrpc_rcv is the very thread that is supposed to make that change. So ptlrpc_rcv hangs, and this node can no longer use the ptlrpc_rcv thread.&lt;/p&gt;
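<!--
The hang in step (7) is a self-wait: a thread blocks until a condition changes, but it is the only thread that would ever change it. The POSIX-threads sketch below is hypothetical (toy_import, toy_rcv_handle_reply(), toy_wait_non_recovery() and toy_rcv_thread_disconnects() are invented names, not Lustre APIs) and collapses the import states into just "connecting" and "usable". It uses a timed wait so the situation can be observed as a timeout instead of a hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, reduced import states: "recovering" vs "usable". */
enum toy_imp_state { TOY_IMP_CONNECTING, TOY_IMP_FULL };

struct toy_import {
    pthread_mutex_t lock;
    pthread_cond_t changed;      /* broadcast on every state change */
    enum toy_imp_state state;
};

void toy_import_init(struct toy_import *imp)
{
    pthread_mutex_init(&imp->lock, NULL);
    pthread_cond_init(&imp->changed, NULL);
    imp->state = TOY_IMP_CONNECTING;
}

/* In this model only the receive thread runs this, when a reply arrives. */
void toy_rcv_handle_reply(struct toy_import *imp)
{
    pthread_mutex_lock(&imp->lock);
    imp->state = TOY_IMP_FULL;
    pthread_cond_broadcast(&imp->changed);
    pthread_mutex_unlock(&imp->lock);
}

/* Disconnect path: wait until the import leaves the recovery state.
 * Returns 0 if it did, or ETIMEDOUT if nobody changed it in time. */
int toy_wait_non_recovery(struct toy_import *imp, long timeout_ms)
{
    struct timespec deadline;
    enum toy_imp_state st;
    int rc = 0;

    assert(timeout_ms >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&imp->lock);
    /* Loop to absorb spurious wakeups; stop on timeout or error. */
    while (imp->state == TOY_IMP_CONNECTING && rc == 0)
        rc = pthread_cond_timedwait(&imp->changed, &imp->lock, &deadline);
    st = imp->state;
    pthread_mutex_unlock(&imp->lock);
    return st == TOY_IMP_CONNECTING ? ETIMEDOUT : 0;
}

/* Step (7): the receive thread itself lands in the disconnect path. While
 * it waits here it cannot call toy_rcv_handle_reply(), so the state never
 * changes and the wait can only time out. */
int toy_rcv_thread_disconnects(struct toy_import *imp)
{
    return toy_wait_non_recovery(imp, 200);
}
```
-->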


&lt;p&gt;-------------&lt;br/&gt;
I first saw this problem in Lustre 1.8.8, where this scenario only resulted in a hung mount/umount command.&lt;br/&gt;
With my reproducer, however, I have seen many more kinds of problems in Lustre 2.3.x than in Lustre 1.8.8, and one of them looks the same as &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2070&quot; title=&quot;lu_object.c:1114:lu_device_fini()) ASSERTIO N( cfs_atomic_read(&amp;amp;d-&amp;gt;ld_ref) == 0 ) failed: Refcount is 6&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2070&quot;&gt;&lt;del&gt;LU-2070&lt;/del&gt;&lt;/a&gt;. So I suspect that &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2070&quot; title=&quot;lu_object.c:1114:lu_device_fini()) ASSERTIO N( cfs_atomic_read(&amp;amp;d-&amp;gt;ld_ref) == 0 ) failed: Refcount is 6&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2070&quot;&gt;&lt;del&gt;LU-2070&lt;/del&gt;&lt;/a&gt; may be a duplicate of this issue rather than of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1480&quot; title=&quot;failure on replay-single test_74: ASSERTION( cfs_atomic_read(&amp;amp;d-&amp;gt;ld_ref) == 0 ) failed: Refcount is 1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1480&quot;&gt;&lt;del&gt;LU-1480&lt;/del&gt;&lt;/a&gt;, although I&apos;m not certain.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jan  8 14:00:05 rx200-088 kernel: Lustre: Lustre: Build Version: 2.1.56-gf394dce-CHANGED-2.6.32-279.5.1.el6_lustre.gb16fe80.x86_64
Jan  8 14:00:06 rx200-088 kernel: LNet: Added LNI 192.168.128.88@o2ib [8/256/0/180]
Jan  8 14:00:06 rx200-088 kernel: Lustre: MGC192.168.128.86@o2ib: Reactivating import

----- This is because of my reproducer -----
Jan  8 14:00:06 rx200-088 kernel: LustreError: 2726:0:(llite_lib.c:562:client_common_fill_super()) lustre: can&apos;t make root dentry
--------------------------------------------

Jan  8 14:00:11 rx200-088 kernel: LustreError: 2773:0:(lov_obd.c:465:lov_set_osc_active()) ===== ACTIVE WAIT =====
Jan  8 14:00:16 rx200-088 kernel: LustreError: 2753:0:(lu_object.c:1114:lu_device_fini()) ASSERTION( cfs_atomic_read(&amp;amp;d-&amp;gt;ld_ref) == 0 ) failed: Refcount is 1
Jan  8 14:00:16 rx200-088 kernel: LustreError: 2773:0:(lov_obd.c:469:lov_set_osc_active()) ===== ACTIVE END =====
Jan  8 14:00:16 rx200-088 kernel: LustreError: 2773:0:(lov_obd.c:503:lov_notify()) event(2) of lustre-OST0001_UUID failed: -22
Jan  8 14:00:16 rx200-088 kernel: LustreError: 2753:0:(lu_object.c:1114:lu_device_fini()) LBUG
Jan  8 14:00:16 rx200-088 kernel: Pid: 2753, comm: obd_zombid
Jan  8 14:00:16 rx200-088 kernel:
Jan  8 14:00:16 rx200-088 kernel: Call Trace:
Jan  8 14:00:16 rx200-088 kernel: [&amp;lt;ffffffffa03b5905&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Jan  8 14:00:16 rx200-088 kernel: [&amp;lt;ffffffffa03b5f17&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
Jan  8 14:00:16 rx200-088 kernel: [&amp;lt;ffffffffa0533ecc&amp;gt;] lu_device_fini+0xcc/0xd0 [obdclass]
Jan  8 14:00:16 rx200-088 kernel: [&amp;lt;ffffffffa0948b9e&amp;gt;] osc_device_free+0x6e/0x220 [osc]
Jan  8 14:00:16 rx200-088 kernel: [&amp;lt;ffffffffa0511d8d&amp;gt;] class_decref+0x46d/0x590 [obdclass]
Jan  8 14:00:17 rx200-088 kernel: [&amp;lt;ffffffffa04e9c78&amp;gt;] ? class_import_destroy+0x208/0x450 [obdclass]
Jan  8 14:00:17 rx200-088 kernel: [&amp;lt;ffffffffa04ede29&amp;gt;] obd_zombie_impexp_cull+0x309/0x610 [obdclass]
Jan  8 14:00:17 rx200-088 kernel: [&amp;lt;ffffffffa04ee1f5&amp;gt;] obd_zombie_impexp_thread+0xc5/0x1c0 [obdclass]
Jan  8 14:00:17 rx200-088 kernel: [&amp;lt;ffffffff81060250&amp;gt;] ? default_wake_function+0x0/0x20
Jan  8 14:00:17 rx200-088 kernel: [&amp;lt;ffffffffa04ee130&amp;gt;] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
Jan  8 14:00:17 rx200-088 kernel: [&amp;lt;ffffffff8100c14a&amp;gt;] child_rip+0xa/0x20
Jan  8 14:00:17 rx200-088 kernel: [&amp;lt;ffffffffa04ee130&amp;gt;] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
Jan  8 14:00:17 rx200-088 kernel: [&amp;lt;ffffffffa04ee130&amp;gt;] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
Jan  8 14:00:17 rx200-088 kernel: [&amp;lt;ffffffff8100c140&amp;gt;] ? child_rip+0x0/0x20
Jan  8 14:00:17 rx200-088 kernel:
Jan  8 14:00:17 rx200-088 kernel: Kernel panic - not syncing: LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;-----------------------------&lt;/p&gt;

&lt;p&gt;To fix these problems, I&apos;ve added a new rw_semaphore to struct lov_obd so that lov_notify() and lov_del_target() exclude each other. I believe this is a valid way to fix them, because none of these problems have recurred since I applied the patch. However, I&apos;m not sure whether I&apos;m allowed to add a new member to such a basic structure.&lt;/p&gt;

&lt;p&gt;I&apos;ll upload the patch soon, so could you please review it?&lt;br/&gt;
Thank you&lt;/p&gt;</description>
                <environment></environment>
        <key id="17114">LU-2591</key>
            <summary>race between mount/umount and lov_notify</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="yujian">Jian Yu</assignee>
                                    <reporter username="nozaki">Hiroya Nozaki</reporter>
                        <labels>
                            <label>LB</label>
                            <label>patch</label>
                    </labels>
                <created>Wed, 9 Jan 2013 03:08:39 +0000</created>
                <updated>Tue, 5 Mar 2013 23:23:54 +0000</updated>
                            <resolved>Mon, 25 Feb 2013 03:22:46 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                    <version>Lustre 2.1.5</version>
                    <version>Lustre 1.8.8</version>
                                    <fixVersion>Lustre 2.4.0</fixVersion>
                    <fixVersion>Lustre 2.1.5</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                    <comments>
                            <comment id="50191" author="nozaki" created="Wed, 9 Jan 2013 03:34:48 +0000"  >&lt;p&gt;patch for the master branch&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4979&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4979&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52785" author="yujian" created="Thu, 21 Feb 2013 00:29:09 +0000"  >&lt;p&gt;Hi Hiroya,&lt;/p&gt;

&lt;p&gt;Are you going to port the patch to Lustre b2_1 and b1_8 branches?&lt;/p&gt;</comment>
                            <comment id="52786" author="nozaki" created="Thu, 21 Feb 2013 00:42:04 +0000"  >&lt;p&gt;Hi Jian.&lt;br/&gt;
OK, I&apos;ll make patches for b2_1 and b1_8 too and upload them within a week. Please wait a little while.&lt;/p&gt;</comment>
                            <comment id="52948" author="nozaki" created="Mon, 25 Feb 2013 01:38:05 +0000"  >&lt;p&gt;patch for b1_8&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,5522&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,5522&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52950" author="pjones" created="Mon, 25 Feb 2013 03:22:46 +0000"  >&lt;p&gt;Landed for 2.4&lt;/p&gt;</comment>
                            <comment id="52951" author="nozaki" created="Mon, 25 Feb 2013 03:48:02 +0000"  >&lt;p&gt;Hi, Jian.&lt;br/&gt;
I cannot upload the patch for b2_1 because of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2448&quot; title=&quot;No prepare-commit-msg and commit-msg files on b2_1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2448&quot;&gt;&lt;del&gt;LU-2448&lt;/del&gt;&lt;/a&gt;. Could you tell me a workaround, if one exists?&lt;/p&gt;</comment>
                            <comment id="52954" author="yujian" created="Mon, 25 Feb 2013 05:18:47 +0000"  >&lt;blockquote&gt;&lt;p&gt;I cannot upload the patch for b2_1 because of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2448&quot; title=&quot;No prepare-commit-msg and commit-msg files on b2_1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2448&quot;&gt;&lt;del&gt;LU-2448&lt;/del&gt;&lt;/a&gt;. Could you tell me a workaround, if one exists?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Hi Hiroya,&lt;br/&gt;
The patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2448&quot; title=&quot;No prepare-commit-msg and commit-msg files on b2_1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2448&quot;&gt;&lt;del&gt;LU-2448&lt;/del&gt;&lt;/a&gt; has landed on the b2_1 branch. Could you please rebase your local b2_1 branch onto the latest one?&lt;/p&gt;</comment>
                            <comment id="52990" author="nozaki" created="Mon, 25 Feb 2013 20:55:48 +0000"  >&lt;p&gt;Thank you for your advice, Jian.&lt;br/&gt;
Here is the patch for b2_1.&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,5527&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,5527&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52993" author="yujian" created="Mon, 25 Feb 2013 21:13:37 +0000"  >&lt;p&gt;Thank you, Hiroya.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvesv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6039</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>