<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:55:13 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12739] Race with discovery thread completion and message queueing</title>
                <link>https://jira.whamcloud.com/browse/LU-12739</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I noticed a message that had been queued pending discovery and was still on the peer&apos;s queue even though the peer had already completed discovery.&lt;/p&gt;

&lt;p&gt;Here we see the send is queued because of discovery (this is the meaning of &quot;rc 2&quot;):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000001:40000000:8.0:1567808737.644190:0:14279:0:(dvsipc_lnet.c:1280:lnet_tx_request()) LNetPut src 476@gni99 to 12345-106@gni99 mdh ffffc90008a93ce8(1741073) ptl 63 mb 2, ud 18446612201357105176
00000400:40000000:8.0:1567808737.644193:0:14279:0:(lib-move.c:4779:LNetPut()) LNetPut msg ffff881009f58000 md ffff88101ecb3880 mdh 1741073 -&amp;gt; 12345-106@gni99
00000400:40000000:8.0:1567808737.644196:0:14279:0:(lib-lnet.h:101:lnet_list_add_tail()) Adding msg ffff881009f58000(ffff881009f58010) to tail of list ffff881011830320
00000400:40000000:8.0:1567808737.644197:0:14279:0:(lib-move.c:2776:lnet_send()) msg ffff881009f58000 rc 2
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From the debug output above we can see the list that the message was added to. This lets us easily find the lnet_peer object, and we can confirm both that the message is still on this peer&apos;s lp_dc_pendq at the time of the dump and that lp_state shows discovery has already completed.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash_x86_64&amp;gt; struct -o lnet_peer | grep lp_dc_pendq
   [32] struct list_head lp_dc_pendq;
crash_x86_64&amp;gt; eval ffff881011830320 - 32
hexadecimal: ffff881011830300
    decimal: 18446612201327493888  (-131872382057728)
      octal: 1777774201002140601400
     binary: 1111111111111111100010000001000000010001100000110000001100000000
crash_x86_64&amp;gt; lnet_peer ffff881011830300 | grep lp_primary_nid
  lp_primary_nid = 3659599899000938,
crash_x86_64&amp;gt; epython nid2str 3659599899000938
106@gni99
crash_x86_64&amp;gt; struct -o lnet_peer ffff881011830300 | grep lp_dc_pendq
  [ffff881011830320] struct list_head lp_dc_pendq;
crash_x86_64&amp;gt; list -H ffff881011830320
ffff881009f58010
crash_x86_64&amp;gt; lnet_peer ffff881011830300 | grep lp_state
  lp_state = 273,
crash_x86_64&amp;gt;

*hornc@cflosbld09 190904221344 $ lpst2str.sh 273
LNET_PEER_MULTI_RAIL
LNET_PEER_DISCOVERED
LNET_PEER_NIDS_UPTODATE
*hornc@cflosbld09 190904221344 $
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
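
&lt;p&gt;The arithmetic in the crash session above can be checked outside of crash. The following is a minimal Python sketch, not part of Lustre. It subtracts the 32-byte lp_dc_pendq offset reported by struct -o from the list head address to recover the lnet_peer address, and it decodes lp_state = 273 into flag names. The bit positions given to the three flag names are an assumption inferred from the lpst2str.sh output (273 = 0x111), and PeerState, peer_from_pendq and decode_state are hypothetical names used only for illustration.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
from enum import IntFlag

# Offset of lp_dc_pendq inside struct lnet_peer, as printed by `struct -o`.
LP_DC_PENDQ_OFFSET = 32


class PeerState(IntFlag):
    # Assumed bit positions: 273 == 0x111 sets exactly these three bits.
    LNET_PEER_MULTI_RAIL = 0x001
    LNET_PEER_DISCOVERED = 0x010
    LNET_PEER_NIDS_UPTODATE = 0x100


def peer_from_pendq(list_head_addr):
    """Recover the enclosing lnet_peer address from its lp_dc_pendq address."""
    return list_head_addr - LP_DC_PENDQ_OFFSET


def decode_state(value):
    """Return the names of the known flags that are set in an lp_state value."""
    state = PeerState(value)
    return [flag.name for flag in PeerState if flag in state]


peer_addr = peer_from_pendq(0xffff881011830320)
print(hex(peer_addr))     # 0xffff881011830300
print(decode_state(273))  # the three flag names listed by lpst2str.sh
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;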

&lt;p&gt;Thus, we may have a race condition: we decide to queue a message because discovery is in progress (or because we determine that we need to start discovery on the destination peer), but discovery completes before we actually add the message to the peer&apos;s queue. In this situation the message sits on the queue until discovery is performed on that peer again, and that won&apos;t happen unless discovery is forced from user space or there is some sort of configuration change.&lt;/p&gt;

&lt;p&gt;In other words, the message could be stranded indefinitely.&lt;/p&gt;
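
&lt;p&gt;The window can be illustrated with a toy model. This is a sketch only: the Peer class and the three step functions are hypothetical stand-ins, not LNet code, and the interleaving is fixed by hand rather than produced by real threads. It assumes that the discovery thread flushes the pending queue once, when discovery completes.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    discovered: bool = False
    pendq: list = field(default_factory=list)


def sender_checks(peer):
    """Step 1: the sender samples the peer state, then drops the lock."""
    return not peer.discovered  # True means "queue the message"


def discovery_completes(peer):
    """Step 2: the discovery thread finishes and flushes the pending queue."""
    peer.discovered = True
    flushed = list(peer.pendq)
    peer.pendq.clear()
    return flushed


def sender_queues(peer, msg, must_queue):
    """Step 3: the sender acts on its earlier, possibly stale, decision."""
    if must_queue:
        peer.pendq.append(msg)


peer = Peer()
must_queue = sender_checks(peer)        # discovery is still pending here
flushed = discovery_completes(peer)     # runs in the window; queue is empty
sender_queues(peer, "msg", must_queue)  # stale decision strands the message

print(flushed)                      # nothing was flushed
print(peer.discovered, peer.pendq)  # discovered, yet a message is queued
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The final state of the model matches the dump: the peer reports that discovery is complete while its pending queue still holds a message.&lt;/p&gt;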

&lt;p&gt;In Lustre 2.13, the relevant code path here is:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LNetPut/LNetGet-&amp;gt;lnet_send-&amp;gt;lnet_select_pathway-&amp;gt;lnet_initiate_peer_discovery
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here&apos;s an excerpt from lnet_initiate_peer_discovery():&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;        if (!lnet_msg_discovery(msg) || lnet_peer_is_uptodate(peer)) {
                lnet_peer_ni_decref_locked(lpni);
                return 0;
        }

        rc = lnet_discover_peer_locked(lpni, cpt, false);
        if (rc) {
                lnet_peer_ni_decref_locked(lpni);
                return rc;
        }
        /* The peer may have changed. */
        peer = lpni-&amp;gt;lpni_peer_net-&amp;gt;lpn_peer;
        /* queue message and return */
        msg-&amp;gt;msg_rtr_nid_param = rtr_nid;
        msg-&amp;gt;msg_sending = 0;
        msg-&amp;gt;msg_txpeer = NULL;
        spin_lock(&amp;amp;peer-&amp;gt;lp_lock);
        list_add_tail(&amp;amp;msg-&amp;gt;msg_list, &amp;amp;peer-&amp;gt;lp_dc_pendq);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In lnet_peer_is_uptodate() we take lp-&amp;gt;lp_lock and check lp_state to determine whether the peer is &quot;up to date&quot;, i.e. whether discovery still needs to be performed on this peer. The lp_lock is released as soon as the check is done, so the result can be stale by the time the caller acts on it.&lt;/p&gt;

&lt;p&gt;Since the message was queued, we know that lnet_peer_is_uptodate() returned false and that we continued on to lnet_discover_peer_locked(). The code expects that when lnet_discover_peer_locked() returns 0 the message must be added to the peer&apos;s lp_dc_pendq, to be sent later by the discovery thread once discovery on this peer has completed. However, discovery on this peer could already be in progress, and nothing here prevents it from completing before the message is queued.&lt;/p&gt;

&lt;p&gt;Several synchronization mechanisms are at play here.&lt;br/&gt;
1. lnet_select_pathway() takes the net lock for the current cpt:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;cpt = lnet_net_lock_current();
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This net lock is held when lnet_initiate_peer_discovery() is called, and the cpt number is passed as an argument to that function.&lt;/p&gt;

&lt;p&gt;2. lnet_discover_peer_locked() is also passed the cpt number, and it:&lt;br/&gt;
2.a drops the net lock being held&lt;br/&gt;
2.b acquires the net lock in exclusive mode (i.e. acquires the net locks for all cpts)&lt;br/&gt;
2.c if necessary, adds the peer to the discovery queue under the exclusive net lock&lt;br/&gt;
2.d releases the exclusive net lock&lt;br/&gt;
2.e reacquires the net lock dropped in 2.a&lt;br/&gt;
2.f returns&lt;/p&gt;

&lt;p&gt;3. The peer discovery thread:&lt;br/&gt;
3.a Holds the net lock in exclusive mode when manipulating its work queues&lt;br/&gt;
3.b Holds the lp_lock while manipulating the peer&apos;s lp_state.&lt;/p&gt;

&lt;p&gt;4. The lp_lock is taken by lnet_initiate_peer_discovery() when the message is added to the peer&apos;s lp_dc_pendq.&lt;/p&gt;

&lt;p&gt;5. The discovery event handler takes the lp_lock when it updates the lp_state based on the event information.&lt;/p&gt;

&lt;p&gt;Given all of the above, I believe there are ample opportunities for the race described earlier. I find it difficult to articulate a particular code path where this happens, but the theory is easy to confirm with a small debug patch:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;diff --git a/lnet/include/lnet/lib-types.h b/lnet/include/lnet/lib-types.h
index e0e82b504d..238be3d368 100644
--- a/lnet/include/lnet/lib-types.h
+++ b/lnet/include/lnet/lib-types.h
@@ -152,6 +152,7 @@ struct lnet_msg {
 	unsigned int          msg_peerrtrcredit:1; /* taken a peer router credit */
 	unsigned int          msg_onactivelist:1; /* on the activelist */
 	unsigned int	      msg_rdma_get:1;
+	unsigned 	      msg_lp_state;

 	struct lnet_peer_ni  *msg_txpeer;         /* peer I&apos;m sending to */
 	struct lnet_peer_ni  *msg_rxpeer;         /* peer I received from */
diff --git a/lnet/lnet/lib-move.c b/lnet/lnet/lib-move.c
index 15244d40ee..1ab595df83 100644
--- a/lnet/lnet/lib-move.c
+++ b/lnet/lnet/lib-move.c
@@ -2014,6 +2014,7 @@ lnet_initiate_peer_discovery(struct lnet_peer_ni *lpni,
 	msg-&amp;gt;msg_sending = 0;
 	msg-&amp;gt;msg_txpeer = NULL;
 	spin_lock(&amp;amp;peer-&amp;gt;lp_lock);
+	msg-&amp;gt;msg_lp_state = peer-&amp;gt;lp_state;
 	list_add_tail(&amp;amp;msg-&amp;gt;msg_list, &amp;amp;peer-&amp;gt;lp_dc_pendq);
 	spin_unlock(&amp;amp;peer-&amp;gt;lp_lock);
 	lnet_peer_ni_decref_locked(lpni);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

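&lt;p&gt;The above patch records the lp_state in the lnet_msg object, under the lp_lock, at the moment the message is added to the peer&apos;s lp_dc_pendq. I reproduced the issue with this patch applied, using the lo (loopback) network. The recorded state of the stranded message shows that discovery had already completed when it was queued:&lt;/p&gt;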
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash_x86_64&amp;gt; lnet_msg ffff88100bf89000 | grep state
  msg_lp_state = 273,
crash_x86_64&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="56864">LU-12739</key>
            <summary>Race with discovery thread completion and message queueing</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="hornc">Chris Horn</assignee>
                                    <reporter username="hornc">Chris Horn</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Mon, 9 Sep 2019 21:54:31 +0000</created>
                <updated>Tue, 20 Sep 2022 12:19:26 +0000</updated>
                            <resolved>Sat, 28 Sep 2019 03:43:16 +0000</resolved>
                                    <version>Lustre 2.13.0</version>
                                    <fixVersion>Lustre 2.13.0</fixVersion>
                    <fixVersion>Lustre 2.12.10</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="254422" author="gerrit" created="Mon, 9 Sep 2019 21:58:03 +0000"  >&lt;p&gt;Chris Horn (hornc@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/36139&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/36139&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12739&quot; title=&quot;Race with discovery thread completion and message queueing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12739&quot;&gt;&lt;del&gt;LU-12739&lt;/del&gt;&lt;/a&gt; lnet: Don&apos;t queue msg when discovery has completed&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 61300a6cc4d682be093c465041c0bcc731f7a1f2&lt;/p&gt;</comment>
                            <comment id="255514" author="gerrit" created="Fri, 27 Sep 2019 23:12:26 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/36139/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/36139/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12739&quot; title=&quot;Race with discovery thread completion and message queueing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12739&quot;&gt;&lt;del&gt;LU-12739&lt;/del&gt;&lt;/a&gt; lnet: Don&apos;t queue msg when discovery has completed&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 4ef62976448d6821df9aab3e720fd8d9d0bdefce&lt;/p&gt;</comment>
                            <comment id="255526" author="pjones" created="Sat, 28 Sep 2019 03:43:16 +0000"  >&lt;p&gt;Landed for 2.13&lt;/p&gt;</comment>
                            <comment id="343284" author="gerrit" created="Wed, 10 Aug 2022 22:06:20 +0000"  >&lt;p&gt;&quot;Serguei Smirnov &amp;lt;ssmirnov@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/48190&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/48190&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12739&quot; title=&quot;Race with discovery thread completion and message queueing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12739&quot;&gt;&lt;del&gt;LU-12739&lt;/del&gt;&lt;/a&gt; lnet: Don&apos;t queue msg when discovery has completed&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 094442c95f12c55a4b62c72b35f9d23b261d5ad7&lt;/p&gt;</comment>
                            <comment id="347142" author="gerrit" created="Tue, 20 Sep 2022 03:35:02 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/48190/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/48190/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12739&quot; title=&quot;Race with discovery thread completion and message queueing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12739&quot;&gt;&lt;del&gt;LU-12739&lt;/del&gt;&lt;/a&gt; lnet: Don&apos;t queue msg when discovery has completed&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 96c2b0d395ae9bd795277ae3a2607054bd9b65e6&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="67186">LU-15234</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00mh3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>