<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:15:00 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-15047] GSS and multi-rail incompatibility</title>
                <link>https://jira.whamcloud.com/browse/LU-15047</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;When I try to mount Lustre with GSS (SSK) enabled, I get checksum errors with a multi-rail client, but not when using only a single interface.&#160; My guess is that the NID is encoded in the checksum, though I haven&apos;t dug into the cause yet.&#160; I also saw many errors when using GSS on multi-rail servers, although those errors were different.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[154311.786639] LustreError: 194908:0:(gss_sk_mech.c:388:sk_verify_hmac()) checksum mismatch
[154311.798154] LustreError: 194908:0:(sec_gss.c:242:gss_verify_msg()) mic verify error: 00060000
[154311.810015] LustreError: 194908:0:(sec_gss.c:2125:gss_svc_verify_request()) failed to verify request: 60000
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="66329">LU-15047</key>
            <summary>GSS and multi-rail incompatibility</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="sebastien">Sebastien Buisson</assignee>
                                    <reporter username="jfilizetti">Jeremy Filizetti</reporter>
                        <labels>
                    </labels>
                <created>Thu, 30 Sep 2021 13:40:38 +0000</created>
                <updated>Sat, 17 Jun 2023 05:36:30 +0000</updated>
                            <resolved>Wed, 3 Nov 2021 03:31:55 +0000</resolved>
                                                    <fixVersion>Lustre 2.15.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="314390" author="pjones" created="Thu, 30 Sep 2021 13:44:54 +0000"  >&lt;p&gt;What version of Lustre are you using here Jeremy?&lt;/p&gt;</comment>
                            <comment id="314423" author="jfilizetti" created="Thu, 30 Sep 2021 16:30:18 +0000"  >&lt;p&gt;Servers were running 2.12.7; the client is at tag 2.14.54.&lt;/p&gt;</comment>
                            <comment id="314526" author="pjones" created="Fri, 1 Oct 2021 17:29:58 +0000"  >&lt;p&gt;Sebastien&lt;/p&gt;

&lt;p&gt;What are your thoughts here?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="314578" author="sebastien" created="Mon, 4 Oct 2021 10:30:22 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;&lt;tt&gt;gss_svc_verify_request&lt;/tt&gt; computes the checksum on &lt;tt&gt;req-&amp;gt;rq_reqbuf&lt;/tt&gt;, so even if it contains the NID, that should not be a problem for checksum calculation.&lt;br/&gt;
Are the log messages in the description of the ticket seen on the client or the server side? I can see that you are running the Lustre master branch on your client (&lt;tt&gt;2.14.54&lt;/tt&gt;). Is this issue with multirail new? Have you been able to successfully mount a multirail client with older versions of Lustre, and if so, what is the most recent version that worked?&lt;br/&gt;
Also, which SK flavor are you running?&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="314579" author="sebastien" created="Mon, 4 Oct 2021 11:11:24 +0000"  >&lt;p&gt;Also, I thought that in case of multirail, the primary NID was always used as the unique identifier of the connection. &lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=ssmirnov&quot; class=&quot;user-hover&quot; rel=&quot;ssmirnov&quot;&gt;ssmirnov&lt;/a&gt; are you aware of any change in this area in (recent) master?&lt;/p&gt;</comment>
                            <comment id="314597" author="ssmirnov" created="Mon, 4 Oct 2021 17:13:11 +0000"  >&lt;p&gt;Sebastien,&lt;/p&gt;

&lt;p&gt;Yes, there were LNet patches that went into 2.14.54 which could be related (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14668&quot; title=&quot;LNet: do discovery in the background&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14668&quot;&gt;&lt;del&gt;LU-14668&lt;/del&gt;&lt;/a&gt;, &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14661&quot; title=&quot;Provide kernel API for adding peer/peer NI&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14661&quot;&gt;&lt;del&gt;LU-14661&lt;/del&gt;&lt;/a&gt;), but I just didn&apos;t think these patches would cause LNet to switch the peer&apos;s primary NID somehow. Perhaps &lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=ashehata&quot; class=&quot;user-hover&quot; rel=&quot;ashehata&quot;&gt;ashehata&lt;/a&gt;&#160;can confirm.&lt;/p&gt;

&lt;p&gt;Not sure if this can affect SSK, but another thing to check for a MR client would be the linux routing setup, to make sure that the intended interface is actually used for sending. For example:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.all.arp_filter=0
sysctl -w net.ipv4.conf.ib0.arp_ignore=1
sysctl -w net.ipv4.conf.ib0.arp_filter=0
sysctl -w net.ipv4.conf.ib0.arp_announce=2
sysctl -w net.ipv4.conf.ib0.rp_filter=0 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If tcp is used, the routes also need to be added. Manual steps are described here: &lt;a href=&quot;https://wiki.whamcloud.com/display/LNet/MR+Cluster+Setup&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.whamcloud.com/display/LNet/MR+Cluster+Setup&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The following patch (still under review) automates adding the routes for tcp interfaces:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://review.whamcloud.com/#/c/44065/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/44065/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;</comment>
                            <comment id="314699" author="jfilizetti" created="Tue, 5 Oct 2021 11:27:30 +0000"  >&lt;p&gt;When I brought the servers up with multirail, they were failing with GSS as well.&#160; So I moved them to one interface and they came up fine.&#160; Adding the client with multirail then fails with the checksum error, but with a single interface it works fine.&#160;&lt;/p&gt;

&lt;p&gt;I&apos;m pretty sure the arp settings Whamcloud keeps recommending are wrong.&#160; I went through this with Amir a few months ago: arp_filter and rp_filter should be set to 1 for multi-rail to function correctly.&#160; With any other setting, it worked only intermittently.&#160;&lt;/p&gt;

&lt;p&gt;Are GSS and multirail actually being tested together?&#160; When I filed this I was somewhat assuming that they were only being tested independently.&#160; If I get a chance this week I&apos;ll try to dig into it more to get to the bottom of what&apos;s happening.&#160;&lt;/p&gt;</comment>
                            <comment id="315574" author="sebastien" created="Thu, 14 Oct 2021 16:03:13 +0000"  >&lt;p&gt;I managed to reproduce a similar issue on my test cluster. After properly tuning Linux routing as explained on the wiki page at &lt;a href=&quot;https://wiki.whamcloud.com/display/LNet/MR+Cluster+Setup&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.whamcloud.com/display/LNet/MR+Cluster+Setup&lt;/a&gt; , I formatted a simple Lustre file system made of 3 servers (1 MGS, 1 MDS, 1 OSS) and 1 client. All nodes use Eth, and have the same network configuration:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;2 network interfaces, &lt;tt&gt;enp0s8&lt;/tt&gt; on &lt;tt&gt;192.168.56.0/24&lt;/tt&gt; and &lt;tt&gt;enp0s9&lt;/tt&gt; on &lt;tt&gt;192.168.57.0/24&lt;/tt&gt;;&lt;/li&gt;
	&lt;li&gt;&lt;tt&gt;/etc/modprobe.d/lustre.conf&lt;/tt&gt; contains &lt;tt&gt;options lnet networks=&quot;tcp0(enp0s8,enp0s9)&quot;&lt;/tt&gt;;&lt;/li&gt;
	&lt;li&gt;LNet auto discovery is enabled.&lt;br/&gt;
With this configuration, we enable LNet Multirail on tcp0.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;This file system works fine without SSK enabled. When I enable SSK (&lt;tt&gt;skpi&lt;/tt&gt; flavor for &lt;tt&gt;cli2ost&lt;/tt&gt; connections), the client fails to mount, and we can see the following messages on the OSS side:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: handling request
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: readline: read 1378 chars into buffer of size 2048:
                                                   \x02000200 \xce39a8c000000200 \xa7c7df3fb8a605b2 \x64656661756c7400 \x \x343a000000012c343a4f6825ee2c3235363affd19cfee97a53c7928649dd3245347ae724b53b59ccbb7fb50c1dfe93b726510a27a08dacf539bce044c62c56878719dab244396990d477
b6be879153a42e6529ca0d4192154592aebcea3e72709f2133b565229304974d243e36b5b176bccced176a280ee0727623871508406eff4120172ddf3601521fe8ce2c1a139234b17284242cd219189393a0cee481417ee89e0e2422a30e322c301077a07c917d5941bef9d942f164...
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: handling req: svc 2, nid 00020000c0a839ce, idx b205a6b83fdfc7a7 nodemap default
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: in_handle:
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: length 0
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: in_tok:
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: length 652
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0000: 343a 0000 0001 2c34 3a4f 6825 ee2c 3235  4:....,4:Oh%.,25
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0010: 363a ffd1 9cfe e97a 53c7 9286 49dd 3245  6:.....zS...I.2E
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0020: 347a e724 b53b 59cc bb7f b50c 1dfe 93b7  4z.$.;Y.........
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0030: 2651 0a27 a08d acf5 39bc e044 c62c 5687  &amp;amp;Q.&apos;....9..D.,V.
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0040: 8719 dab2 4439 6990 d477 b6be 8791 53a4  ....D9i..w....S.
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0050: 2e65 29ca 0d41 9215 4592 aebc ea3e 7270  .e)..A..E....&amp;gt;rp
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0060: 9f21 33b5 6522 9304 974d 243e 36b5 b176  .!3.e&quot;...M$&amp;gt;6..v
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0070: bccc ed17 6a28 0ee0 7276 2387 1508 406e  ....j(..rv#...@n
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0080: ff41 2017 2ddf 3601 521f e8ce 2c1a 1392  .A .-.6.R...,...
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0090: 34b1 7284 242c d219 1893 93a0 cee4 8141  4.r.$,.........A
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   00a0: 7ee8 9e0e 2422 a30e 322c 3010 77a0 7c91  ~...$&quot;..2,0.w.|.
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   00b0: 7d59 41be f9d9 42f1 64fd 6f67 5b3f 9c4c  }YA...B.d.og[?.L
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   00c0: 3ae0 dc89 568d 961a 6d85 fc70 f1da c3f8  :...V...m..p....
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   00d0: e5fd 035f a530 cbb8 5c9b 11ad 79c4 ff4d  ..._.0..\...y..M
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   00e0: 1d0c e95e 0e95 4725 06d5 5689 95a1 f765  ...^..G%..V....e
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   00f0: 0540 eb78 c5a6 4f69 ac1f fe30 024b 6dda  .@.x..Oi...0.Km.
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0100: 5e6d 1457 d72a 2236 6b8a 97ca 52d1 ffcd  ^m.W.*&quot;6k...R...
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0110: 98a3 2c32 3536 3afa ca0c 67e8 89c0 5aa7  ..,256:...g...Z.
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0120: ca9c bc91 d844 be46 7fe4 59c0 abc0 9028  .....D.F..Y....(
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0130: dd72 857b 98a8 8614 9fa4 ed1f fcc1 0c7a  .r.{...........z
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0140: 6de1 e356 ed15 8f80 d717 ee8b 9be9 0783  m..V............
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0150: e2ac 8e7a 5940 3aa4 7aaa ac32 df62 2bce  ...zY@:.z..2.b+.
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0160: f208 1e1b 5f39 e22e c741 0c98 e5e5 c846  ...._9...A.....F
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0170: 82c9 0832 7c08 7635 2b0b a5c9 ab60 cbeb  ...2|.v5+....`..
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0180: 2212 c7de 3bfd dfd4 9eb2 e461 768f b1e6  &quot;...;......av...
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0190: d592 a22e b9c6 3d2a 5f2c 7be5 4b57 d60f  ......=*_,{.KW..
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   01a0: 0134 78bc 2648 9854 e600 1e28 7197 d119  .4x.&amp;amp;H.T...(q...
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   01b0: 6c69 eeb0 e592 ea78 f9e1 509f 1ac9 6b06  li.....x..P...k.
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   01c0: b2b6 fe8c 59b5 59a2 8e5e 0a53 8403 f6e3  ....Y.Y..^.S....
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   01d0: c8eb 31b5 c0f6 1c28 a07c cdcd 6dbc 98c4  ..1....(.|..m...
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   01e0: bf51 8cb4 676d 8823 3224 eea1 7dfa 8c3d  .Q..gm.#2$..}..=
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   01f0: 462d 1775 aab0 a6f1 a01d 8cfe 8a5c f2c8  F-.u.........\..
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0200: a092 ae4a 89c5 8d15 8529 614a 5af3 1f26  ...J.....)aJZ..&amp;amp;
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0210: 7544 a6f4 e8ad 812c 3333 3a73 6562 2d4f  uD.....,33:seb-O
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0220: 5354 3030 3030 2d6f 7363 2d66 6666 6639  ST0000-osc-ffff9
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0230: 6532 3338 3433 3962 3830 3000 2c33 323a  e238439b800.,32:
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0240: 37a8 eec1 ce19 687d 132f e290 51dc a629  7.....h}./..Q..)
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0250: d164 e2c4 958b a141 d5f4 133a 33f0 688f  .d.....A...:3.h.
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0260: 2c34 3a00 0000 812c 3332 3ad6 4c7d 91a5  ,4:....,32:.L}..
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0270: b8f4 85f5 404e 4a47 a695 072a e965 5f74  ....@NJG...*.e_t
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:   0280: b53b 5416 6ea8 3f10 692b bd2c            .;T.n.?.i+.,
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: Handling sk request
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: Decoded netstring of 652 bytes
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: Creating credentials for target: seb-OST0000-osc-ffff9e238439b800 with nodemap: default
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: Searching for key with description: lustre:seb:default
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: Encoded netstring of 311 bytes
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: Created netstring of 311 bytes
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: Serialized buffer of 400 bytes for kernel
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: doing downcall
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: \xa7c7df3fb8a605b2 1634229849 0 1 0 0 -1 0 0 sk \x0100000073686132353600000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006374722861657329000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000...
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]:
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: sk returning success
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: sending reply
Oct 14 17:44:09 lnode-vm4.makalu lsvcgssd[196651]: writing message: \x \x343a000000012c343a4f6825ee2c3235363affd19cfee97a53c7928649dd3245347ae724b53b59ccbb7fb50c1dfe93b726510a27a08dacf539bce044c62c56878719dab244396990d477b6be879153a42e6529ca0d4192154592aebcea3e72709f2133b565229304974d243e36b5b176bccced176a280ee0727623871508406eff4120172ddf3601521fe8ce2c1a139234b17284242cd219189393a0cee481417ee89e0e2422a30e322c301077a07c917d5941bef9d942f164fd6f675b3f9c4c3ae0dc89568d961a6d85fc70f1dac3f8e5fd035fa530cbb85c9b11ad79c4ff4d1d0ce95e0e95472506d556899...
Oct 14 17:44:09 lnode-vm4.makalu kernel: Lustre: 196603:0:(sec_gss.c:2066:gss_svc_handle_init()) create svc ctx 000000003a027476: user from 192.168.56.206@tcp authenticated as root
Oct 14 17:44:09 lnode-vm4.makalu kernel: LustreError: 196602:0:(gss_sk_mech.c:388:sk_verify_hmac()) checksum mismatch
Oct 14 17:44:09 lnode-vm4.makalu kernel: LustreError: 196602:0:(sec_gss.c:283:gss_unseal_msg()) unwrap message error: 00060000
Oct 14 17:44:09 lnode-vm4.makalu kernel: LustreError: 196602:0:(sec_gss.c:2196:gss_svc_unseal_request()) failed to unwrap request: d0000
Oct 14 17:44:09 lnode-vm4.makalu kernel: LustreError: 196602:0:(sec_gss.c:2288:gss_svc_handle_data()) svc 3 failed: major 0x000d0000: req xid 1713610397058304 ctx 000000003a027476 idx 0xb205a6b83fdfc7a7(0-&amp;gt;192.168.56.206@tcp)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
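
&lt;p&gt;As a reading aid for the log above, here is a minimal sketch that decodes the hex NID printed by &lt;tt&gt;lsvcgssd&lt;/tt&gt; (&lt;tt&gt;nid 00020000c0a839ce&lt;/tt&gt;). It assumes the 64-bit NID layout with the network in the upper 32 bits (LND type, then network number) and the IPv4 address in the lower 32 bits, and that LND type 2 is the socket LND (tcp). The helper name &lt;tt&gt;decode_nid&lt;/tt&gt; is hypothetical and not part of Lustre.&lt;/p&gt;

```python
import ipaddress

# Assumed LND type numbering; only the socket LND (tcp) is needed here.
LND_NAMES = {2: "tcp"}


def decode_nid(hex_nid):
    """Turn a 64-bit hex NID into the usual address@network text form."""
    nid = int(hex_nid, 16)
    # Upper 32 bits: network; lower 32 bits: IPv4 address (assumed layout).
    net, addr = divmod(nid, 2**32)
    # Within the network: upper 16 bits LND type, lower 16 bits network number.
    lnd_type, net_num = divmod(net, 2**16)
    name = LND_NAMES.get(lnd_type, "lnd{}".format(lnd_type))
    suffix = str(net_num) if net_num else ""
    return "{}@{}{}".format(ipaddress.IPv4Address(addr), name, suffix)


print(decode_nid("00020000c0a839ce"))
```

&lt;p&gt;If the layout assumption holds, this prints &lt;tt&gt;192.168.57.206@tcp&lt;/tt&gt;, an address on the &lt;tt&gt;enp0s9&lt;/tt&gt; subnet, whereas the kernel messages in the same log refer to the peer as &lt;tt&gt;192.168.56.206@tcp&lt;/tt&gt;.&lt;/p&gt;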

&lt;p&gt;After running &lt;tt&gt;git bisect&lt;/tt&gt; I identified the commit that introduces this problem:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;7d309d57fd LU-9121 lnet: select best peer and local net
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This commit is part of the merge of the &lt;tt&gt;origin/multi-rail&lt;/tt&gt; branch, which landed just after the &lt;tt&gt;2.14.50&lt;/tt&gt; tag was created. So the master branch has had this behavior since very soon after 2.14.0 was released. The good news is that 2.14.0 is not impacted.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=ashehata&quot; class=&quot;user-hover&quot; rel=&quot;ashehata&quot;&gt;ashehata&lt;/a&gt; &lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=ssmirnov&quot; class=&quot;user-hover&quot; rel=&quot;ssmirnov&quot;&gt;ssmirnov&lt;/a&gt; do you see how this patch could affect the way peers present themselves to others? My understanding was that the primary NID was always used as the unique identifier of the connection, do you think this commit could change this paradigm? Or maybe this commit could make the multi-rail implementation more effective, by switching between rails more often for instance?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="315603" author="ssmirnov" created="Thu, 14 Oct 2021 18:44:38 +0000"  >&lt;p&gt;Hi Sebastien,&lt;/p&gt;

&lt;p&gt;I think I misunderstood your question earlier. While primary NID is used as a node identifier, MR does introduce the ability to select the local net/NID as well as remote net/NID, so that if all options have equal health and priority, LNET is going to round-robin across them. The primary NID is going to stay the same on both sides of the transaction, but different NIDs are going to be actually sending/receiving. After discovery is complete, you can observe that behaviour by using &quot;lnetctl ping&quot; to initiate communication and &quot;lnetctl net show -v 4&quot; to see individual NID send/receive counts on both sides.&lt;/p&gt;

&lt;p&gt;How does encryption work in this case? If it is unaware of what LNet does with NID selection but depends on what NIDs are selected, that does sound like a problem.&lt;/p&gt;
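
&lt;p&gt;To make the concern concrete, here is a toy sketch, not the Lustre GSS code. It assumes an integrity check that mixes a NID into the HMAC input, with the receiver always expecting the peer&apos;s primary NID. The key, the NIDs, and the function names are all hypothetical.&lt;/p&gt;

```python
import hashlib
import hmac

KEY = b"hypothetical-shared-key"  # stand-in for the shared secret key


def mac_for(payload, nid):
    """HMAC-SHA256 over the payload with a NID mixed in (toy scheme)."""
    return hmac.new(KEY, nid.encode() + b"|" + payload, hashlib.sha256).digest()


def rail_results(rails, receiver_nid, bind_to_primary):
    """For each sending rail, report whether the receiver's check passes."""
    results = []
    for rail in rails:
        # The sender binds either the primary NID or the rail it sends from.
        sender_nid = receiver_nid if bind_to_primary else rail
        mac = mac_for(b"request", sender_nid)
        # The receiver always recomputes the MAC with the primary NID.
        expected = mac_for(b"request", receiver_nid)
        results.append(hmac.compare_digest(mac, expected))
    return results


PRIMARY = "192.168.56.206@tcp"
RAILS = [PRIMARY, "192.168.57.206@tcp"]

print(rail_results(RAILS, PRIMARY, bind_to_primary=False))
print(rail_results(RAILS, PRIMARY, bind_to_primary=True))
```

&lt;p&gt;In this toy model the first call prints &lt;tt&gt;[True, False]&lt;/tt&gt;: a message sent over the second rail fails the check. The second call prints &lt;tt&gt;[True, True]&lt;/tt&gt;, because both ends bind the same identifier no matter which rail carries the message.&lt;/p&gt;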

&lt;p&gt;Thanks,&lt;/p&gt;

&lt;p&gt;Serguei&#160;&lt;/p&gt;</comment>
                            <comment id="315837" author="sebastien" created="Mon, 18 Oct 2021 11:39:42 +0000"  >&lt;p&gt;Indeed, GSS must make use of primary NIDs on both ends of the communication channel, so that the computed HMAC is based on these unique identifiers rather than the actual NIDs being used for the current request.&lt;/p&gt;</comment>
                            <comment id="315838" author="gerrit" created="Mon, 18 Oct 2021 11:39:52 +0000"  >&lt;p&gt;&quot;Sebastien Buisson &amp;lt;sbuisson@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/45277&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45277&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15047&quot; title=&quot;GSS and multi-rail incompatibility&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15047&quot;&gt;&lt;del&gt;LU-15047&lt;/del&gt;&lt;/a&gt; gss: gss integrity check with multi-rail&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: bcc1cc38a2286b39c78464f7fd34f237a66fd2be&lt;/p&gt;</comment>
                            <comment id="316057" author="sebastien" created="Wed, 20 Oct 2021 12:00:35 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=ssmirnov&quot; class=&quot;user-hover&quot; rel=&quot;ssmirnov&quot;&gt;ssmirnov&lt;/a&gt; &lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=ashehata&quot; class=&quot;user-hover&quot; rel=&quot;ashehata&quot;&gt;ashehata&lt;/a&gt; in patch &lt;a href=&quot;https://review.whamcloud.com/45277&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45277&lt;/a&gt; Jeremy is asking about the possibility that the primary NID for a node is changed. What events could trigger such a change?&lt;/p&gt;</comment>
                            <comment id="316061" author="jfilizetti" created="Wed, 20 Oct 2021 12:24:23 +0000"  >&lt;p&gt;If the primary NID was deleted by a user command I think it would trigger (lnet_peer_del_nid()).&#160; In the code it looks like it could replace the lp_primary_nid but when I just ran &quot;lnetctl peer del&quot; it seemed to delete the whole peer not just the primary NID.&#160; Not sure if that is the intended way things work but this was my concern.&lt;/p&gt;</comment>
                            <comment id="317280" author="gerrit" created="Wed, 3 Nov 2021 02:37:31 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/45277/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45277/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15047&quot; title=&quot;GSS and multi-rail incompatibility&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15047&quot;&gt;&lt;del&gt;LU-15047&lt;/del&gt;&lt;/a&gt; gss: gss integrity check with multi-rail&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: c8301a65c5672a1d081669343466746df983eabc&lt;/p&gt;</comment>
                            <comment id="317307" author="pjones" created="Wed, 3 Nov 2021 03:31:55 +0000"  >&lt;p&gt;Landed for 2.15&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i025x3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>