<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:31:33 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16975] Automatically setup all interfaces for socklnd, o2iblnd</title>
                <link>https://jira.whamcloud.com/browse/LU-16975</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Currently, setting up LNet with socklnd or o2iblnd automatically sets up only the first network interface. Unless a user knows to manually set up the remaining interfaces, their node will experience subpar network performance. All interfaces should be set up automatically.&lt;/p&gt;</description>
                <environment></environment>
        <key id="77105">LU-16975</key>
            <summary>Automatically setup all interfaces for socklnd, o2iblnd</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="timday">Tim Day</assignee>
                                    <reporter username="timday">Tim Day</reporter>
                        <labels>
                    </labels>
                <created>Sun, 23 Jul 2023 19:58:50 +0000</created>
                <updated>Fri, 18 Aug 2023 17:43:07 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="379799" author="gerrit" created="Sun, 23 Jul 2023 20:03:02 +0000"  >&lt;p&gt;&quot;Timothy Day &amp;lt;timday@amazon.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/51748&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/51748&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16975&quot; title=&quot;Automatically setup all interfaces for socklnd, o2iblnd&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16975&quot;&gt;LU-16975&lt;/a&gt; lnet: setup all available interfaces&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 70e8174723687fa6adbf0c0298bbdcd907d01967&lt;/p&gt;</comment>
                            <comment id="379800" author="adilger" created="Sun, 23 Jul 2023 23:47:03 +0000"  >&lt;p&gt;This changes behavior fairly significantly, and it isn&apos;t clear that it is always the right thing to do. For example, on NVIDIA DGX machines there are 8 IB interfaces, but only 2 of them are normally used for storage traffic, while the rest of them are for compute traffic. &lt;/p&gt;

&lt;p&gt;Also, most large clusters have a dedicated administration Ethernet network that is intended for logging and remote IPMI traffic, and flooding this with Lustre traffic would make it difficult to manage the nodes. &lt;/p&gt;

&lt;p&gt;If anything like this is done, it should also be possible to disable the functionality in an easy manner. Rather than building this into the LNDs, IMHO it would be better to install clients with a default &lt;tt&gt;/etc/modprobe.d/lnet.conf&lt;/tt&gt; that matches all Ethernet interfaces or similar, but that could easily be replaced in large clusters by an appropriate &lt;tt&gt;ip2nets&lt;/tt&gt; line that matches the right interfaces. &lt;/p&gt;</comment>
                            <comment id="379806" author="JIRAUSER18433" created="Mon, 24 Jul 2023 03:07:34 +0000"  >&lt;p&gt;Currently, LNet automatically sets up the first Ethernet or IB interface, which I think is also often wrong. If an appropriate parameter is passed to the module (via /etc/modprobe.d/lnet.conf or otherwise), LNet uses that instead of the default behavior. This patch preserves that, so the default settings are easy to disable.&lt;/p&gt;

&lt;p&gt;Right now, default setup behavior is decided in LNet rather than the LNDs (this patch doesn&apos;t change this). It would be better if each LND could define its own default behavior without having to have special logic in LNet (this would make things much more modular). But, I think that would need a larger refactor since lnd_startup only accepts one NI at a time.&lt;/p&gt;

&lt;p&gt;I think the goal of any default setting is to pick the least wrong choice for the uninformed user. People already familiar with Lustre likely already know the best network config for their machines.&lt;/p&gt;</comment>
                            <comment id="379807" author="adilger" created="Mon, 24 Jul 2023 03:28:27 +0000"  >&lt;p&gt;This change is probably far-reaching enough that it should be sent out to Lustre-discuss for comments. &lt;/p&gt;

&lt;p&gt;It is good to know that this doesn&apos;t enable all interfaces if one is explicitly specified, and that should be in the commit message. &lt;/p&gt;

&lt;p&gt;I can understand that in the cloud world &quot;enable all visible interfaces&quot; is probably OK, because there are management interfaces not visible to the VM that can be used to control the system.  Possibly in this case, LNet would be listening on the other interfaces but not using them because the servers do not have NIDs there. That wouldn&apos;t be terrible, but it would be a somewhat increased security risk if these interfaces are externally visible. &lt;/p&gt;

&lt;p&gt;I think if we used all interfaces for Lustre traffic on real hardware it would be considered a bug and we would be asked to change it back.  That said, maybe I&apos;m wrong and most clusters already have explicit interface selection and this will be a no-op. &lt;/p&gt;
</comment>
                            <comment id="379923" author="JIRAUSER16704" created="Mon, 24 Jul 2023 20:15:05 +0000"  >&lt;p&gt;I&apos;m a lowly peasant that only reads this stuff normally, but this change should be opt-in via a driver tunable or something.&lt;/p&gt;

&lt;p&gt;Andreas already covered most of it, but in our situation, we don&apos;t want Lustre using slower (read: not high speed data) interfaces, nor do we want Lustre to make these decisions for us. We specifically choose which interfaces to use with Lustre before we mark a node as &quot;production&quot;.&lt;/p&gt;

&lt;p&gt;I think an example of &quot;doing the wrong thing and assuming default behavior&quot; could be found by looking for &quot;skip_mr_route_setup&quot; in the source code. When that came out, it broke things for us unless skip_mr_route_setup=1 was set as a ksocklnd module option.&lt;/p&gt;</comment>
                            <comment id="382916" author="JIRAUSER18433" created="Fri, 18 Aug 2023 04:52:05 +0000"  >&lt;p&gt;I&apos;ll likely refactor this to make it opt-in. That way, if someone builds a custom client, they could enable it easily. If a lot of people use the flag, it&apos;d be easy to change the default in the future. I might look at implementing something like &quot;lnetctl add --net tcp --if *&quot; which would enable all interfaces (for a particular LND). That would be a QoL improvement, imo.&lt;/p&gt;</comment>
                            <comment id="382965" author="hornc" created="Fri, 18 Aug 2023 15:12:40 +0000"  >&lt;p&gt;Some kind of pattern matching would be nice, too. Our products often have a naming scheme for the HSN interfaces. e.g. hsn0, hsn1, ...  hsnX, heth0, heth1, ... hethX, etc.&lt;/p&gt;</comment>
                            <comment id="382990" author="simmonsja" created="Fri, 18 Aug 2023 17:43:07 +0000"  >&lt;p&gt;You can get pattern matching with glob_match() which the kernel provides. I plan to use it for some of the tunables with Netlink in the near future.&lt;/p&gt;</comment>
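The pattern-matching idea discussed in the last few comments can be illustrated with a small userspace sketch. This is not LNet code and does not reflect any existing lnetctl behavior: Python's fnmatch module stands in for the kernel's glob_match(), whose matching rules are similar but may not be identical, and the function name select_interfaces and the interface list are hypothetical, chosen only to mirror the hsnX/hethX naming scheme mentioned above.

```python
from fnmatch import fnmatchcase


def select_interfaces(names, patterns):
    """Return the names that match at least one glob pattern, in input order."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


# Hypothetical interface list, for illustration only.
interfaces = ["lo", "eth0", "hsn0", "hsn1", "heth0", "ib0"]

# Keep only the high-speed-network style names; "lo", "eth0" and "ib0" are skipped.
selected = select_interfaces(interfaces, ["hsn*", "heth*"])
print(selected)
```

A pattern of "*" would select every listed interface, which corresponds to the opt-in "enable all interfaces" behavior proposed above, while narrower patterns keep management or compute interfaces out of the set.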
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i03r5z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>