<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:21:56 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8948] Cannot change conf_param settings after changing the NID of a Lustre OSD using lctl replace_nids</title>
                <link>https://jira.whamcloud.com/browse/LU-8948</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Attempting to use&#160;&lt;tt&gt;lctl replace_nids&lt;/tt&gt; per Lustre manual to change NIDs for OSDs. The intention is to convert servers that were formatted without servicenode entries, so that the targets can be configured for failover.&lt;/p&gt;

&lt;p&gt;The documentation is ambiguous, and my attempts to use this command have failed: every attempt to use&#160;&lt;tt&gt;lctl conf_param&lt;/tt&gt; after running&#160;&lt;tt&gt;lctl replace_nids&lt;/tt&gt; fails. I have tried several experiments, without success. The following is an outline of the process followed, which covers several variations.&lt;/p&gt;

&lt;p&gt;If the replace_nids command is not suitable for this exercise, then the documentation should clarify the use cases for which it is suitable.&lt;/p&gt;

&lt;p&gt;A very simple test was also attempted, whereby the MDS NID was changed from 10.10.2.12@tcp0 to 10.10.2.14@tcp0. The result is the same (see last test case).&lt;/p&gt;

&lt;p&gt;Note: failure here constitutes an inability to alter Lustre parameters (in the example, changing the quota settings fails). The file system does mount and can be used by a client.&lt;/p&gt;

&lt;p&gt;I&apos;d like to have the documentation clarified with the exact syntax and process, as well as use cases for the&#160;&lt;tt&gt;lctl replace_nids&lt;/tt&gt; command, in case there is something I have missed.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Format MGS, MDT0000, OST0000, OST0001 as ldiskfs, no failover:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;mds1: mkfs.lustre --mgs /dev/sda
mds2: mkfs.lustre --mdt --index 0 --mgsnode 10.10.2.11@tcp0 --fsname demo /dev/sdb
oss1: mkfs.lustre --ost --index 0 --fsname demo --mgsnode 10.10.2.11@tcp0 /dev/sda
oss2: mkfs.lustre --ost --index 1 --fsname demo --mgsnode 10.10.2.11@tcp0 /dev/sdb



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Mount on client and confirm FS is operating correctly (create files, check stripes, check &lt;tt&gt;lfs df&lt;/tt&gt;).&lt;/p&gt;

&lt;p&gt;Use a simple check that parameters can be set persistently:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;lctl conf_param demo.quota.ost=ug
...
lctl conf_param demo.quota.ost=none



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Umount client, MDT0000, OST0000, OST0001. MGS remains online.&lt;/p&gt;

&lt;p&gt;Run&#160;&lt;tt&gt;tunefs.lustre&lt;/tt&gt; on MDT0000, adding servicenodes:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;tunefs.lustre --erase-params \
  --servicenode 10.10.2.12@tcp0:10.10.2.11@tcp0 \
  --mgsnode 10.10.2.11@tcp0 --mgsnode 10.10.2.12@tcp0 \
  /dev/sdb



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On MGS, run &lt;tt&gt;lctl replace_nids&lt;/tt&gt;:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;lctl replace_nids demo-MDT0000 10.10.2.12@tcp0:10.10.2.11@tcp0



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Remount MDT0000.&lt;/p&gt;

&lt;p&gt;MGS syslog contains:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Dec  9 00:13:45 rh7z-mds1 kernel: Lustre: Found index 0 &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; demo-MDT0000, updating log



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Remount OST0000, OST0001, client in sequence.&lt;/p&gt;

&lt;p&gt;Verify FS is online, files still accessible on client.&lt;/p&gt;

&lt;p&gt;Re-run a simple check that parameters can be set persistently:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;lctl conf_param demo.quota.ost=ug



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Returns:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;error: conf_param: File exists



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;MGS syslog reports errors:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Dec  9 00:14:56 rh7z-mds1 kernel: LustreError: 4879:0:(llog.c:336:llog_init_handle()) MGS: llog uuid mismatch: config_uuid/
Dec  9 00:14:56 rh7z-mds1 kernel: LustreError: 4879:0:(mgs_llog.c:1446:record_start_log()) MGS: can&apos;t start log demo-MDT0000.bak: rc = -17
Dec  9 00:14:56 rh7z-mds1 kernel: LustreError: 4879:0:(mgs_llog.c:1543:mgs_write_log_direct_all()) MGS: writing log demo-MDT0000.bak: rc = -17
Dec  9 00:14:56 rh7z-mds1 kernel: LustreError: 4879:0:(mgs_llog.c:3626:mgs_write_log_param()) err -17 on param &lt;span class=&quot;code-quote&quot;&gt;&apos;quota.ost=none&apos;&lt;/span&gt;
Dec  9 00:14:56 rh7z-mds1 kernel: LustreError: 4879:0:(mgs_handler.c:993:mgs_iocontrol()) MGS: setparam err: rc = -17
Dec  9 00:14:56 rh7z-mds1 kernel: LustreError: 4879:0:(mgs_handler.c:993:mgs_iocontrol()) Skipped 1 previous similar message



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
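&lt;p&gt;For reference, the &lt;tt&gt;rc = -17&lt;/tt&gt; in the MGS log appears to be the negated errno value for EEXIST, which is consistent with the &quot;File exists&quot; text printed by &lt;tt&gt;lctl conf_param&lt;/tt&gt;. The following is a minimal Python sketch for decoding such return codes on a Linux host; &lt;tt&gt;describe_rc&lt;/tt&gt; is a hypothetical helper written for this illustration and is not part of Lustre.&lt;/p&gt;

```python
import errno
import os


def describe_rc(rc):
    """Translate a negative kernel-style return code into its errno name and message."""
    code = abs(rc)
    # errno.errorcode maps numeric values to symbolic names, e.g. 17 to "EEXIST".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform message text for the errno value.
    return "{} ({})".format(name, os.strerror(code))


# Decode the return code seen in the MGS syslog excerpt above.
print(describe_rc(-17))
```

&lt;p&gt;This only decodes the error number; it does not explain why the MGS is attempting to start a log for &lt;tt&gt;demo-MDT0000.bak&lt;/tt&gt;.&lt;/p&gt;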
&lt;hr /&gt;
&lt;p&gt;Umount client, MDT0000, OST0000, OST0001. MGS remains online.&lt;/p&gt;

&lt;p&gt;Run&#160;&lt;tt&gt;tunefs.lustre&lt;/tt&gt; on MDT0000, adding servicenodes:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;tunefs.lustre --erase-params \
  --servicenode 10.10.2.12@tcp0:10.10.2.11@tcp0 \
  --mgsnode 10.10.2.11@tcp0 --mgsnode 10.10.2.12@tcp0 \
  /dev/sdb



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On MGS, run &lt;tt&gt;lctl replace_nids&lt;/tt&gt;, using comma separator instead of colon, following Lustre manual explicitly:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;lctl replace_nids demo-MDT0000 10.10.2.12@tcp0,10.10.2.11@tcp0



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Remount MDT0000.&lt;/p&gt;

&lt;p&gt;MGS syslog contains:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Dec  9 00:33:00 rh7z-mds1 kernel: Lustre: Found index 0 &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; demo-MDT0000, updating log



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Remount MDT0000, OST0000, OST0001, client&lt;/p&gt;

&lt;p&gt;Re-run a simple check that parameters can be set persistently:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[root@rh7z-mds1 ~]# lctl conf_param demo.quota.ost=none
error: conf_param: File exists



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;MGS syslog reports errors:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Dec  9 00:33:52 rh7z-mds1 kernel: LustreError: 4969:0:(llog.c:336:llog_init_handle()) MGS: llog uuid mismatch: config_uuid/
Dec  9 00:33:52 rh7z-mds1 kernel: LustreError: 4969:0:(mgs_llog.c:1446:record_start_log()) MGS: can&apos;t start log demo-MDT0000.bak: rc = -17
Dec  9 00:33:52 rh7z-mds1 kernel: LustreError: 4969:0:(mgs_llog.c:1543:mgs_write_log_direct_all()) MGS: writing log demo-MDT0000.bak: rc = -17
Dec  9 00:33:52 rh7z-mds1 kernel: LustreError: 4969:0:(mgs_llog.c:3626:mgs_write_log_param()) err -17 on param &lt;span class=&quot;code-quote&quot;&gt;&apos;quota.ost=none&apos;&lt;/span&gt;
Dec  9 00:33:52 rh7z-mds1 kernel: LustreError: 4969:0:(mgs_handler.c:993:mgs_iocontrol()) MGS: setparam err: rc = -17



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;hr /&gt;
&lt;p&gt;Umount client, MDT0000, OST0000, OST0001. MGS remains online.&lt;/p&gt;

&lt;p&gt;Run&#160;&lt;tt&gt;tunefs.lustre&lt;/tt&gt; on MDT0000 with a single servicenode NID and a single mgsnode:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;tunefs.lustre --erase-params --servicenode 10.10.2.12@tcp0 --mgsnode 10.10.2.11@tcp0 /dev/sdb



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On MGS, run &lt;tt&gt;lctl replace_nids&lt;/tt&gt;:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[root@rh7z-mds1 ~]# lctl replace_nids demo-MDT0000 10.10.2.12@tcp0



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Remount MDT0000, OST0000, OST0001, client&lt;/p&gt;

&lt;p&gt;MGS syslog contains:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Dec  9 00:38:31 rh7z-mds1 kernel: Lustre: Found index 0 &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; demo-MDT0000, updating log



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Try to change quota setting again, MGS reports the same error.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Umount client, MDT0000, OST0000, OST0001. MGS remains online.&lt;/p&gt;

&lt;p&gt;Run&#160;&lt;tt&gt;tunefs.lustre&lt;/tt&gt; on MDT0000 with the equivalent of the original settings:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;tunefs.lustre --erase-params --mgsnode 10.10.2.11@tcp0 /dev/sdb



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Remount MDT0000, OST0000, OST0001, client&lt;/p&gt;

&lt;p&gt;Re-run a simple check that parameters can be set persistently:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[root@rh7z-mds1 ~]# lctl conf_param demo.quota.ost=none
error: conf_param: File exists



&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;MGS syslog reports same error.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Umount all, reformat all targets to create new FS.&lt;/p&gt;

&lt;p&gt;Mount MGT, MDT0000, OST0000, OST0001&lt;/p&gt;

&lt;p&gt;Verify that client can mount the FS.&lt;/p&gt;

&lt;p&gt;Run quota test as before.&lt;/p&gt;

&lt;p&gt;Umount client, MDT0000, OST0000, OST0001&lt;/p&gt;

&lt;p&gt;Remove kernel modules on MDT0000 host.&lt;/p&gt;

&lt;p&gt;Change IPv4 address from 10.10.2.12 to 10.10.2.14, reload&#160;&lt;tt&gt;lnet&lt;/tt&gt; module and verify that new NID is applied.&lt;/p&gt;

&lt;p&gt;On MGS, run:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;lctl replace_nids demo-MDT0000 10.10.2.14@tcp0


&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Remount MDT0000, OST0000, OST0001, client&lt;/p&gt;

&lt;p&gt;Verify that FS is usable.&lt;/p&gt;

&lt;p&gt;Re-run quota conf_param test. Fails as before.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;</description>
                <environment></environment>
        <key id="42341">LU-8948</key>
            <summary>Cannot change conf_param settings after changing the NID of a Lustre OSD using lctl replace_nids</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="malkolm">Malcolm Cowe</reporter>
                        <labels>
                    </labels>
                <created>Fri, 9 Dec 2016 06:20:40 +0000</created>
                <updated>Tue, 16 Jul 2019 09:13:04 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="199412" author="adilger" created="Thu, 15 Jun 2017 23:02:27 +0000"  >&lt;p&gt;Hi Artem, could you please comment on this issue, since you wrote &lt;tt&gt;replace_nids&lt;/tt&gt; originally.&lt;/p&gt;</comment>
                            <comment id="199440" author="artem_blagodarenko" created="Fri, 16 Jun 2017 10:17:48 +0000"  >&lt;p&gt;Strange: the .bak file is accessed during conf_param.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;(mgs_llog.c:1543:mgs_write_log_direct_all()) MGS: writing log demo-MDT0000.bak: rc = -17

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;replace_nids creates a .bak file before changing the configs, so there is probably some kind of conflict with conf_param. Investigating.&lt;/p&gt;</comment>
                            <comment id="199447" author="artem_blagodarenko" created="Fri, 16 Jun 2017 12:29:57 +0000"  >&lt;p&gt;Found the reason.&lt;br/&gt;
&#160;&lt;br/&gt;
mgs_write_log_direct_all() modifies all config logs in the directory. It seems that the .bak file is being processed. This file should not be processed after the 0bb49b2624827490ca3ea6a146d96cf7cf2b402f&#160;&quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7160&quot; title=&quot;Saved by change_nids .bak files on MGS should never be processed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7160&quot;&gt;&lt;del&gt;LU-7160&lt;/del&gt;&lt;/a&gt; mgs: Skip processing .bak files on MGS&quot; patch. What version is installed on the MGS?&lt;br/&gt;
&#160;&lt;br/&gt;
Thanks.&lt;/p&gt;</comment>
                            <comment id="199524" author="adilger" created="Sat, 17 Jun 2017 00:22:14 +0000"  >&lt;p&gt;I think I commented in a separate patch that we should not process all of the files in the CONFIGS directory, but instead only the files of the form &lt;tt&gt;fsname-&amp;lt;MDT,OST,client,sptlrpc,params&amp;gt;&lt;/tt&gt; and other known names. This avoids all sorts of problems if other backup file names are used.&lt;/p&gt;</comment>
                            <comment id="199535" author="artem_blagodarenko" created="Sat, 17 Jun 2017 11:53:00 +0000"  >&lt;p&gt;&amp;gt; This avoids all sorts of problems if other backup file names are used.&lt;br/&gt;
Commented on &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7160&quot; title=&quot;Saved by change_nids .bak files on MGS should never be processed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7160&quot;&gt;&lt;del&gt;LU-7160&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="251457" author="artem_blagodarenko" created="Tue, 16 Jul 2019 09:13:04 +0000"  >&lt;p&gt;Can we now close this issue as a duplicate of&#160;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7160&quot; title=&quot;Saved by change_nids .bak files on MGS should never be processed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7160&quot;&gt;&lt;del&gt;LU-7160&lt;/del&gt;&lt;/a&gt;?&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzyy1j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>