<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:11:59 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary, append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14695] New OST not visible by MDTs. MGS problem or corrupt catalog llog?</title>
                <link>https://jira.whamcloud.com/browse/LU-14695</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;On our Oak filesystem, running 2.12.6, we have a problem with either the MGS or a corrupt catalog somewhere.&lt;/p&gt;

&lt;p&gt;Active OSTs on this filesystem are from OST000c (12) to OST0137 (311). Today, we tried to add OST index 312 oak-OST0138. The new OST is visible from client, but not from MDTs: we have 6 MDTs (oak-MDT0000 to oak-MDT0005).&lt;/p&gt;

&lt;p&gt;Full disclosure... older OSTs 0-11 were previously removed with the experimental command lctl del_ost from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7668&quot; title=&quot;permanently remove deactivated OSTs from configuration log&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7668&quot;&gt;&lt;del&gt;LU-7668&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The server logs when I started the new OST are available in &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/38771/38771_servers-logs.txt&quot; title=&quot;servers-logs.txt attached to LU-14695&quot;&gt;servers-logs.txt&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;What is weird is the following:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;May 20 14:06:05 oak-md1-s2 kernel: Lustre: 108193:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x0)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and that it complains about other OSTs (not OST0138):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;May 20 14:06:05 oak-md1-s2 kernel: LustreError: 108193:0:(genops.c:556:class_register_device()) oak-OST0134-osc-MDT0003: already exists, won&apos;t add
May 20 14:06:05 oak-md1-s2 kernel: LustreError: 108193:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.0.2.51@o2ib5: cfg command failed: rc = -17
May 20 14:06:05 oak-md1-s2 kernel: Lustre:    cmd=cf001 0:oak-OST0134-osc-MDT0003  1:osp  2:oak-MDT0003-mdtlov_UUID  
May 20 14:06:05 oak-md1-s2 kernel: LustreError: 4061:0:(mgc_request.c:599:do_requeue()) failed processing log: -17
May 20 14:06:05 oak-md1-s2 kernel: Lustre:    cmd=cf001 0:oak-OST0134-osc-MDT0000  1:osp  2:oak-MDT0000-mdtlov_UUID  
May 20 14:06:07 oak-md2-s2 kernel: Lustre: 14846:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x0)
May 20 14:06:07 oak-md2-s2 kernel: LustreError: 14846:0:(genops.c:556:class_register_device()) oak-OST0136-osc-MDT0005: already exists, won&apos;t add
May 20 14:06:07 oak-md2-s2 kernel: LustreError: 14846:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.0.2.51@o2ib5: cfg command failed: rc = -17
May 20 14:06:07 oak-md2-s2 kernel: Lustre:    cmd=cf001 0:oak-OST0136-osc-MDT0005  1:osp  2:oak-MDT0005-mdtlov_UUID  
May 20 14:06:07 oak-md2-s2 kernel: LustreError: 4291:0:(mgc_request.c:599:do_requeue()) failed processing log: -17
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If I check the llog catalogs on the MGS, the new OST oak-OST0138 seems to have been added though:&lt;/p&gt;

&lt;p&gt;Client catalog on MGS:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@oak-md1-s1 ~]# lctl --device MGS llog_print oak-client | grep OST0138
- { index: 2716, event: attach, device: oak-OST0138-osc, type: osc, UUID: oak-clilov_UUID }
- { index: 2717, event: setup, device: oak-OST0138-osc, UUID: oak-OST0138_UUID, node: 10.0.2.103@o2ib5 }
- { index: 2719, event: add_conn, device: oak-OST0138-osc, node: 10.0.2.104@o2ib5 }
- { index: 2720, event: add_osc, device: oak-clilov, ost: oak-OST0138_UUID, index: 312, gen: 1 }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;MDS catalogs on MGS:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@oak-md1-s1 ~]# lctl --device MGS llog_print oak-MDT0000 | grep OST0138
- { index: 2785, event: attach, device: oak-OST0138-osc-MDT0000, type: osc, UUID: oak-MDT0000-mdtlov_UUID }
- { index: 2786, event: setup, device: oak-OST0138-osc-MDT0000, UUID: oak-OST0138_UUID, node: 10.0.2.103@o2ib5 }
- { index: 2788, event: add_conn, device: oak-OST0138-osc-MDT0000, node: 10.0.2.104@o2ib5 }
- { index: 2789, event: add_osc, device: oak-MDT0000-mdtlov, ost: oak-OST0138_UUID, index: 312, gen: 1 }

[root@oak-md1-s1 ~]# lctl --device MGS llog_print oak-MDT0001 | grep OST0138
- { index: 2930, event: attach, device: oak-OST0138-osc-MDT0001, type: osc, UUID: oak-MDT0001-mdtlov_UUID }
- { index: 2931, event: setup, device: oak-OST0138-osc-MDT0001, UUID: oak-OST0138_UUID, node: 10.0.2.103@o2ib5 }
- { index: 2933, event: add_conn, device: oak-OST0138-osc-MDT0001, node: 10.0.2.104@o2ib5 }
- { index: 2934, event: add_osc, device: oak-MDT0001-mdtlov, ost: oak-OST0138_UUID, index: 312, gen: 1 }

[root@oak-md1-s1 ~]# lctl --device MGS llog_print oak-MDT0002 | grep OST0138
- { index: 3063, event: attach, device: oak-OST0138-osc-MDT0002, type: osc, UUID: oak-MDT0002-mdtlov_UUID }
- { index: 3064, event: setup, device: oak-OST0138-osc-MDT0002, UUID: oak-OST0138_UUID, node: 10.0.2.103@o2ib5 }
- { index: 3066, event: add_conn, device: oak-OST0138-osc-MDT0002, node: 10.0.2.104@o2ib5 }
- { index: 3067, event: add_osc, device: oak-MDT0002-mdtlov, ost: oak-OST0138_UUID, index: 312, gen: 1 }

[root@oak-md1-s1 ~]# lctl --device MGS llog_print oak-MDT0003 | grep OST0138
- { index: 3079, event: attach, device: oak-OST0138-osc-MDT0003, type: osc, UUID: oak-MDT0003-mdtlov_UUID }
- { index: 3080, event: setup, device: oak-OST0138-osc-MDT0003, UUID: oak-OST0138_UUID, node: 10.0.2.103@o2ib5 }
- { index: 3082, event: add_conn, device: oak-OST0138-osc-MDT0003, node: 10.0.2.104@o2ib5 }
- { index: 3083, event: add_osc, device: oak-MDT0003-mdtlov, ost: oak-OST0138_UUID, index: 312, gen: 1 }

[root@oak-md1-s1 ~]# lctl --device MGS llog_print oak-MDT0004 | grep OST0138
- { index: 3255, event: attach, device: oak-OST0138-osc-MDT0004, type: osc, UUID: oak-MDT0004-mdtlov_UUID }
- { index: 3256, event: setup, device: oak-OST0138-osc-MDT0004, UUID: oak-OST0138_UUID, node: 10.0.2.103@o2ib5 }
- { index: 3258, event: add_conn, device: oak-OST0138-osc-MDT0004, node: 10.0.2.104@o2ib5 }
- { index: 3259, event: add_osc, device: oak-MDT0004-mdtlov, ost: oak-OST0138_UUID, index: 312, gen: 1 }

[root@oak-md1-s1 ~]# lctl --device MGS llog_print oak-MDT0005 | grep OST0138
- { index: 3255, event: attach, device: oak-OST0138-osc-MDT0005, type: osc, UUID: oak-MDT0005-mdtlov_UUID }
- { index: 3256, event: setup, device: oak-OST0138-osc-MDT0005, UUID: oak-OST0138_UUID, node: 10.0.2.103@o2ib5 }
- { index: 3258, event: add_conn, device: oak-OST0138-osc-MDT0005, node: 10.0.2.104@o2ib5 }
- { index: 3259, event: add_osc, device: oak-MDT0005-mdtlov, ost: oak-OST0138_UUID, index: 312, gen: 1 }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;However, this new OST is NOT visible from the MDTs:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@oak-md1-s2 CONFIGS]# llog_reader /mnt/ldiskfs/mdt/0/CONFIGS/oak-MDT0000 | grep 0138
[root@oak-md1-s2 CONFIGS]# 

[root@oak-md1-s2 ~]# lctl dl | grep OST0138
[root@oak-md1-s2 ~]# 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
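&lt;p&gt;A quick way to quantify the divergence between the two copies, using only the commands already shown here (the counts in the comments are what the outputs above imply, not newly captured output):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# On the MGS: count OST0138 records in the master copy of the MDT0000 config llog
[root@oak-md1-s1 ~]# lctl --device MGS llog_print oak-MDT0000 | grep -c OST0138    # expect 4

# On the MDS, with the MDT mounted as ldiskfs: count records in the local copy
[root@oak-md1-s2 CONFIGS]# llog_reader /mnt/ldiskfs/mdt/0/CONFIGS/oak-MDT0000 | grep -c 0138    # expect 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;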

&lt;p&gt;From a client, we can see the new OST but it&apos;s not filling up, which makes sense if the MDTs are not aware of it:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;oak-OST0133_UUID     108461852548 37418203104 69949699416  35% /oak[OST:307]
oak-OST0134_UUID     108461852548 38597230784 68770659804  36% /oak[OST:308]
oak-OST0135_UUID     108461852548 38483562644 68884328272  36% /oak[OST:309]
oak-OST0136_UUID     108461852548 41312045604 66055819468  39% /oak[OST:310]
oak-OST0137_UUID     108461852548 43196874132 64170973596  41% /oak[OST:311]
oak-OST0138_UUID     108461852548        1828 107368054308   1% /oak[OST:312]

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Right now, we&apos;re up and running in that weird situation... not ideal.&lt;/p&gt;

&lt;p&gt;I&apos;m attaching the catalogs found on the 6 MDTs as &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/38770/38770_oak-MDT-CONFIGS-llog.tar&quot; title=&quot;oak-MDT-CONFIGS-llog.tar attached to LU-14695&quot;&gt;oak-MDT-CONFIGS-llog.tar&lt;/a&gt;&lt;/span&gt; and a tarball of the CONFIGS directory on the MGS as &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/38769/38769_oak-MGS-CONFIGS.tar.gz&quot; title=&quot;oak-MGS-CONFIGS.tar.gz attached to LU-14695&quot;&gt;oak-MGS-CONFIGS.tar.gz&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Any idea of what is wrong or corrupt? We would really appreciate any help to avoid doing a full writeconf.&lt;/p&gt;</description>
                <environment>CentOS 7.9</environment>
        <key id="64355">LU-14695</key>
            <summary>New OST not visible by MDTs. MGS problem or corrupt catalog llog?</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="tappro">Mikhail Pershin</assignee>
                                    <reporter username="sthiell">Stephane Thiell</reporter>
                        <labels>
                    </labels>
                <created>Thu, 20 May 2021 21:46:07 +0000</created>
                <updated>Mon, 27 Sep 2021 15:49:14 +0000</updated>
                                            <version>Lustre 2.12.6</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="302268" author="pjones" created="Fri, 21 May 2021 17:29:23 +0000"  >&lt;p&gt;Serguei&lt;/p&gt;

&lt;p&gt;Can you please advise?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="302271" author="pjones" created="Fri, 21 May 2021 17:37:34 +0000"  >&lt;p&gt;Sorry- wrong ticket&lt;/p&gt;</comment>
                            <comment id="302272" author="pjones" created="Fri, 21 May 2021 17:38:35 +0000"  >&lt;p&gt;Mike&lt;/p&gt;

&lt;p&gt;Could you please advise?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="302704" author="sthiell" created="Wed, 26 May 2021 17:56:37 +0000"  >&lt;p&gt;Also... lctl dk from an MDS (oak-md2-s1 serving oak-MDT0004):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000100:02000000:4.0:1621544759.322774:0:5321:0:(import.c:1597:ptlrpc_import_recovery_state_machine()) oak-MDT0004: Connection restored to oak-MDT0004-lwp-OST0136_UUID (at 10.0.2.103@o2ib5)
00000020:00000400:51.0:1621544769.080003:0:37195:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x0)
00000020:00000400:51.0:1621544769.093563:0:37195:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x4)
00000020:00020000:51.0:1621544769.093605:0:37195:0:(genops.c:556:class_register_device()) oak-OST0136-osc-MDT0004: already exists, won&apos;t add
00000020:00020000:51.0:1621544769.104827:0:37195:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.0.2.51@o2ib5: cfg command failed: rc = -17
00000020:02000400:51.0:1621544769.116657:0:37195:0:(obd_config.c:2068:class_config_dump_handler())    cmd=cf001 0:oak-OST0136-osc-MDT0004  1:osp  2:oak-MDT0004-mdtlov_UUID

10000000:00020000:19.0:1621544769.127093:0:4304:0:(mgc_request.c:599:do_requeue()) failed processing log: -17
00010000:02000400:22.0:1621544973.442080:0:4310:0:(ldlm_lib.c:816:target_handle_reconnect()) oak-MDT0004: Client 9458049c-ca8d-335b-3531-2606964e11c0 (at 10.51.2.31@o2ib3) reconnecting
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;What generates this error is:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;                if (!(cfg-&amp;gt;cfg_flags &amp;amp; CFG_F_MARKER) &amp;amp;&amp;amp;
                    (lcfg-&amp;gt;lcfg_command != LCFG_MARKER)) {
                        CWARN(&quot;Skip config outside markers, (inst: %016lx, uuid: %s, flags: %#x)\n&quot;,
                                cfg-&amp;gt;cfg_instance,
                                cfg-&amp;gt;cfg_uuid.uuid, cfg-&amp;gt;cfg_flags);
                        cfg-&amp;gt;cfg_flags |= CFG_F_SKIP;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;but cfg-&amp;gt;cfg_instance is NULL and cfg-&amp;gt;cfg_uuid.uuid empty. Bug?&lt;/p&gt;</comment>
                            <comment id="302723" author="tappro" created="Wed, 26 May 2021 19:47:11 +0000"  >&lt;p&gt;Stephane, this looks like a bug to me too, though I can&apos;t say for sure whether it is a corrupted llog or something else. I am still checking logs and existing tickets for something similar. You&apos;ve said that Lustre 2.12.6 is used; is that also the case for the newly added OST? Also, I suppose the older servers were updated to 2.12.6 from older versions, am I right?&lt;/p&gt;</comment>
                            <comment id="302725" author="sthiell" created="Wed, 26 May 2021 19:59:26 +0000"  >&lt;p&gt;Hi Mike! Yes, Lustre 2.12.6 is used here on Oak, on all servers, including newly added OSTs. But yes, older OSTs were added using previous versions of Lustre. We started this filesystem with 2.9 in early 2017, then ran 2.10 for several years and upgraded to 2.12.x in October 2020. Since then, we have been upgrading Oak to the latest 2.12.x.&lt;/p&gt;

&lt;p&gt;I&apos;ve started to see random weird behaviors when adding the previous OSTs (like oak-OST0136). One other thing: we have seen a crash similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9699&quot; title=&quot;osp_obd_connect()) ASSERTION( osp-&amp;gt;opd_connects == 1 ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9699&quot;&gt;&lt;del&gt;LU-9699&lt;/del&gt;&lt;/a&gt; &quot;ASSERTION( osp-&amp;gt;opd_connects == 1 ) failed&quot; once or twice. I guess some llog corruption and/or bad llog buffer handling could be the cause, but I can&apos;t find where. I wonder if there is a way to simulate the llog config processing.&lt;/p&gt;

&lt;p&gt;Otherwise, a drastic solution would be to do a full writeconf and remount all targets to regenerate a clean config, but I guess we would also need to stop all clients, which means a long down time as Oak is mounted on several clusters.&lt;/p&gt;</comment>
                            <comment id="303114" author="tappro" created="Mon, 31 May 2021 16:50:52 +0000"  >&lt;p&gt;Stephane, just a couple of questions: you&apos;ve mentioned that you added other OSTs previously, e.g. 0136, and those additions went well, right? Any chance you know which Lustre version was used at that time? My proposal right now is to collect the MDT Lustre debug log on server start to see the config llog processing in more detail; is that possible? Please add the &apos;config&apos; and &apos;info&apos; levels to debug.&lt;/p&gt;</comment>
                            <comment id="303273" author="tappro" created="Wed, 2 Jun 2021 14:34:02 +0000"  >&lt;p&gt;Stephane, as I can see from the config logs, the local copies on the MDTs were not updated from the main config on the MGS; I am not sure why, so it would still be valuable to get a server log during mount. It may be related somehow to the order of servers in the log: MDT0004 and MDT0005 were added after the last OST0137, so this is probably a log processing/copying bug. I am checking that.&lt;/p&gt;

&lt;p&gt;As for a solution, you could try to remove (better, move to another location just in case) the local MDT config log of one MDT, say MDT0003, and remount it. The config log should then be copied from the MGS and MDT0003 might see OST0138. I worry that the -17 error during log processing may interfere, but the config log on the MGS looks OK and contains OST0138.&lt;/p&gt;</comment>
                            <comment id="303764" author="sthiell" created="Mon, 7 Jun 2021 18:45:10 +0000"  >&lt;p&gt;Hi Mike,&lt;/p&gt;

&lt;p&gt;Thanks! I will try to gather the config llog processing in more detail after disabling the local MDT config log, at the next scheduled maintenance so I can restart the MGS and MDTs. (I think the MGS might have a problem somehow, so better to restart it too.) It might not be for another two weeks though.&lt;/p&gt;
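&lt;p&gt;For reference, a sketch of what that maintenance step could look like (the device and mount point names below are placeholders, not our real configuration; the CONFIGS path follows the llog_reader example earlier in this ticket):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# Placeholder device/mount names -- adjust for the real MDT
umount /mnt/oak/mdt3                                  # stop the MDT
mount -t ldiskfs /dev/MDT0003_DEV /mnt/ldiskfs/mdt/3  # mount the backing fs directly
mv /mnt/ldiskfs/mdt/3/CONFIGS/oak-MDT0003 /root/oak-MDT0003.config.bak  # move aside, do not delete
umount /mnt/ldiskfs/mdt/3
mount -t lustre /dev/MDT0003_DEV /mnt/oak/mdt3        # on mount, a fresh config copy should be fetched from the MGS
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;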

&lt;p&gt;As for OST 0136, I think it was added with Lustre 2.12.6.&lt;/p&gt;

&lt;p&gt;I can see the version of Lustre in the MGS&apos;s oak-MDT0000 config llog, for example:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;#2735 (224)marker 4700 (flags=0x01, v2.12.6.0) oak-OST0136 &#160; &#160; &apos;add osc&apos; Thu Feb 18 11:22:03 2021-
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;And you&apos;re right that we added 2 new MDTs recently, MDT0004 and MDT0005. Perhaps this is the source of the issue here.&lt;/p&gt;</comment>
                            <comment id="304603" author="sthiell" created="Tue, 15 Jun 2021 18:12:07 +0000"  >&lt;p&gt;We had an opportunity to reboot the MDS in question, so both MDT0000 and MDT0003 restarted, which is a bit confusing in the log. I renamed the config for MDT0003 prior to mounting, but unfortunately I was only able to capture the config for MDT0000, I think, with an error on duplicate OST, this time OST0135 (super weird...). Anyway, we can see the part when the config is loaded (see &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/39064/39064_oak-md1-s2_dk_config%2Binfo-1.log&quot; title=&quot;oak-md1-s2_dk_config+info-1.log attached to LU-14695&quot;&gt;oak-md1-s2_dk_config+info-1.log&lt;/a&gt;&lt;/span&gt; for full logs), this is just for OST0135:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000020:00000080:0.0:1623351596.339966:0:57227:0:(obd_config.c:1128:class_process_config()) processing cmd: cf010
00000020:00000080:0.0:1623351596.339966:0:57227:0:(obd_config.c:1198:class_process_config()) marker 4694 (0x1) oak-OST0135 add osc
00000020:00000080:0.0:1623351596.339967:0:57227:0:(obd_config.c:1128:class_process_config()) processing cmd: cf005
00000020:00000080:0.0:1623351596.339968:0:57227:0:(obd_config.c:1139:class_process_config()) adding mapping from uuid 10.0.2.104@o2ib5 to nid 0x500050a000268 (10.0.2.104@o2ib5)
00000100:00000040:0.0:1623351596.339969:0:57227:0:(lustre_peer.c:122:class_add_uuid()) found uuid 10.0.2.104@o2ib5 10.0.2.104@o2ib5 cnt=1
00000020:01000000:0.0:1623351596.339969:0:57227:0:(obd_config.c:1695:class_config_llog_handler()) For 2.x interoperability, rename obd type from osc to osp (oak-MDT0000)
00000020:00000080:0.0:1623351596.339970:0:57227:0:(obd_config.c:1128:class_process_config()) processing cmd: cf001
00000020:00000080:0.0:1623351596.339972:0:57227:0:(genops.c:451:class_newdev()) Allocate new device oak-OST0135-osc-MDT0000 (ffffa0ba056920f0)
00000020:00000040:0.0:1623351596.339972:0:57227:0:(lustre_handles.c:99:class_handle_hash()) added object ffffa0ab7c744c00 with handle 0x60ebddc04fb89991 to hash
00000020:00000040:0.0:1623351596.339973:0:57227:0:(genops.c:1018:class_export_put()) PUTting export ffffa0ab7c744c00 : new refcount 1
00000100:00000040:3.0:1623351596.339975:0:57124:0:(niobuf.c:905:ptl_send_rpc()) @@@ send flg=0  req@ffffa0ba05689b00 x1702207520042944/t0(0) o8-&amp;gt;oak-OST0134-osc-MDT0000@10.0.2.103@o2ib5:28/4 lens 520/544 e 0 to 0 dl 1623351601 ref 2 fl Rpc:N/0/ffffffff rc 0/-1
00000100:00000040:3.0:1623351596.339978:0:57124:0:(niobuf.c:57:ptl_send_buf()) peer_id 12345-10.0.2.103@o2ib5
00000020:00000080:0.0:1623351596.339993:0:57227:0:(obd_config.c:431:class_attach()) OBD: dev 307 attached type osp with refcount 1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and later when it tries to register the duplicate OST (this time, it was OST0135), see &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/39065/39065_oak-md1-s2_dk_config%2Binfo-2.log&quot; title=&quot;oak-md1-s2_dk_config+info-2.log attached to LU-14695&quot;&gt;oak-md1-s2_dk_config+info-2.log&lt;/a&gt;&lt;/span&gt;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000020:00000400:14.0:1623352197.724522:0:58995:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x0)
00000020:00000400:14.0:1623352197.739473:0:58995:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x4)
00000020:00000400:14.0:1623352197.739474:0:58995:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x4)
00000020:00020000:14.0:1623352197.739539:0:58995:0:(genops.c:556:class_register_device()) oak-OST0135-osc-MDT0000: already exists, won&apos;t add
00000020:00020000:14.0:1623352197.751872:0:58995:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.0.2.51@o2ib5: cfg command failed: rc = -17
00000020:02000400:14.0:1623352197.764886:0:58995:0:(obd_config.c:2068:class_config_dump_handler())    cmd=cf001 0:oak-OST0135-osc-MDT0000  1:osp  2:oak-MDT0000-mdtlov_UUID  

10000000:00020000:23.0:1623352197.776197:0:57194:0:(mgc_request.c:599:do_requeue()) failed processing log: -17
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="65994">LU-15000</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="38770" name="oak-MDT-CONFIGS-llog.tar" size="2846720" author="sthiell" created="Thu, 20 May 2021 21:37:04 +0000"/>
                            <attachment id="38769" name="oak-MGS-CONFIGS.tar.gz" size="476292" author="sthiell" created="Thu, 20 May 2021 21:42:18 +0000"/>
                            <attachment id="39064" name="oak-md1-s2_dk_config+info-1.log" size="10857649" author="sthiell" created="Tue, 15 Jun 2021 18:09:05 +0000"/>
                            <attachment id="39065" name="oak-md1-s2_dk_config+info-2.log" size="1714685" author="sthiell" created="Tue, 15 Jun 2021 18:09:52 +0000"/>
                            <attachment id="38771" name="servers-logs.txt" size="6360" author="sthiell" created="Thu, 20 May 2021 21:25:15 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01v3b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>