<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:48:56 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5148] OSTs won&apos;t mount following upgrade to 2.4.2</title>
                <link>https://jira.whamcloud.com/browse/LU-5148</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A production Lustre cluster &quot;porter&quot; was upgraded from 2.4.0-28chaos to lustre-2.4.2-11chaos today.  The OSTs now will not start.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# porter1 /root &amp;gt; /etc/init.d/lustre start
Stopping snmpd:                                            [  OK  ]
Shutting down cerebrod:                                    [  OK  ]
Mounting porter1/lse-ost0 on /mnt/lustre/local/lse-OST0001
mount.lustre: mount porter1/lse-ost0 at /mnt/lustre/local/lse-OST0001 failed: Input/output error
Is the MGS running?
# porter1 /root &amp;gt; 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: Lustre: Build Version: 2.4.2-11chaos-11chaos--PRISTINE-2.6.32-431.17.2.1chaos.ch5.2.x86_64
LustreError: 137-5: lse-OST0002_UUID: not available for connect from 192.168.115.67@o2ib10 (no target)
LustreError: 137-5: lse-OST0002_UUID: not available for connect from 192.168.120.38@o2ib7 (no target)
LustreError: 137-5: lse-OST0002_UUID: not available for connect from 192.168.120.101@o2ib7 (no target)
LustreError: 5426:0:(client.c:1053:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff881026873800 x1470103660003336/t0(0) o253-&amp;gt;MGC172.19.1.165@o2ib100@172.19.1.165@o2ib100:26/25 lens 4768/4768 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 5426:0:(obd_mount_server.c:1140:server_register_target()) lse-OST0001: error registering with the MGS: rc = -5 (not fatal)
LustreError: 137-5: lse-OST0002_UUID: not available for connect from 192.168.116.205@o2ib5 (no target)
LustreError: 137-5: lse-OST0002_UUID: not available for connect from 192.168.114.162@o2ib5 (no target)
LustreError: Skipped 19 previous similar messages
LustreError: 5426:0:(client.c:1053:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff881026873800 x1470103660003340/t0(0) o101-&amp;gt;MGC172.19.1.165@o2ib100@172.19.1.165@o2ib100:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 137-5: lse-OST0002_UUID: not available for connect from 192.168.120.162@o2ib7 (no target)
LustreError: Skipped 23 previous similar messages
LustreError: 5426:0:(client.c:1053:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff881026873800 x1470103660003344/t0(0) o101-&amp;gt;MGC172.19.1.165@o2ib100@172.19.1.165@o2ib100:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 15c-8: MGC172.19.1.165@o2ib100: The configuration from log &apos;lse-OST0001&apos; failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 5426:0:(obd_mount_server.c:1273:server_start_targets()) failed to start server lse-OST0001: -5
Lustre: lse-OST0001: Unable to start target: -5
LustreError: 5426:0:(obd_mount_server.c:865:lustre_disconnect_lwp()) lse-MDT0000-lwp-OST0001: Can&apos;t end config log lse-client.
LustreError: 5426:0:(obd_mount_server.c:1442:server_put_super()) lse-OST0001: failed to disconnect lwp. (rc=-2)
LustreError: 5426:0:(obd_mount_server.c:1472:server_put_super()) no obd lse-OST0001
Lustre: server umount lse-OST0001 complete
LustreError: 5426:0:(obd_mount.c:1290:lustre_fill_super()) Unable to mount  (-5)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# porter1 /root &amp;gt; lctl ping 172.19.1.165@o2ib100 # &amp;lt;-- MGS NID
12345-0@lo
12345-172.19.1.165@o2ib100
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="25041">LU-5148</key>
            <summary>OSTs won&apos;t mount following upgrade to 2.4.2</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="hongchao.zhang">Hongchao Zhang</assignee>
                                    <reporter username="nedbass">Ned Bass</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Thu, 5 Jun 2014 20:30:48 +0000</created>
                <updated>Mon, 18 Jul 2016 21:50:19 +0000</updated>
                            <resolved>Fri, 29 Apr 2016 00:38:07 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="85911" author="nedbass" created="Thu, 5 Jun 2014 20:52:57 +0000"  >&lt;p&gt;Attached -1 debug logs from OSS &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15120/15120_lustre.log.porter1.1402001330.gz&quot; title=&quot;lustre.log.porter1.1402001330.gz attached to LU-5148&quot;&gt;lustre.log.porter1.1402001330.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt; and MDS &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15121/15121_lustre.log.porter-mds1.1402001323.gz&quot; title=&quot;lustre.log.porter-mds1.1402001323.gz attached to LU-5148&quot;&gt;lustre.log.porter-mds1.1402001323.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt; for a failed OSS mount.&lt;/p&gt;</comment>
                            <comment id="85989" author="bfaccini" created="Fri, 6 Jun 2014 09:08:02 +0000"  >&lt;p&gt;Hello Ned,&lt;br/&gt;
I had a quick look at the logs you provided, and it seems to me that at least some MGS requests from the OSS have reached it but are not being handled (trashed?) by the MGS. Some other requests are delayed until time-out (ending with an EIO/-5 error code) because the MGC import is in the CONNECTING state.&lt;br/&gt;
Could you also check the MDS/MGS dmesg/syslog and see whether any interesting messages or issues can be found there?&lt;/p&gt;</comment>
                            <comment id="85995" author="pjones" created="Fri, 6 Jun 2014 12:23:26 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Is there anything else that you can add?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="86010" author="nedbass" created="Fri, 6 Jun 2014 14:23:09 +0000"  >&lt;p&gt;Hi Bruno,&lt;br/&gt;
I&apos;m not at work today, but as I recall some MGS service threads were delayed on startup.  Inactivity watchdogs logged stack traces waiting in ZFS, in &lt;tt&gt;txg_wait_synced&lt;/tt&gt; or something like that.  So it&apos;s possible the OSTs tried to connect before the MGS was fully initialized, and then the failed connections weren&apos;t properly cleaned up.  Just speculating here.&lt;/p&gt;</comment>
                            <comment id="86011" author="nedbass" created="Fri, 6 Jun 2014 14:27:04 +0000"  >&lt;p&gt;Also, I was finally able to get the OSTs to mount by unmounting and remounting the MDT, but leaving the MGT (which is a separate dataset) mounted.&lt;/p&gt;</comment>
                            <comment id="86014" author="hongchao.zhang" created="Fri, 6 Jun 2014 14:40:43 +0000"  >&lt;p&gt;there is no &quot;MGS_CONNECT&quot; request found for the MGS in the log &quot;lustre.log.porter-mds1.1402001323.gz&quot;, and there are not even any&lt;br/&gt;
&quot;mgs_xxx&quot; entries in the MGS/MDS log file (there should at least be some &quot;ENTRY&quot; and &quot;RETURN&quot; logs), so the MGS must be stuck&lt;br/&gt;
in some way.&lt;/p&gt;

&lt;p&gt;Hi Ned, could you please attach the logs containing the stack traces of the MGS service threads mentioned above? Thanks!&lt;/p&gt;</comment>
                            <comment id="86159" author="nedbass" created="Mon, 9 Jun 2014 21:21:22 +0000"  >&lt;p&gt;Hongchao Zhang, I&apos;ve attached the MDS console log: &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15130/15130_porter-mds1.console.txt&quot; title=&quot;porter-mds1.console.txt attached to LU-5148&quot;&gt;porter-mds1.console.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;.&lt;/p&gt;</comment>
                            <comment id="86296" author="hongchao.zhang" created="Wed, 11 Jun 2014 03:40:12 +0000"  >&lt;p&gt;there is a similar issue of ZFS in &lt;a href=&quot;https://github.com/zfsonlinux/zfs/issues/542&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/zfsonlinux/zfs/issues/542&lt;/a&gt;, what is the version of ZFS installed on your site?&lt;/p&gt;</comment>
                            <comment id="86377" author="morrone" created="Wed, 11 Jun 2014 21:21:02 +0000"  >&lt;p&gt;The version of ZFS installed at our site is quite a bit newer than 0.6.0.*.  Our version of ZFS is very close to the tip of master, and what will soon be tagged as 0.6.3.&lt;/p&gt;</comment>
                            <comment id="87574" author="hongchao.zhang" created="Thu, 26 Jun 2014 15:56:02 +0000"  >&lt;p&gt;Hi&lt;/p&gt;

&lt;p&gt;Could you please print the actual code lines at the following addresses:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2014-06-05 11:25:55  [&amp;lt;ffffffffa0db44b4&amp;gt;] mgs_ir_update+0x244/0xb00 [mgs]
2014-06-05 11:25:55  [&amp;lt;ffffffffa0d9287c&amp;gt;] mgs_handle_target_reg+0x40c/0xe30 [mgs]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I can find the related code for these functions at &lt;a href=&quot;https://github.com/chaos/lustre/blob/2.4.2-11chaos/lustre/mgs/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/chaos/lustre/blob/2.4.2-11chaos/lustre/mgs/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks very much!&lt;/p&gt;</comment>
                            <comment id="87648" author="nedbass" created="Thu, 26 Jun 2014 23:53:27 +0000"  >&lt;p&gt;Here you go.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;(gdb) l *(mgs_handle_target_reg+0x40c)
0x18ac is in mgs_handle_target_reg (/usr/src/debug/lustre-2.4.2/lustre/mgs/mgs_handler.c:322).
317
318             if (opc == LDD_F_OPC_READY) {
319                     CDEBUG(D_MGS, &quot;fs: %s index: %d is ready to reconnect.\n&quot;,
320                            mti-&amp;gt;mti_fsname, mti-&amp;gt;mti_stripe_index);
321                     rc = mgs_ir_update(env, mgs, mti);
322                     if (rc) {
323                             LASSERT(!(mti-&amp;gt;mti_flags &amp;amp; LDD_F_IR_CAPABLE));
324                             CERROR(&quot;Update IR return with %d(ignore and IR &quot;
325                                    &quot;disabled)\n&quot;, rc);
326                     }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;(gdb) l *( mgs_ir_update+0x244 )
0x234e4 is in mgs_ir_update (/usr/src/debug/lustre-2.4.2/lustre/mgs/mgs_nids.c:270).
265             rc = dt_record_write(env, fsdb, &amp;amp;buf, &amp;amp;off, th);
266
267     out:
268             dt_trans_stop(env, mgs-&amp;gt;mgs_bottom, th);
269     out_put:
270             lu_object_put(env, &amp;amp;fsdb-&amp;gt;do_lu);
271             RETURN(rc);
272     }
273
274     #define MGS_NIDTBL_VERSION_INIT 2
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="87679" author="hongchao.zhang" created="Fri, 27 Jun 2014 11:34:38 +0000"  >&lt;p&gt;Hi, &lt;/p&gt;

&lt;p&gt;Could you please try the debug patch at &lt;a href=&quot;http://review.whamcloud.com/#/c/10869/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/10869/&lt;/a&gt; to check whether this issue occurs again?&lt;/p&gt;

&lt;p&gt;Thanks very much!&lt;/p&gt;</comment>
                            <comment id="87845" author="pjones" created="Mon, 30 Jun 2014 22:00:47 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Could you please elaborate as to how this patch works?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="87866" author="hongchao.zhang" created="Tue, 1 Jul 2014 08:19:23 +0000"  >&lt;p&gt;this debug patch changes the IR (Imperative Recovery) operations in the MGS to update asynchronously. If the issue doesn&apos;t occur again, we can isolate the problem&lt;br/&gt;
as being related to slow ZFS synchronization, just like the problem shown in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2887&quot; title=&quot;sanity-quota test_12a: slow due to ZFS VMs sharing single disk&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2887&quot;&gt;&lt;del&gt;LU-2887&lt;/del&gt;&lt;/a&gt;, and then try to create corresponding patches to fix it.&lt;br/&gt;
Thanks.&lt;/p&gt;</comment>
                            <comment id="89001" author="green" created="Tue, 15 Jul 2014 03:19:09 +0000"  >&lt;p&gt;Looking at the stack traces in the logs, it seems everybody is either blocked on the transaction commit wait inside ZFS or on a semaphore held by somebody that is waiting on the transaction commit.&lt;/p&gt;

&lt;p&gt;So it really looks like some sort of in-ZFS wait to me. There&apos;s no dump of all thread stacks here, so I wonder if you have one that would show a Lustre-induced deadlock of some sort above ZFS?&lt;/p&gt;</comment>
                            <comment id="92329" author="cliffw" created="Mon, 25 Aug 2014 17:38:05 +0000"  >&lt;p&gt;I attempted to reproduce this on Hyperion by starting with 2.4.0 and upgrading to 2.4.2 after running some I/O tests. I could not reproduce the failure; however, I was using the Whamcloud 2.4.2 release, which may be different.&lt;/p&gt;</comment>
                            <comment id="150522" author="jfc" created="Fri, 29 Apr 2016 00:33:13 +0000"  >&lt;p&gt;Hello Ned,&lt;/p&gt;

&lt;p&gt;Do you have any update for us on this elderly ticket? Has this issue been resolved by use of later versions, for example?&lt;/p&gt;

&lt;p&gt;We would like to mark it as resolved, if you have no objection.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
~ jfc.&lt;/p&gt;
</comment>
                            <comment id="150523" author="nedbass" created="Fri, 29 Apr 2016 00:38:07 +0000"  >&lt;p&gt;Closing as stale.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="17731">LU-2887</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="15121" name="lustre.log.porter-mds1.1402001323.gz" size="267" author="nedbass" created="Thu, 5 Jun 2014 20:52:57 +0000"/>
                            <attachment id="15120" name="lustre.log.porter1.1402001330.gz" size="4428835" author="nedbass" created="Thu, 5 Jun 2014 20:52:57 +0000"/>
                            <attachment id="15130" name="porter-mds1.console.txt" size="248825" author="nedbass" created="Mon, 9 Jun 2014 21:21:22 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwntr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>14209</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>