<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:40:00 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4136] MDT temporarily unhealthy when restarting</title>
                <link>https://jira.whamcloud.com/browse/LU-4136</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;When restarting an MDT, we consistently see that its status under /proc/fs/lustre/health_check is temporarily unhealthy.&lt;/p&gt;

&lt;p&gt;Here are some logs:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000004:02000400:1.0F:Tue Oct 22 15:23:52 CEST 2013:0:11263:0:(mdt_recovery.c:233:mdt_server_data_init()) fs1-MDT0000: used disk, loading
00000020:02000000:8.0F:Tue Oct 22 15:23:52 CEST 2013:0:11086:0:(obd_mount_server.c:1776:server_calc_timeout()) fs1-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
00000100:00000400:4.0:Tue Oct 22 15:23:57 CEST 2013:0:5640:0:(client.c:1869:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1382448232/real 0]  req@ffff8810789a9000 x1449588289516076/t0(0) o38-&amp;gt;fs1-MDT0000-lwp-MDT0000@10.3.0.11@o2ib:12/10 lens 400/544 e 0 to 1 dl 1382448237 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
00000100:00000400:4.0:Tue Oct 22 15:23:57 CEST 2013:0:5640:0:(client.c:1869:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1382448232/real 0]  req@ffff881071d6f400 x1449588289513784/t0(0) o8-&amp;gt;fs1-OST0006-osc-MDT0000@10.4.0.6@o2ib1:28/4 lens 400/544 e 0 to 1 dl 1382448237 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
00000100:00000400:4.0:Tue Oct 22 15:23:57 CEST 2013:0:5640:0:(client.c:1869:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1382448232/real 0]  req@ffff8808653cb800 x1449588289513644/t0(0) o8-&amp;gt;fs1-OST0005-osc-MDT0000@10.3.0.6@o2ib:28/4 lens 400/544 e 0 to 1 dl 1382448237 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
00000100:00000400:1.0:Tue Oct 22 15:23:59 CEST 2013:0:6207:0:(client.c:1869:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1382448232/real 0]  req@ffff880876dbc000 x1449588289516292/t0(0) o104-&amp;gt;MGS@10.3.0.11@o2ib:15/16 lens 296/224 e 0 to 1 dl 1382448239 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
00010000:02020000:1.0:Tue Oct 22 15:23:59 CEST 2013:0:6207:0:(ldlm_lockd.c:641:ldlm_failed_ast()) 138-a: MGS: A client on nid 10.3.0.11@o2ib was evicted due to a lock blocking callback time out: rc -107
00000100:02000000:9.0F:Tue Oct 22 15:24:18 CEST 2013:0:5640:0:(import.c:1407:ptlrpc_import_recovery_state_machine()) fs1-OST0006-osc-MDT0000: Connection restored to fs1-OST0006 (at 10.4.0.3@o2ib1)
00010000:02000400:5.0F:Tue Oct 22 15:24:29 CEST 2013:0:11219:0:(ldlm_lib.c:1581:target_start_recovery_timer()) fs1-MDT0000: Will be in recovery for at least 2:30, or until 1 client reconnects
00010000:02000000:3.0F:Tue Oct 22 15:24:29 CEST 2013:0:11285:0:(ldlm_lib.c:1420:target_finish_recovery()) fs1-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
00000100:02000000:9.0:Tue Oct 22 15:24:55 CEST 2013:0:5640:0:(import.c:1407:ptlrpc_import_recovery_state_machine()) fs1-OST0005-osc-MDT0000: Connection restored to fs1-OST0005 (at 10.3.0.3@o2ib)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As we can see, as soon as the MDT is started it has trouble connecting to several OSTs. Moreover, a recovery begins, but it finishes quickly. However, the MDT becomes healthy only once the connection to all OSTs is restored, i.e. at 15:24:55. Indeed, from 15:23:52 when it is started until 15:24:55 when the connection to the last OST is restored, the MDT reports an unhealthy status.&lt;/p&gt;

&lt;p&gt;We can understand that an MDT that has not been able to connect to its OSTs is unhealthy, but we do not understand why it has trouble connecting to them, as there are no errors on the network.&lt;br/&gt;
It seems that with Lustre 2.4 the connection between the MDT and the OSTs is hard to establish, and takes some time to be restored (we have other examples where it took more than 2 minutes).&lt;/p&gt;

&lt;p&gt;The problem with this situation is that we monitor Lustre MDT and OST health status for HA purposes. If a target is seen as unhealthy, the node hosting that resource can be fenced.&lt;/p&gt;
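&lt;p&gt;For reference, the kind of HA probe we use can be sketched as follows. This is a minimal illustration, not our actual agent: the &quot;NOT HEALTHY&quot; and &quot;device ... reported unhealthy&quot; strings match the health_check output we observe, while treating any other content as healthy is an assumption.&lt;/p&gt;

```python
# Minimal sketch of an HA health probe for /proc/fs/lustre/health_check.
# The "NOT HEALTHY" / "device ... reported unhealthy" strings match the
# output we observe; treating anything else as healthy is an assumption.

def parse_health_check(text):
    """Return (healthy, [unhealthy device names]) from health_check output."""
    devices = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("device ") and line.endswith(" reported unhealthy"):
            devices.append(line[len("device "):-len(" reported unhealthy")])
    healthy = "NOT HEALTHY" not in text
    return healthy, devices

def probe(path="/proc/fs/lustre/health_check"):
    """Read the proc file and return its parsed status."""
    with open(path) as f:
        return parse_health_check(f.read())
```

&lt;p&gt;An HA agent would call probe() periodically and trigger fencing or failover handling only on an unhealthy result.&lt;/p&gt;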

&lt;p&gt;Thanks,&lt;br/&gt;
Sebastien.&lt;/p&gt;</description>
                <environment></environment>
        <key id="21604">LU-4136</key>
            <summary>MDT temporarily unhealthy when restarting</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="sebastien.buisson">Sebastien Buisson</reporter>
                        <labels>
                            <label>mn4</label>
                    </labels>
                <created>Wed, 23 Oct 2013 17:04:47 +0000</created>
                <updated>Tue, 31 Dec 2013 15:42:10 +0000</updated>
                            <resolved>Mon, 23 Dec 2013 21:27:06 +0000</resolved>
                                    <version>Lustre 2.4.1</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                    <fixVersion>Lustre 2.5.1</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="69703" author="pjones" created="Thu, 24 Oct 2013 00:06:15 +0000"  >&lt;p&gt;Bobijam&lt;/p&gt;

&lt;p&gt;Could you please comment on this one?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="69704" author="green" created="Thu, 24 Oct 2013 00:10:56 +0000"  >&lt;p&gt;Also, it would be useful to get a debug log from the MDS when this happens, with increased debug levels, to see what&apos;s going on; otherwise it&apos;s just wild, unsubstantiated guessing.&lt;/p&gt;</comment>
                            <comment id="69730" author="sebastien.buisson" created="Thu, 24 Oct 2013 10:57:02 +0000"  >&lt;p&gt;Here is the full MDS debug log of a new occurrence of the problem.&lt;/p&gt;

&lt;p&gt;At 10:00:03 the MDT begins loading. The connection to OST0004 times out at 10:00:08, and it is restored only at 10:01:18.&lt;/p&gt;</comment>
                            <comment id="69872" author="bobijam" created="Fri, 25 Oct 2013 09:54:21 +0000"  >&lt;p&gt;In the log, during the attach period of osp fs1-OST0004-osc-MDT0000, it was using 10.3.0.5@o2ib&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00010000:00080000:3.0:Thu Oct 24 10:00:03 CEST 2013:0:6741:0:(ldlm_lib.c:134:import_set_conn()) imp ffff881064b85800@fs1-OST0004-osc-MDT0000: add connection 10.3.0.5@o2ib at tail
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which I guess is not correct. &lt;font color=&quot;blue&quot;&gt;(Can you post the NIDs of OST0003 and OST0004, and also their mkfs/mount options here?)&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;Later, when it processed the llog records fetched from the MGS, it added the correct NID of OST0004&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000040:00001000:4.0:Thu Oct 24 10:00:03 CEST 2013:0:6745:0:(llog.c:386:llog_process_thread()) skipping lrh_index 53
00010000:00080000:3.0:Thu Oct 24 10:00:03 CEST 2013:0:6741:0:(ldlm_lib.c:134:import_set_conn()) imp ffff881064b85800@fs1-OST0004-osc-MDT0000: add connection 10.3.0.6@o2ib at tail
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It first used 10.3.0.5@o2ib to connect to OST0004&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000100:00080000:3.0:Thu Oct 24 10:00:03 CEST 2013:0:6741:0:(import.c:625:ptlrpc_connect_import()) ffff881064b85800 fs1-OST0004_UUID: changing import state from NEW to CONNECTING
00000100:00000001:3.0:Thu Oct 24 10:00:03 CEST 2013:0:6741:0:(import.c:467:import_select_connection()) Process entered
00000100:00080000:3.0:Thu Oct 24 10:00:03 CEST 2013:0:6741:0:(import.c:482:import_select_connection()) fs1-OST0004-osc-MDT0000: connect to NID 10.3.0.5@o2ib last attempt 0
00000100:00080000:3.0:Thu Oct 24 10:00:03 CEST 2013:0:6741:0:(import.c:560:import_select_connection()) fs1-OST0004-osc-MDT0000: import ffff881064b85800 using connection 10.3.0.5@o2ib/10.3.0.5@o2ib
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and failed&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000100:00000400:8.0:Thu Oct 24 10:00:08 CEST 2013:0:5640:0:(client.c:1869:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1382601603/real 0]  req@ffff881064bfc400 x1449588289650924/t0(0) o8-&amp;gt;fs1-OST0004-osc-MDT0000@10.3.0.5@o2ib:28/4 lens 400/544 e 0 to 1 dl 1382601608 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After several failures, it tried the correct NID&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000100:00080000:17.0:Thu Oct 24 10:00:35 CEST 2013:0:5673:0:(import.c:482:import_select_connection()) fs1-OST0004-osc-MDT0000: connect to NID 10.3.0.5@o2ib last attempt 4461368396
00000100:00080000:17.0:Thu Oct 24 10:00:35 CEST 2013:0:5673:0:(import.c:482:import_select_connection()) fs1-OST0004-osc-MDT0000: connect to NID 10.3.0.6@o2ib last attempt 0
00000100:00080000:17.0:Thu Oct 24 10:00:35 CEST 2013:0:5673:0:(import.c:552:import_select_connection()) fs1-OST0004-osc-MDT0000: Connection changing to fs1-OST0004 (at 10.3.0.6@o2ib)
00000100:00080000:17.0:Thu Oct 24 10:00:35 CEST 2013:0:5673:0:(import.c:560:import_select_connection()) fs1-OST0004-osc-MDT0000: import ffff881064b85800 using connection 10.3.0.5@o2ib/10.3.0.6@o2ib
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and succeeded&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000100:00080000:8.0:Thu Oct 24 10:00:35 CEST 2013:0:5640:0:(import.c:816:ptlrpc_connect_interpret()) fs1-OST0004-osc-MDT0000: connect to target with instance 23
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;My guess is that it could be related to the order of the failnode parameters.&lt;/p&gt;</comment>
                            <comment id="69882" author="kalpak" created="Fri, 25 Oct 2013 12:01:00 +0000"  >&lt;p&gt;We have also seen this issue with Lustre 2.4.1. We updated the HA scripts to ignore some of the unhealthy errors that are seen after MDT mount or restart.&lt;/p&gt;

&lt;p&gt;As Sebastien has pointed out, these unhealthy warnings are seen until the first OST connects with the MDS. &lt;/p&gt;</comment>
                            <comment id="70854" author="sebastien.buisson" created="Wed, 6 Nov 2013 15:59:40 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;Here is the information regarding the OST0003 and OST0004 targets:&lt;/p&gt;

&lt;p&gt;OST0003:&lt;br/&gt;
nid: 10.3.0.4@o2ib0&lt;br/&gt;
mkfs options: mgsnode=10.3.0.10@o2ib mgsnode=10.4.0.10@o2ib1 mgsnode=10.3.0.11@o2ib mgsnode=10.4.0.11@o2ib1 failover.node=10.3.0.5@o2ib failover.node=10.3.0.6@o2ib failover.node=10.3.0.3@o2ib network=o2ib0&lt;/p&gt;

&lt;p&gt;OST0004:&lt;br/&gt;
nid:10.3.0.5@o2ib0&lt;br/&gt;
mkfs options: mgsnode=10.3.0.10@o2ib mgsnode=10.4.0.10@o2ib1 mgsnode=10.3.0.11@o2ib mgsnode=10.4.0.11@o2ib1 failover.node=10.3.0.6@o2ib failover.node=10.3.0.3@o2ib failover.node=10.3.0.4@o2ib network=o2ib0&lt;/p&gt;



&lt;p&gt;I have carried out some more tests, and this issue seems to be related to recovery.&lt;/p&gt;


&lt;p&gt;Here is what happens when we try to start the MDT (debug log extract from MDS node):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00010000:00080000:1.0:Wed Nov  6 13:52:55 CET 2013:0:28549:0:(ldlm_lib.c:134:import_set_conn()) imp ffff88085b658800@fs1-OST0000-osc-MDT0000: add connection 10.4.0.3@o2ib1 at tail
00010000:00080000:1.0:Wed Nov  6 13:52:55 CET 2013:0:28549:0:(ldlm_lib.c:134:import_set_conn()) imp ffff88085b658800@fs1-OST0000-osc-MDT0000: add connection 10.4.0.3@o2ib1 at tail
00010000:00080000:1.0:Wed Nov  6 13:52:55 CET 2013:0:28549:0:(ldlm_lib.c:134:import_set_conn()) imp ffff88085b658800@fs1-OST0000-osc-MDT0000: add connection 10.4.0.3@o2ib1 at tail
00010000:00080000:1.0:Wed Nov  6 13:52:55 CET 2013:0:28549:0:(ldlm_lib.c:134:import_set_conn()) imp ffff88085b658800@fs1-OST0000-osc-MDT0000: add connection 10.4.0.4@o2ib1 at tail
00010000:00080000:1.0:Wed Nov  6 13:52:55 CET 2013:0:28549:0:(ldlm_lib.c:134:import_set_conn()) imp ffff88085b658800@fs1-OST0000-osc-MDT0000: add connection 10.4.0.4@o2ib1 at tail
00010000:00080000:1.0:Wed Nov  6 13:52:55 CET 2013:0:28549:0:(ldlm_lib.c:134:import_set_conn()) imp ffff88085b658800@fs1-OST0000-osc-MDT0000: add connection 10.4.0.4@o2ib1 at tail
00010000:00080000:1.0:Wed Nov  6 13:52:55 CET 2013:0:28549:0:(ldlm_lib.c:134:import_set_conn()) imp ffff88085b658800@fs1-OST0000-osc-MDT0000: add connection 10.4.0.4@o2ib1 at tail
00010000:00080000:1.0:Wed Nov  6 13:52:55 CET 2013:0:28549:0:(ldlm_lib.c:134:import_set_conn()) imp ffff88085b658800@fs1-OST0000-osc-MDT0000: add connection 10.4.0.4@o2ib1 at tail
00000100:00080000:1.0:Wed Nov  6 13:52:55 CET 2013:0:28549:0:(import.c:625:ptlrpc_connect_import()) ffff88085b658800 fs1-OST0000_UUID: changing import state from NEW to CONNECTING
00000100:00080000:1.0:Wed Nov  6 13:52:55 CET 2013:0:28549:0:(import.c:482:import_select_connection()) fs1-OST0000-osc-MDT0000: connect to NID 10.4.0.3@o2ib1 last attempt 0
00000100:00080000:1.0:Wed Nov  6 13:52:55 CET 2013:0:28549:0:(import.c:560:import_select_connection()) fs1-OST0000-osc-MDT0000: import ffff88085b658800 using connection 10.4.0.3@o2ib1/10.4.0.3@o2ib1
00000100:00000400:17.0:Wed Nov  6 13:53:00 CET 2013:0:28448:0:(client.c:1869:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1383742375/real 0]  req@ffff88085b661c00 x1450248722566572/t0(0) o8-&amp;gt;fs1-OST0000-osc-MDT0000@10.4.0.3@o2ib1:28/4 lens 400/544 e 0 to 1 dl 1383742380 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
00000100:00080000:17.0:Wed Nov  6 13:53:00 CET 2013:0:28448:0:(import.c:1141:ptlrpc_connect_interpret()) ffff88085b658800 fs1-OST0000_UUID: changing import state from CONNECTING to DISCONN
00000100:00080000:17.0:Wed Nov  6 13:53:00 CET 2013:0:28448:0:(import.c:1187:ptlrpc_connect_interpret()) recovery of fs1-OST0000_UUID on 10.4.0.3@o2ib1 failed (-110)
00000100:00080000:4.0:Wed Nov  6 13:53:02 CET 2013:0:28259:0:(import.c:625:ptlrpc_connect_import()) ffff88085b658800 fs1-OST0000_UUID: changing import state from DISCONN to CONNECTING
00000100:00080000:4.0:Wed Nov  6 13:53:02 CET 2013:0:28259:0:(import.c:482:import_select_connection()) fs1-OST0000-osc-MDT0000: connect to NID 10.4.0.4@o2ib1 last attempt 0
00000100:00080000:4.0:Wed Nov  6 13:53:02 CET 2013:0:28259:0:(import.c:552:import_select_connection()) fs1-OST0000-osc-MDT0000: Connection changing to fs1-OST0000 (at 10.4.0.4@o2ib1)
00000100:00080000:4.0:Wed Nov  6 13:53:02 CET 2013:0:28259:0:(import.c:560:import_select_connection()) fs1-OST0000-osc-MDT0000: import ffff88085b658800 using connection 10.4.0.3@o2ib1/10.4.0.4@o2ib1
00000100:00080000:21.0:Wed Nov  6 13:53:02 CET 2013:0:28448:0:(import.c:816:ptlrpc_connect_interpret()) fs1-OST0000-osc-MDT0000: connect to target with instance 9
00000100:00080000:21.0:Wed Nov  6 13:53:02 CET 2013:0:28448:0:(import.c:851:ptlrpc_connect_interpret()) connected to replayable target: fs1-OST0000_UUID
00000100:00080000:21.0:Wed Nov  6 13:53:02 CET 2013:0:28448:0:(import.c:868:ptlrpc_connect_interpret()) connect to fs1-OST0000_UUID during recovery
00000100:00080000:21.0:Wed Nov  6 13:53:02 CET 2013:0:28448:0:(import.c:869:ptlrpc_connect_interpret()) ffff88085b658800 fs1-OST0000_UUID: changing import state from CONNECTING to REPLAY_LOCKS
00000100:00080000:21.0:Wed Nov  6 13:53:02 CET 2013:0:28448:0:(import.c:1377:ptlrpc_import_recovery_state_machine()) ffff88085b658800 fs1-OST0000_UUID: changing import state from REPLAY_LOCKS to REPLAY_WAIT
00000100:00080000:21.0:Wed Nov  6 13:53:02 CET 2013:0:28448:0:(import.c:1106:ptlrpc_connect_interpret()) fs1-OST0000-osc-MDT0000: Resetting ns_connect_flags to server flags: 0x401443000066
00000100:00080000:13.0:Wed Nov  6 13:54:08 CET 2013:0:28448:0:(import.c:1387:ptlrpc_import_recovery_state_machine()) ffff88085b658800 fs1-OST0000_UUID: changing import state from REPLAY_WAIT to RECOVER
00000100:00080000:13.0:Wed Nov  6 13:54:08 CET 2013:0:28448:0:(import.c:1399:ptlrpc_import_recovery_state_machine()) ffff88085b658800 fs1-OST0000_UUID: changing import state from RECOVER to FULL
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;See a syslog extract from an OSS:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;1383742374 2013 Nov  6 13:52:54 lama8 kern info kernel Lustre: fs1-OST0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
1383742381 2013 Nov  6 13:53:01 lama8 kern warning kernel Lustre: fs1-OST0000: Will be in recovery for at least 2:30, or until 2 clients reconnect
1383742448 2013 Nov  6 13:54:08 lama8 kern info kernel Lustre: fs1-OST0000: Recovery over after 1:07, of 2 clients 2 recovered and 0 were evicted.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;And we observed that the device fs1-OST0000-osc-MDT0000 on the MDS was unhealthy between 13:53:02 and 13:54:13.&lt;br/&gt;
So when the MDT starts, it tries to connect to an OST that unfortunately starts in recovery mode, and this leaves the MDT in an unhealthy state until the OST has finished its recovery.&lt;/p&gt;

&lt;p&gt;Note that if the MDT is up and running while the OST enters recovery, the MDT is never seen as unhealthy.&lt;/p&gt;


&lt;p&gt;Do you agree with this analysis?&lt;br/&gt;
Basically, what has changed in Lustre 2.4 compared to 2.1 so that a starting MDT is marked unhealthy if the targets it tries to connect to are in recovery?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="70957" author="bobijam" created="Thu, 7 Nov 2013 08:08:26 +0000"  >&lt;p&gt;Did it reveal which device reported unhealthy? Reading /proc/fs/lustre/health_check should show something like &quot;device XXXX reported unhealthy&quot;.&lt;/p&gt;</comment>
                            <comment id="70958" author="sebastien.buisson" created="Thu, 7 Nov 2013 08:19:30 +0000"  >&lt;p&gt;Yes sure:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;device fs1-OST0000-osc-MDT0000 reported unhealthy
NOT HEALTHY
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="71137" author="bobijam" created="Fri, 8 Nov 2013 17:24:10 +0000"  >&lt;p&gt;I agree with your analysis: the MDS is unhealthy while waiting for its connection to the in-recovery OST.&lt;/p&gt;</comment>
                            <comment id="71303" author="sebastien.buisson" created="Tue, 12 Nov 2013 09:16:15 +0000"  >&lt;p&gt;In your opinion, is this the expected behavior?&lt;br/&gt;
Could it be fixed?&lt;/p&gt;</comment>
                            <comment id="71306" author="bobijam" created="Tue, 12 Nov 2013 09:55:37 +0000"  >&lt;p&gt;IMO it&apos;s expected behavior: the MDS cannot claim to be healthy until it has connected to all registered OSTs.&lt;/p&gt;</comment>
                            <comment id="71308" author="sebastien.buisson" created="Tue, 12 Nov 2013 09:59:06 +0000"  >&lt;p&gt;The fact is the behavior does not seem consistent: if the MDT is up and running while the OST enters recovery, the MDT is never seen as unhealthy.&lt;/p&gt;</comment>
                            <comment id="71309" author="bobijam" created="Tue, 12 Nov 2013 10:15:42 +0000"  >&lt;p&gt;I think it was 2.3.50 (git commit 1d371ca4) that added an additional health check, which considers an OST that has been connected at least once to be &quot;healthy&quot;.&lt;/p&gt;</comment>
                            <comment id="72049" author="sebastien.buisson" created="Thu, 21 Nov 2013 16:44:13 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;Do you think you could change the current code so that an MDS waiting for its connections to in-recovery OSTs does not report an unhealthy status?&lt;br/&gt;
This is very annoying when using HA software on top of Lustre.&lt;/p&gt;

&lt;p&gt;Sebastien.&lt;/p&gt;</comment>
                            <comment id="72087" author="bobijam" created="Fri, 22 Nov 2013 00:42:10 +0000"  >&lt;p&gt;Tappro, what do you think about this?&lt;/p&gt;</comment>
                            <comment id="72160" author="tappro" created="Fri, 22 Nov 2013 18:58:04 +0000"  >&lt;p&gt;Well, as osp_obd_health_check() says:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;	/*
	 * 1.8/2.0 behaviour is that OST being connected once at least
	 * is considired &lt;span class=&quot;code-quote&quot;&gt;&quot;healthy&quot;&lt;/span&gt;. and one &lt;span class=&quot;code-quote&quot;&gt;&quot;healty&quot;&lt;/span&gt; OST is enough to
	 * allow lustre clients to connect to MDS
	 */
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and that is exactly how it behaves. If that is the definition of &apos;healthy&apos; for an MDT, then it works correctly. When the MDT restarts, no OST has been &apos;connected once&apos; yet, so it will wait for the first OST connection to be established. Of course, after that, if some OST goes offline the MDT will remain healthy, because that OST has already been seen &apos;once&apos;. That is exactly the behavior described in the comments above, so I tend to think it is correct. If it did not behave this way before, then either that was incorrect behavior, or the definition of a &apos;healthy MDT&apos; in osp_obd_health_check() is not quite right.&lt;/p&gt;</comment>
                            <comment id="72201" author="malkolm" created="Sun, 24 Nov 2013 20:43:52 +0000"  >&lt;p&gt;Assuming then that the current behaviour is correct, is there a reliable (i.e. programmatic) way to differentiate between a genuinely &quot;unhealthy&quot; MDT and one that is pending connections from an OST? Monitoring health check scripts and HA resource management scripts are somewhat dependent upon a reliable status indicator. If one can be described, then an alternative monitoring probe can be implemented.&lt;/p&gt;</comment>
                            <comment id="72216" author="tappro" created="Mon, 25 Nov 2013 11:50:32 +0000"  >&lt;p&gt;Malcolm, as Kalpak noted, the best way right now is to ignore &apos;not healthy&apos; reports from devices like MDD, LOD and OSP. The problem is that we are using their o_health_check() functionality to report the network status of the OSTs, so the MDT can decide when it should accept connections. At the same time, obd_proc_read_health() scans all OBD devices and calls o_health_check() to report their health status. These are not quite the same thing: from proc_read_health() we expect to see the status of the devices themselves rather than network-related state. I am not sure how to properly report different statuses to the MDT and to procfs; one possible solution would be to simply ignore MDD, LOD and OSP in obd_proc_read_health() itself, considering that they are internal devices in the MDS stack: if the MDT (top) and OSD (bottom) devices are healthy, then all devices in between are healthy too.&lt;/p&gt;
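&lt;p&gt;The filtering proposed above can be sketched like this. The device type names come from the discussion (MDD, LOD and OSP as MDS-internal devices); the function name, the tuple-based device list, and the report format are hypothetical, purely for illustration:&lt;/p&gt;

```python
# Sketch of the proposed filtering: skip MDS-internal device types when
# building the aggregate health report. The mdd/lod/osp type names come
# from the discussion; aggregate_health() and its inputs are hypothetical.

INTERNAL_TYPES = {"mdd", "lod", "osp"}  # internal devices in the MDS stack

def aggregate_health(devices):
    """devices: list of (name, obd_type, healthy) tuples.

    Builds a health_check-style report, ignoring internal devices whose
    status only reflects OST network state, not the device itself.
    """
    lines = []
    for name, obd_type, healthy in devices:
        if obd_type in INTERNAL_TYPES:
            continue  # MDT (top) and OSD (bottom) health imply theirs
        if not healthy:
            lines.append("device %s reported unhealthy" % name)
    lines.append("NOT HEALTHY" if lines else "healthy")
    return "\n".join(lines)
```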

&lt;p&gt;Zhenyu Xu, could you prepare such patch and push it to gerrit as first step?&lt;/p&gt;</comment>
                            <comment id="72224" author="tappro" created="Mon, 25 Nov 2013 14:29:19 +0000"  >&lt;p&gt;Another possible solution is to avoid using o_health_check() for reporting network status to the MDT. It could be replaced with o_get_info(), so that it does not interfere with health_check.&lt;/p&gt;</comment>
                            <comment id="72370" author="bobijam" created="Wed, 27 Nov 2013 04:36:45 +0000"  >&lt;p&gt;patch tracking at &lt;a href=&quot;http://review.whamcloud.com/8408&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8408&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="72468" author="sebastien.buisson" created="Thu, 28 Nov 2013 08:36:02 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;We gave a try to the patch at &lt;a href=&quot;http://review.whamcloud.com/8408&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8408&lt;/a&gt;, and with it we do not see the MDT unhealthy anymore when starting while OSTs are in recovery.&lt;br/&gt;
So this is very good news for our HA setup, but with this patch we now need to know under which circumstances a Lustre target can be declared unhealthy. What can lead to an MDT or OST being unhealthy? Is it still worth examining the /proc/fs/lustre/health_check contents in an HA context?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="72470" author="tappro" created="Thu, 28 Nov 2013 10:01:06 +0000"  >&lt;p&gt;Sebastien, normally each obd device reports unhealthy when it is in the setup or cleanup process, but each device may declare its own additional checks: e.g. OFD checks that statfs returns no error and that os_state is not READONLY, and OST checks that the ptlrpc services are healthy. MDT has no specific checks.&lt;br/&gt;
It is still worth checking /proc/fs/lustre/health_check as before, because it reports that all key devices are healthy and fully set up.&lt;/p&gt;</comment>
                            <comment id="72977" author="sebastien.buisson" created="Fri, 6 Dec 2013 14:45:28 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;We gave a try to the new implementation of the patch at &lt;a href=&quot;http://review.whamcloud.com/8408&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8408&lt;/a&gt; (patchset 2), and with it we do not see the MDT unhealthy anymore when starting while OSTs are in recovery.&lt;/p&gt;

&lt;p&gt;And thanks for the explanations Mikhail.&lt;/p&gt;

&lt;p&gt;Now we would like to know whether this patch can be merged, and whether it can be used in production.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="73559" author="bobijam" created="Mon, 16 Dec 2013 08:11:40 +0000"  >&lt;p&gt;Back-port for b2_5: &lt;a href=&quot;http://review.whamcloud.com/8585&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8585&lt;/a&gt; and for b2_4: &lt;a href=&quot;http://review.whamcloud.com/8587&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8587&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="13679" name="lustre.logs.mdt.extract.gz" size="1086075" author="sebastien.buisson" created="Thu, 24 Oct 2013 10:57:02 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw6nr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11221</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>