<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:58:48 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6273] Hard Failover replay-dual test_17: Failover OST mount hang</title>
                <link>https://jira.whamcloud.com/browse/LU-6273</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This issue was created by maloo for sarah &amp;lt;sarah@whamcloud.com&amp;gt;&lt;/p&gt;

&lt;p&gt;This issue relates to the following test suite run: &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/0429703c-ba58-11e4-8053-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/0429703c-ba58-11e4-8053-5254006e85c2&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The sub-test test_17 failed with the following error:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;test failed to respond and timed out
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;ost dmesg&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
LustreError: 137-5: lustre-OST0002_UUID: not available for connect from 10.2.4.161@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 120 previous similar messages
INFO: task mount.lustre:3630 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.29.2.el6_lustre.x86_64 #1
&quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
mount.lustre  D 0000000000000000     0  3630   3629 0x00000080
 ffff88006edf9718 0000000000000082 0000000000000000 ffff88006ef82040
 ffff88006edf9698 ffffffff81055783 ffff88007e4c2ad8 ffff880002216880
 ffff88006ef825f8 ffff88006edf9fd8 000000000000fbc8 ffff88006ef825f8
Call Trace:
 [&amp;lt;ffffffff81055783&amp;gt;] ? set_next_buddy+0x43/0x50
 [&amp;lt;ffffffff8152a595&amp;gt;] schedule_timeout+0x215/0x2e0
 [&amp;lt;ffffffff81069f15&amp;gt;] ? enqueue_entity+0x125/0x450
 [&amp;lt;ffffffff8152a213&amp;gt;] wait_for_common+0x123/0x180
 [&amp;lt;ffffffff81061d00&amp;gt;] ? default_wake_function+0x0/0x20
 [&amp;lt;ffffffffa090cd00&amp;gt;] ? client_lwp_config_process+0x0/0x1948 [obdclass]
 [&amp;lt;ffffffff8152a32d&amp;gt;] wait_for_completion+0x1d/0x20
 [&amp;lt;ffffffffa0898e14&amp;gt;] llog_process_or_fork+0x354/0x540 [obdclass]
 [&amp;lt;ffffffffa0899014&amp;gt;] llog_process+0x14/0x30 [obdclass]
 [&amp;lt;ffffffffa08c81d4&amp;gt;] class_config_parse_llog+0x1e4/0x330 [obdclass]
 [&amp;lt;ffffffffa10314f2&amp;gt;] mgc_process_log+0xeb2/0x1970 [mgc]
 [&amp;lt;ffffffffa102b1f0&amp;gt;] ? mgc_blocking_ast+0x0/0x810 [mgc]
 [&amp;lt;ffffffffa0ad0860&amp;gt;] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
 [&amp;lt;ffffffffa1032ef8&amp;gt;] mgc_process_config+0x658/0x1210 [mgc]
 [&amp;lt;ffffffffa08d9383&amp;gt;] lustre_process_log+0x7e3/0x1130 [obdclass]
 [&amp;lt;ffffffffa07891c1&amp;gt;] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [&amp;lt;ffffffffa08d514f&amp;gt;] ? server_name2fsname+0x6f/0x90 [obdclass]
 [&amp;lt;ffffffffa0907496&amp;gt;] server_start_targets+0x12b6/0x1af0 [obdclass]
 [&amp;lt;ffffffffa0783818&amp;gt;] ? libcfs_log_return+0x28/0x40 [libcfs]
 [&amp;lt;ffffffffa08dbfe6&amp;gt;] ? lustre_start_mgc+0x4b6/0x1e00 [obdclass]
 [&amp;lt;ffffffffa07891c1&amp;gt;] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [&amp;lt;ffffffffa08d3390&amp;gt;] ? class_config_llog_handler+0x0/0x1a70 [obdclass]
 [&amp;lt;ffffffffa090c255&amp;gt;] server_fill_super+0xbe5/0x1690 [obdclass]
 [&amp;lt;ffffffffa0783818&amp;gt;] ? libcfs_log_return+0x28/0x40 [libcfs]
 [&amp;lt;ffffffffa08dde90&amp;gt;] lustre_fill_super+0x560/0xa80 [obdclass]
 [&amp;lt;ffffffffa08dd930&amp;gt;] ? lustre_fill_super+0x0/0xa80 [obdclass]
 [&amp;lt;ffffffff8118c56f&amp;gt;] get_sb_nodev+0x5f/0xa0
 [&amp;lt;ffffffffa08d4ee5&amp;gt;] lustre_get_sb+0x25/0x30 [obdclass]
 [&amp;lt;ffffffff8118bbcb&amp;gt;] vfs_kern_mount+0x7b/0x1b0
 [&amp;lt;ffffffff8118bd72&amp;gt;] do_kern_mount+0x52/0x130
 [&amp;lt;ffffffff8119e972&amp;gt;] ? vfs_ioctl+0x22/0xa0
 [&amp;lt;ffffffff811ad74b&amp;gt;] do_mount+0x2fb/0x930
 [&amp;lt;ffffffff811ade10&amp;gt;] sys_mount+0x90/0xe0
 [&amp;lt;ffffffff8100b072&amp;gt;] system_call_fastpath+0x16/0x1b
Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424565647/real 1424565647]  req@ffff880070449080 x1493765169611180/t0(0) o38-&amp;gt;lustre-MDT0000-lwp-OST0001@10.2.4.158@tcp:12/10 lens 400/544 e 0 to 1 dl 1424565672 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424565712/real 1424565712]  req@ffff880070449680 x1493765169611316/t0(0) o38-&amp;gt;lustre-MDT0000-lwp-OST0001@10.2.4.158@tcp:12/10 lens 400/544 e 0 to 1 dl 1424565737 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
INFO: task mount.lustre:3630 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.29.2.el6_lustre.x86_64 #1
&quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
mount.lustre  D 0000000000000000     0  3630   3629 0x00000080
 ffff88006edf9718 0000000000000082 0000000000000000 ffff88006ef82040
 ffff88006edf9698 ffffffff81055783 ffff88007e4c2ad8 ffff880002216880
 ffff88006ef825f8 ffff88006edf9fd8 000000000000fbc8 ffff88006ef825f8
Call Trace:
 [&amp;lt;ffffffff81055783&amp;gt;] ? set_next_buddy+0x43/0x50
 [&amp;lt;ffffffff8152a595&amp;gt;] schedule_timeout+0x215/0x2e0
 [&amp;lt;ffffffff81069f15&amp;gt;] ? enqueue_entity+0x125/0x450
 [&amp;lt;ffffffff8152a213&amp;gt;] wait_for_common+0x123/0x180
 [&amp;lt;ffffffff81061d00&amp;gt;] ? default_wake_function+0x0/0x20
 [&amp;lt;ffffffffa090cd00&amp;gt;] ? client_lwp_config_process+0x0/0x1948 [obdclass]
 [&amp;lt;ffffffff8152a32d&amp;gt;] wait_for_completion+0x1d/0x20
 [&amp;lt;ffffffffa0898e14&amp;gt;] llog_process_or_fork+0x354/0x540 [obdclass]
 [&amp;lt;ffffffffa0899014&amp;gt;] llog_process+0x14/0x30 [obdclass]
 [&amp;lt;ffffffffa08c81d4&amp;gt;] class_config_parse_llog+0x1e4/0x330 [obdclass]
 [&amp;lt;ffffffffa10314f2&amp;gt;] mgc_process_log+0xeb2/0x1970 [mgc]
 [&amp;lt;ffffffffa102b1f0&amp;gt;] ? mgc_blocking_ast+0x0/0x810 [mgc]
 [&amp;lt;ffffffffa0ad0860&amp;gt;] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
 [&amp;lt;ffffffffa1032ef8&amp;gt;] mgc_process_config+0x658/0x1210 [mgc]
 [&amp;lt;ffffffffa08d9383&amp;gt;] lustre_process_log+0x7e3/0x1130 [obdclass]
 [&amp;lt;ffffffffa07891c1&amp;gt;] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [&amp;lt;ffffffffa08d514f&amp;gt;] ? server_name2fsname+0x6f/0x90 [obdclass]
 [&amp;lt;ffffffffa0907496&amp;gt;] server_start_targets+0x12b6/0x1af0 [obdclass]
 [&amp;lt;ffffffffa0783818&amp;gt;] ? libcfs_log_return+0x28/0x40 [libcfs]
 [&amp;lt;ffffffffa08dbfe6&amp;gt;] ? lustre_start_mgc+0x4b6/0x1e00 [obdclass]
 [&amp;lt;ffffffffa07891c1&amp;gt;] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [&amp;lt;ffffffffa08d3390&amp;gt;] ? class_config_llog_handler+0x0/0x1a70 [obdclass]
 [&amp;lt;ffffffffa090c255&amp;gt;] server_fill_super+0xbe5/0x1690 [obdclass]
 [&amp;lt;ffffffffa0783818&amp;gt;] ? libcfs_log_return+0x28/0x40 [libcfs]
 [&amp;lt;ffffffffa08dde90&amp;gt;] lustre_fill_super+0x560/0xa80 [obdclass]
 [&amp;lt;ffffffffa08dd930&amp;gt;] ? lustre_fill_super+0x0/0xa80 [obdclass]
 [&amp;lt;ffffffff8118c56f&amp;gt;] get_sb_nodev+0x5f/0xa0
 [&amp;lt;ffffffffa08d4ee5&amp;gt;] lustre_get_sb+0x25/0x30 [obdclass]
 [&amp;lt;ffffffff8118bbcb&amp;gt;] vfs_kern_mount+0x7b/0x1b0
 [&amp;lt;ffffffff8118bd72&amp;gt;] do_kern_mount+0x52/0x130
 [&amp;lt;ffffffff8119e972&amp;gt;] ? vfs_ioctl+0x22/0xa0
 [&amp;lt;ffffffff811ad74b&amp;gt;] do_mount+0x2fb/0x930
 [&amp;lt;ffffffff811ade10&amp;gt;] sys_mount+0x90/0xe0
 [&amp;lt;ffffffff8100b072&amp;gt;] system_call_fastpath+0x16/0x1b
LustreError: 137-5: lustre-OST0002_UUID: not available for connect from 10.2.4.156@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 304 previous similar messages
Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424565842/real 1424565842]  req@ffff880070449c80 x1493765169611592/t0(0) o38-&amp;gt;lustre-MDT0000-lwp-OST0001@10.2.4.158@tcp:12/10 lens 400/544 e 0 to 1 dl 1424565867 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 16 previous similar messages
INFO: task mount.lustre:3630 blocked for more than 120 seconds.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment>client and server: lustre-master build # 2856&lt;br/&gt;
zfs</environment>
        <key id="28816">LU-6273</key>
            <summary>Hard Failover replay-dual test_17: Failover OST mount hang</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="tappro">Mikhail Pershin</assignee>
                                    <reporter username="maloo">Maloo</reporter>
                        <labels>
                            <label>zfs</label>
                    </labels>
                <created>Tue, 24 Feb 2015 06:48:29 +0000</created>
                <updated>Wed, 23 Sep 2015 06:32:19 +0000</updated>
                            <resolved>Tue, 22 Sep 2015 04:20:48 +0000</resolved>
                                    <version>Lustre 2.6.0</version>
                    <version>Lustre 2.7.0</version>
                    <version>Lustre 2.8.0</version>
                                    <fixVersion>Lustre 2.8.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>12</watches>
                                                                            <comments>
                            <comment id="107804" author="green" created="Tue, 24 Feb 2015 18:31:50 +0000"  >&lt;p&gt;So the real problem here is that the MDS has crashed and now the OSTs cannot reconnect to it.&lt;br/&gt;
We need to find the MDS crashdump and get a log out of it to see why it crashed.&lt;/p&gt;</comment>
                            <comment id="107806" author="jlevi" created="Tue, 24 Feb 2015 18:37:22 +0000"  >&lt;p&gt;Mike,&lt;br/&gt;
Could you please have a look and comment on this one?&lt;br/&gt;
Thank you!&lt;/p&gt;</comment>
                            <comment id="107807" author="green" created="Tue, 24 Feb 2015 18:37:54 +0000"  >&lt;p&gt;Additionally - even with the MDS/MGS node still down, OSTs absolutely must be able to start as long as they have a cached copy of the config.&lt;br/&gt;
And yet in the OST logs we see OST0000 started and OST0002 hung - so we also need to see why that one got stuck - I imagine Mike would want to look at this one.&lt;/p&gt;</comment>
                            <comment id="108675" author="sarah" created="Wed, 4 Mar 2015 00:30:47 +0000"  >&lt;p&gt;another instance seen in recovery-double-scale&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/85116226-c170-11e4-bef2-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/85116226-c170-11e4-bef2-5254006e85c2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OST&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: 27409:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
INFO: task mount.lustre:27983 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-504.8.1.el6_lustre.x86_64 #1
&quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
mount.lustre  D 0000000000000000     0 27983  27982 0x00000080
 ffff88005ca1f718 0000000000000082 000000000000ebc8 ffff88005ed70640
 ffff88007e4c2aa0 ffff88005ed70080 ffff88005ca1f698 ffffffff810588b3
 ffff88007e4c2ad8 ffff880002215900 ffff88005ed70638 ffff88005ca1ffd8
Call Trace:
 [&amp;lt;ffffffff810588b3&amp;gt;] ? set_next_buddy+0x43/0x50
 [&amp;lt;ffffffff81041e98&amp;gt;] ? pvclock_clocksource_read+0x58/0xd0
 [&amp;lt;ffffffff8152b185&amp;gt;] schedule_timeout+0x215/0x2e0
 [&amp;lt;ffffffff8106d175&amp;gt;] ? enqueue_entity+0x125/0x450
 [&amp;lt;ffffffff8105dea4&amp;gt;] ? check_preempt_wakeup+0x1a4/0x260
 [&amp;lt;ffffffff8106d59b&amp;gt;] ? enqueue_task_fair+0xfb/0x100
 [&amp;lt;ffffffff8152ae03&amp;gt;] wait_for_common+0x123/0x180
 [&amp;lt;ffffffff81064b90&amp;gt;] ? default_wake_function+0x0/0x20
 [&amp;lt;ffffffffa09127c0&amp;gt;] ? client_lwp_config_process+0x0/0x1912 [obdclass]
 [&amp;lt;ffffffff8152af1d&amp;gt;] wait_for_completion+0x1d/0x20
 [&amp;lt;ffffffffa089ecd4&amp;gt;] llog_process_or_fork+0x354/0x540 [obdclass]
 [&amp;lt;ffffffffa089eed4&amp;gt;] llog_process+0x14/0x30 [obdclass]
 [&amp;lt;ffffffffa08cddf4&amp;gt;] class_config_parse_llog+0x1e4/0x330 [obdclass]
 [&amp;lt;ffffffffa10344f2&amp;gt;] mgc_process_log+0xeb2/0x1970 [mgc]
 [&amp;lt;ffffffffa102e1f0&amp;gt;] ? mgc_blocking_ast+0x0/0x810 [mgc]
 [&amp;lt;ffffffffa0ad5800&amp;gt;] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
 [&amp;lt;ffffffffa1035ef8&amp;gt;] mgc_process_config+0x658/0x1210 [mgc]
 [&amp;lt;ffffffff81174d4c&amp;gt;] ? __kmalloc+0x21c/0x230
 [&amp;lt;ffffffffa08defa3&amp;gt;] lustre_process_log+0x7e3/0x1130 [obdclass]
 [&amp;lt;ffffffffa0906e9c&amp;gt;] ? server_find_mount+0xbc/0x160 [obdclass]
 [&amp;lt;ffffffff811749f3&amp;gt;] ? kmem_cache_alloc_trace+0x1b3/0x1c0
 [&amp;lt;ffffffffa08dad6f&amp;gt;] ? server_name2fsname+0x6f/0x90 [obdclass]
 [&amp;lt;ffffffffa090cf56&amp;gt;] server_start_targets+0x12b6/0x1af0 [obdclass]
 [&amp;lt;ffffffffa08e1bdb&amp;gt;] ? lustre_start_mgc+0x48b/0x1e00 [obdclass]
 [&amp;lt;ffffffffa08d8fb0&amp;gt;] ? class_config_llog_handler+0x0/0x1a70 [obdclass]
 [&amp;lt;ffffffffa0911d15&amp;gt;] server_fill_super+0xbe5/0x1690 [obdclass]
 [&amp;lt;ffffffffa08e3ab0&amp;gt;] lustre_fill_super+0x560/0xa80 [obdclass]
 [&amp;lt;ffffffffa08e3550&amp;gt;] ? lustre_fill_super+0x0/0xa80 [obdclass]
 [&amp;lt;ffffffff811917af&amp;gt;] get_sb_nodev+0x5f/0xa0
 [&amp;lt;ffffffffa08dab05&amp;gt;] lustre_get_sb+0x25/0x30 [obdclass]
 [&amp;lt;ffffffff81190deb&amp;gt;] vfs_kern_mount+0x7b/0x1b0
 [&amp;lt;ffffffff81190f92&amp;gt;] do_kern_mount+0x52/0x130
 [&amp;lt;ffffffff811a3c12&amp;gt;] ? vfs_ioctl+0x22/0xa0
 [&amp;lt;ffffffff811b2b9b&amp;gt;] do_mount+0x2fb/0x930
 [&amp;lt;ffffffff811b3260&amp;gt;] sys_mount+0x90/0xe0
 [&amp;lt;ffffffff8100b072&amp;gt;] system_call_fastpath+0x16/0x1b
Lustre: 27409:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1425299784/real 1425299784]  req@ffff88005ee57380 x1494534964904508/t0(0) o38-&amp;gt;lustre-MDT0000-lwp-OST0001@10.1.4.205@tcp:12/10 lens 400/544 e 0 to 1 dl 1425299809 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 27409:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
LustreError: 137-5: lustre-OST0002_UUID: not available for connect from 10.1.4.198@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 40 previous similar messages
Lustre: 27409:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1425299849/real 1425299849]  req@ffff88005c0a69c0 x1494534964904636/t0(0) o38-&amp;gt;lustre-MDT0000-lwp-OST0001@10.1.4.205@tcp:12/10 lens 400/544 e 0 to 1 dl 1425299874 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 27409:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
INFO: task mount.lustre:27983 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-504.8.1.el6_lustre.x86_64 #1
&quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
mount.lustre  D 0000000000000000     0 27983  27982 0x00000080
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="109504" author="yujian" created="Wed, 11 Mar 2015 20:52:46 +0000"  >&lt;p&gt;This is blocking ZFS hard failover testing on master branch:&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sessions/dbcec19e-c81e-11e4-be50-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sessions/dbcec19e-c81e-11e4-be50-5254006e85c2&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="124457" author="jgmitter" created="Tue, 18 Aug 2015 17:29:21 +0000"  >&lt;p&gt;Hi Mike,&lt;br/&gt;
Can you take a look at this issue?  It is a blocker for 2.8.&lt;br/&gt;
Thanks.&lt;br/&gt;
Joe&lt;/p&gt;</comment>
                            <comment id="124479" author="tappro" created="Tue, 18 Aug 2015 18:11:57 +0000"  >&lt;p&gt;yes, I am working on this&lt;/p&gt;</comment>
                            <comment id="124608" author="tappro" created="Wed, 19 Aug 2015 16:54:55 +0000"  >&lt;p&gt;this looks like a bug with lw_client detection in target_handle_connect(). I think this patch should do the job:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#/c/13726/3&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/13726/3&lt;/a&gt;&lt;br/&gt;
Meanwhile it is fixed in master by one of the DNE2 patches; I will combine both patches into one for 2.7&lt;/p&gt;</comment>
                            <comment id="124712" author="jgmitter" created="Thu, 20 Aug 2015 17:53:01 +0000"  >&lt;p&gt;Looks like that patch needs to be rebased to move forward.&lt;/p&gt;</comment>
                            <comment id="124718" author="tappro" created="Thu, 20 Aug 2015 18:15:50 +0000"  >&lt;p&gt;in fact I think this is fixed already by commit f1d81db9376965e302ecc05e10be220d72b2f04a from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6629&quot; title=&quot;sanity-benchmark test_bonnie: DQACQ failed with -22&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6629&quot;&gt;&lt;del&gt;LU-6629&lt;/del&gt;&lt;/a&gt;, I&apos;ve just added fix for initial LWP connection logic from my old patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4214&quot; title=&quot;Hyperion - OST never recovers on failover node&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4214&quot;&gt;&lt;del&gt;LU-4214&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="124930" author="yujian" created="Mon, 24 Aug 2015 18:16:01 +0000"  >&lt;p&gt;On master branch:&lt;/p&gt;

&lt;p&gt;RHEL 7.1 with ZFS:&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sessions/3d62dc24-48bc-11e5-a657-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sessions/3d62dc24-48bc-11e5-a657-5254006e85c2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RHEL 6.6 with ZFS:&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sessions/6a2783ea-4853-11e5-a4ad-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sessions/6a2783ea-4853-11e5-a4ad-5254006e85c2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RHEL 7.1 with ldiskfs:&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sessions/e8b22732-498a-11e5-8ada-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sessions/e8b22732-498a-11e5-8ada-5254006e85c2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RHEL 6.6 with ldiskfs:&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sessions/1220e9f8-470d-11e5-bfb6-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sessions/1220e9f8-470d-11e5-bfb6-5254006e85c2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The failure also occurred on ZFS test sessions on the master branch, but not on ldiskfs test sessions.&lt;/p&gt;</comment>
                            <comment id="124993" author="tappro" created="Tue, 25 Aug 2015 08:54:54 +0000"  >&lt;p&gt;It looks like the OST is unable to switch to the failover MDS after all and keeps trying to reach the failed MDS. That means either a problem in its import connection logic, or that the local config doesn&apos;t provide information about the failover MDS.&lt;/p&gt;</comment>
                            <comment id="126467" author="yujian" created="Fri, 4 Sep 2015 23:55:04 +0000"  >&lt;p&gt;Lustre Build: &lt;a href=&quot;https://build.hpdd.intel.com/job/lustre-b2_6/2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://build.hpdd.intel.com/job/lustre-b2_6/2&lt;/a&gt; (2.6.0)&lt;br/&gt;
Distro/Arch: RHEL6.5/x86_64&lt;br/&gt;
FSTYPE=zfs&lt;/p&gt;

&lt;p&gt;The failure also occurred on Lustre 2.6.0 release:&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sessions/05c8dc02-135f-11e4-92ae-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sessions/05c8dc02-135f-11e4-92ae-5254006e85c2&lt;/a&gt; &lt;/p&gt;</comment>
                            <comment id="126514" author="tappro" created="Sun, 6 Sep 2015 06:53:14 +0000"  >&lt;p&gt;The problem still looks related to the LWP device: it is not able to switch to the failover server for some reason. Meanwhile, the local config llog exists on the OST and contains the failover nid, so the problem might be inside the LWP code. It is also not clear why this occurs only on ZFS setups.&lt;/p&gt;</comment>
                            <comment id="126535" author="niu" created="Sun, 6 Sep 2015 15:46:45 +0000"  >&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Call Trace:
 [&amp;lt;ffffffff81055783&amp;gt;] ? set_next_buddy+0x43/0x50
 [&amp;lt;ffffffff8152a595&amp;gt;] schedule_timeout+0x215/0x2e0
 [&amp;lt;ffffffff81069f15&amp;gt;] ? enqueue_entity+0x125/0x450
 [&amp;lt;ffffffff8152a213&amp;gt;] wait_for_common+0x123/0x180
 [&amp;lt;ffffffff81061d00&amp;gt;] ? default_wake_function+0x0/0x20
 [&amp;lt;ffffffffa090cd00&amp;gt;] ? client_lwp_config_process+0x0/0x1948 [obdclass]
 [&amp;lt;ffffffff8152a32d&amp;gt;] wait_for_completion+0x1d/0x20
 [&amp;lt;ffffffffa0898e14&amp;gt;] llog_process_or_fork+0x354/0x540 [obdclass]
 [&amp;lt;ffffffffa0899014&amp;gt;] llog_process+0x14/0x30 [obdclass]
 [&amp;lt;ffffffffa08c81d4&amp;gt;] class_config_parse_llog+0x1e4/0x330 [obdclass]
 [&amp;lt;ffffffffa10314f2&amp;gt;] mgc_process_log+0xeb2/0x1970 [mgc]
 [&amp;lt;ffffffffa102b1f0&amp;gt;] ? mgc_blocking_ast+0x0/0x810 [mgc]
 [&amp;lt;ffffffffa0ad0860&amp;gt;] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
 [&amp;lt;ffffffffa1032ef8&amp;gt;] mgc_process_config+0x658/0x1210 [mgc]
 [&amp;lt;ffffffffa08d9383&amp;gt;] lustre_process_log+0x7e3/0x1130 [obdclass]
 [&amp;lt;ffffffffa07891c1&amp;gt;] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [&amp;lt;ffffffffa08d514f&amp;gt;] ? server_name2fsname+0x6f/0x90 [obdclass]
 [&amp;lt;ffffffffa0907496&amp;gt;] server_start_targets+0x12b6/0x1af0 [obdclass]
 [&amp;lt;ffffffffa0783818&amp;gt;] ? libcfs_log_return+0x28/0x40 [libcfs]
 [&amp;lt;ffffffffa08dbfe6&amp;gt;] ? lustre_start_mgc+0x4b6/0x1e00 [obdclass]
 [&amp;lt;ffffffffa07891c1&amp;gt;] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [&amp;lt;ffffffffa08d3390&amp;gt;] ? class_config_llog_handler+0x0/0x1a70 [obdclass]
 [&amp;lt;ffffffffa090c255&amp;gt;] server_fill_super+0xbe5/0x1690 [obdclass]
 [&amp;lt;ffffffffa0783818&amp;gt;] ? libcfs_log_return+0x28/0x40 [libcfs]
 [&amp;lt;ffffffffa08dde90&amp;gt;] lustre_fill_super+0x560/0xa80 [obdclass]
 [&amp;lt;ffffffffa08dd930&amp;gt;] ? lustre_fill_super+0x0/0xa80 [obdclass]
 [&amp;lt;ffffffff8118c56f&amp;gt;] get_sb_nodev+0x5f/0xa0
 [&amp;lt;ffffffffa08d4ee5&amp;gt;] lustre_get_sb+0x25/0x30 [obdclass]
 [&amp;lt;ffffffff8118bbcb&amp;gt;] vfs_kern_mount+0x7b/0x1b0
 [&amp;lt;ffffffff8118bd72&amp;gt;] do_kern_mount+0x52/0x130
 [&amp;lt;ffffffff8119e972&amp;gt;] ? vfs_ioctl+0x22/0xa0
 [&amp;lt;ffffffff811ad74b&amp;gt;] do_mount+0x2fb/0x930
 [&amp;lt;ffffffff811ade10&amp;gt;] sys_mount+0x90/0xe0
 [&amp;lt;ffffffff8100b072&amp;gt;] system_call_fastpath+0x16/0x1b
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The mount thread is waiting on processing log records, let&apos;s see what the log process thread is doing:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;llog_process_ S 0000000000000001     0 28768      2 0x00000080
 ffff880058535970 0000000000000046 ffff880058535960 ffffffffa0c04f4c
 ffff8800585359d0 000000007f823240 0000000000000000 ffff8800590f4511
 0000004958535910 ffffffffa0cccece ffff880058f285f8 ffff880058535fd8
Call Trace:
 [&amp;lt;ffffffffa0c04f4c&amp;gt;] ? ptlrpc_unregister_reply+0x6c/0x810 [ptlrpc]
 [&amp;lt;ffffffff8152b222&amp;gt;] schedule_timeout+0x192/0x2e0
 [&amp;lt;ffffffff81087540&amp;gt;] ? process_timeout+0x0/0x10
 [&amp;lt;ffffffffa0c07ba9&amp;gt;] ptlrpc_set_wait+0x319/0xa20 [ptlrpc]
 [&amp;lt;ffffffffa0bfd2c0&amp;gt;] ? ptlrpc_interrupted_set+0x0/0x110 [ptlrpc]
 [&amp;lt;ffffffff81064c00&amp;gt;] ? default_wake_function+0x0/0x20
 [&amp;lt;ffffffffa0c13da5&amp;gt;] ? lustre_msg_set_jobid+0xf5/0x130 [ptlrpc]
 [&amp;lt;ffffffffa0c08331&amp;gt;] ptlrpc_queue_wait+0x81/0x220 [ptlrpc]
 [&amp;lt;ffffffffa0e7455b&amp;gt;] fld_client_rpc+0x15b/0x510 [fld]
 [&amp;lt;ffffffff81297869&amp;gt;] ? simple_strtoul+0x9/0x10
 [&amp;lt;ffffffffa0e7b024&amp;gt;] fld_update_from_controller+0x1c4/0x570 [fld]
 [&amp;lt;ffffffffa124f8e8&amp;gt;] ofd_register_lwp_callback+0xa8/0x5a0 [ofd]
 [&amp;lt;ffffffffa0a0c84f&amp;gt;] lustre_lwp_connect+0xacf/0xd10 [obdclass]
 [&amp;lt;ffffffffa0a0db25&amp;gt;] lustre_lwp_setup+0x8c5/0xc60 [obdclass]
 [&amp;lt;ffffffffa09dbc08&amp;gt;] ? target_name2index+0x78/0xc0 [obdclass]
 [&amp;lt;ffffffffa0a0fa07&amp;gt;] client_lwp_config_process+0x1357/0x1da0 [obdclass]
 [&amp;lt;ffffffffa099c35a&amp;gt;] llog_process_thread+0x94a/0xfc0 [obdclass]
 [&amp;lt;ffffffffa099d515&amp;gt;] llog_process_thread_daemonize+0x45/0x70 [obdclass]
 [&amp;lt;ffffffffa099d4d0&amp;gt;] ? llog_process_thread_daemonize+0x0/0x70 [obdclass]
 [&amp;lt;ffffffff8109e78e&amp;gt;] kthread+0x9e/0xc0
 [&amp;lt;ffffffff8100c28a&amp;gt;] child_rip+0xa/0x20
 [&amp;lt;ffffffff8109e6f0&amp;gt;] ? kthread+0x0/0xc0
 [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It&apos;s trying to query the FLDB from MDT0, but unfortunately the FLD RPC timed out somehow. Though I&apos;m not quite sure of the timeout reason, I think we shouldn&apos;t do that in ofd_register_lwp_callback() in the first place, because it&apos;ll block the OST mount process when MDT0 isn&apos;t started.&lt;/p&gt;

&lt;p&gt;Di, could you take a look? Is there any way to avoid that problem?&lt;/p&gt;



</comment>
                            <comment id="126561" author="niu" created="Mon, 7 Sep 2015 03:04:26 +0000"  >&lt;p&gt;I see now why the LWP on the OST can&apos;t connect to the failover MDT:&lt;/p&gt;

&lt;p&gt;On OST mount, we process the client log to set up the LWP as follows:&lt;br/&gt;
1&amp;gt; Process the LCFG_ADD_UUID record to set up the LWP device, then connect to the MDT (see lustre_lwp_setup());&lt;br/&gt;
2&amp;gt; Process the LCFG_ADD_CONN record to add the failover connection.&lt;/p&gt;

&lt;p&gt;We can see that if the mount process is blocked on step 1, it will never have a chance to add the failover connection, and the LWP will never be able to switch to the failover node.&lt;/p&gt;

&lt;p&gt;Unfortunately, the process can indeed be blocked on step 1; see lustre_lwp_setup() -&amp;gt; lustre_lwp_connect():&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;         rc = obd_connect(&amp;amp;env, &amp;amp;exp, lwp, uuid, data, NULL);
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rc != 0) {
                CERROR(&lt;span class=&quot;code-quote&quot;&gt;&quot;%s: connect failed: rc = %d\n&quot;&lt;/span&gt;, lwp-&amp;gt;obd_name, rc);
        } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; {
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (unlikely(lwp-&amp;gt;obd_lwp_export != NULL))
                        class_export_put(lwp-&amp;gt;obd_lwp_export);
                lwp-&amp;gt;obd_lwp_export = class_export_get(exp);
                lustre_notify_lwp_list(exp);
        }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;lustre_notify_lwp_list() is called after obd_connect() returns successfully (which means the CONNECT request could be sent, not that the connection was established). lustre_notify_lwp_list() calls ofd_register_lwp_callback() to send the FLD RPC; note that it now tries to send the RPC to the primary MDT0 (the failover MDT0 connection hasn&apos;t been added yet).&lt;/p&gt;

&lt;p&gt;Let&apos;s look at fld_client_rpc():&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rc != 0) {
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (imp-&amp;gt;imp_state != LUSTRE_IMP_CLOSED &amp;amp;&amp;amp; !imp-&amp;gt;imp_deactive) {
                        /* Since LWP is not replayable, so it will keep
                         * trying unless umount happens, otherwise it would
                         * cause unecessary failure of the application. */
                        ptlrpc_req_finished(req);
                        rc = 0;
                        &lt;span class=&quot;code-keyword&quot;&gt;goto&lt;/span&gt; again;
                }
                GOTO(out_req, rc);
        }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It keeps resending the RPC to the primary MDT0 in a loop, so the mount process blocks here forever, before the failover connection can be added.&lt;/p&gt;

&lt;p&gt;I&apos;ll think further about how to resolve this. Di, any thoughts? Thanks.&lt;/p&gt;</comment>
                            <comment id="126570" author="di.wang" created="Mon, 7 Sep 2015 06:39:41 +0000"  >&lt;p&gt;Hmm, I actually hit this problem while testing one of my patches:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#/c/15275/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/15275/&lt;/a&gt; (patch set 7)&lt;/p&gt;

&lt;p&gt;See this failure &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/e2a60dca-3ff5-11e5-afd4-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/e2a60dca-3ff5-11e5-afd4-5254006e85c2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We should probably set the no_delay and allow_replay flags for the FLD update RPC. I think this patch should fix the problem:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;diff --git a/lustre/fld/fld_request.c b/lustre/fld/fld_request.c
index a59ab95..c82b098 100644
--- a/lustre/fld/fld_request.c
+++ b/lustre/fld/fld_request.c
@@ -399,6 +399,14 @@ again:
 
                req_capsule_set_size(&amp;amp;req-&amp;gt;rq_pill, &amp;amp;RMF_GENERIC_DATA,
                                     RCL_SERVER, PAGE_CACHE_SIZE);
+
+               /* This might happen before the import state becomes to FULL,
+                * let&apos;s set allow_replay for this request to avoid deadlock
+                * see LU-6273 */
+               req-&amp;gt;rq_allow_replay = 1;
+               /* This will always use LWP connection, let&apos;s send the req
+                * with no_delay flags, see above */
+               req-&amp;gt;rq_no_delay = 1;
                break;
        default:
                rc = -EINVAL;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I will push the fix to &lt;a href=&quot;http://review.whamcloud.com/#/c/15275/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/15275/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="126594" author="tappro" created="Mon, 7 Sep 2015 18:20:00 +0000"  >&lt;p&gt;yes, that looks like the reason, but I still wonder why ldiskfs setups are not affected. Di, IIRC, you did local FLDB in past, shouldn&apos;t OST have a local copy of FLDB?&lt;/p&gt;</comment>
                            <comment id="126598" author="di.wang" created="Mon, 7 Sep 2015 19:16:05 +0000"  >&lt;p&gt;I discussed this with Niu a bit, and this patch is probably not enough, per Niu&apos;s comment above. We should probably move this fled_update_FLDB to a separate thread; otherwise LWP config log processing will not continue, and the LWP will never be able to connect. Yes, there is a local FLDB on each target, but it is only updated during first initialization (checking the lsf_new flag) or on upgrade from 2.4 (DNE env) to the current DNE version (2.8). And I do not know why this happens only on ZFS. Maybe fld_index_init is slow on ZFS, so the OST is restarted before fld_index_init finishes.&lt;/p&gt;</comment>
                            <comment id="126608" author="tappro" created="Tue, 8 Sep 2015 05:49:42 +0000"  >&lt;p&gt;I think we can just move the lustre_notify_lwp_list() call into lustre_start_lwp(), right after lsi_lwp_started is set to 1. At that point the llog has been processed and the LWP should be fully set up.&lt;/p&gt;</comment>
                            <comment id="126609" author="gerrit" created="Tue, 8 Sep 2015 05:54:41 +0000"  >&lt;p&gt;wangdi (di.wang@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/16303&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16303&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6273&quot; title=&quot;Hard Failover replay-dual test_17: Failover OST mount hang&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6273&quot;&gt;&lt;del&gt;LU-6273&lt;/del&gt;&lt;/a&gt; fld: update local FLDB in a separate thread&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 92feb9161586f00cc03c481c81598f3b77847fb3&lt;/p&gt;</comment>
                            <comment id="126610" author="di.wang" created="Tue, 8 Sep 2015 06:12:34 +0000"  >&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;I think we can just move the lustre_notify_lwp_list() to the lustre_start_lwp(), right after lsi_lwp_started is set to 1. At this point the llog is processed and lwp should be fully setup.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;No, lustre_start_lwp() does not guarantee the LWP will be set up by then, because MDT0 might not have been added to the client config log yet; for example, the OST might be started before MDT0.&lt;/p&gt;</comment>
                            <comment id="126611" author="gerrit" created="Tue, 8 Sep 2015 06:19:57 +0000"  >&lt;p&gt;Mike Pershin (mike.pershin@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/16304&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16304&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6273&quot; title=&quot;Hard Failover replay-dual test_17: Failover OST mount hang&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6273&quot;&gt;&lt;del&gt;LU-6273&lt;/del&gt;&lt;/a&gt; lwp: notify LWP is ready after llog processing&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 1be148d882a209137e92e525bc69b601e114646c&lt;/p&gt;</comment>
                            <comment id="128037" author="gerrit" created="Tue, 22 Sep 2015 02:55:34 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/16303/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16303/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6273&quot; title=&quot;Hard Failover replay-dual test_17: Failover OST mount hang&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6273&quot;&gt;&lt;del&gt;LU-6273&lt;/del&gt;&lt;/a&gt; lwp: notify LWP users in dedicated thread&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: b1848aa5b23fd332362e9ae3d5aab31d8dd9d920&lt;/p&gt;</comment>
                            <comment id="128047" author="pjones" created="Tue, 22 Sep 2015 04:20:48 +0000"  >&lt;p&gt;Landed for 2.8&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="21872">LU-4214</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzx6xb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>17590</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>