<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:40:57 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4242] mdt_open.c:1685:mdt_reint_open()) LBUG</title>
                <link>https://jira.whamcloud.com/browse/LU-4242</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;After upgrading our test file system to 2.4.1 earlier (and at the same time moving the OSSes to a different network), the MDT crashes very frequently with an LBUG and reboots directly. I have managed to get the following stack trace from /var/crash.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;&amp;lt;0&amp;gt;LustreError: 8518:0:(mdt_open.c:1685:mdt_reint_open()) LBUG
&amp;lt;6&amp;gt;Lustre: play01-MDT0000: Recovery over after 0:31, of 267 clients 267 recovered and 0 were evicted.
&amp;lt;4&amp;gt;Pid: 8518, comm: mdt01_005
&amp;lt;4&amp;gt;
&amp;lt;4&amp;gt;Call Trace:
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa04ea895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa04eae97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0e5a6b9&amp;gt;] mdt_reint_open+0x1989/0x20c0 [mdt]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa050782e&amp;gt;] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa07d5dcc&amp;gt;] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0669f50&amp;gt;] ? lu_ucred+0x20/0x30 [obdclass]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0e44911&amp;gt;] mdt_reint_rec+0x41/0xe0 [mdt]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0e29ae3&amp;gt;] mdt_reint_internal+0x4c3/0x780 [mdt]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0e2a06d&amp;gt;] mdt_intent_reint+0x1ed/0x520 [mdt]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0e27f1e&amp;gt;] mdt_intent_policy+0x39e/0x720 [mdt]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa078d831&amp;gt;] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa07b41ef&amp;gt;] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0e283a6&amp;gt;] mdt_enqueue+0x46/0xe0 [mdt]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0e2ea97&amp;gt;] mdt_handle_common+0x647/0x16d0 [mdt]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa07d6bac&amp;gt;] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0e683f5&amp;gt;] mds_regular_handle+0x15/0x20 [mdt]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa07e63c8&amp;gt;] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa04eb5de&amp;gt;] ? cfs_timer_arm+0xe/0x10 [libcfs]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa04fcd9f&amp;gt;] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa07dd729&amp;gt;] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff81055ad3&amp;gt;] ? __wake_up+0x53/0x70
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa07e775e&amp;gt;] ptlrpc_main+0xace/0x1700 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa07e6c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa07e6c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa07e6c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
&amp;lt;4&amp;gt;
&amp;lt;0&amp;gt;Kernel panic - not syncing: LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I also have a vmcore file for this crash, though none of the files in /tmp that I remember from the 1.8 days; I&apos;m not sure if this is a 2.4 thing or related to the reboots, which happen even though kernel.panic=0.&lt;/p&gt;

&lt;p&gt;It doesn&apos;t make any difference if I mount with or without --abort-recovery, the LBUG happens within a minute of the file system coming back, every time.&lt;/p&gt;

&lt;p&gt;This test file system was upgraded from 1.8 to 2.3 previously and was running 2.3 for a while. It is also possible that this was upgraded from 1.6 initially, though I&apos;d have to check this.&lt;/p&gt;

&lt;p&gt;It might be of note that even though we moved the OSSes to a different network, we did not manage to shut down all clients before the migration, so quite a few clients are likely trying to communicate with the OSSes using the old IPs and will fail.&lt;/p&gt;</description>
                <environment>file system upgraded from at least 1.8 to 2.3 and now 2.4.1, clients all running 1.8.9 currently, mostly Red Hat clients, Red Hat 6 servers</environment>
        <key id="21965">LU-4242</key>
            <summary>mdt_open.c:1685:mdt_reint_open()) LBUG</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="bogl">Bob Glossman</assignee>
                                    <reporter username="ferner">Frederik Ferner</reporter>
                        <labels>
                    </labels>
                <created>Mon, 11 Nov 2013 19:43:57 +0000</created>
                <updated>Tue, 4 Feb 2014 18:43:09 +0000</updated>
                            <resolved>Tue, 4 Feb 2014 18:43:09 +0000</resolved>
                                    <version>Lustre 2.4.1</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="71268" author="ferner" created="Mon, 11 Nov 2013 21:32:40 +0000"  >&lt;p&gt;Attempting to downgrade the MDS to 2.3.0, which it was running previously, results in another LBUG when attempting to mount the MDT. This one looks a bit like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2888&quot; title=&quot;After downgrade from 2.4 to 2.1.4, hit (osd_handler.c:2343:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2888&quot;&gt;&lt;del&gt;LU-2888&lt;/del&gt;&lt;/a&gt; or &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3639&quot; title=&quot;After downgrade from 2.5 to 2.3.0, hit (osd_handler.c:2720:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3639&quot;&gt;&lt;del&gt;LU-3639&lt;/del&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 8822:0:(osd_handler.c:2720:osd_index_try()) ASSERTION( dt_object_ex
LustreError: 8822:0:(osd_handler.c:2720:osd_index_try()) LBUG
Pid: 8822, comm: llog_process_th

Call Trace:
 [&amp;lt;ffffffffa0570905&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [&amp;lt;ffffffffa0570f17&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
 [&amp;lt;ffffffffa0f42735&amp;gt;] osd_index_try+0x175/0x620 [osd_ldiskfs]
 [&amp;lt;ffffffffa0a2dc08&amp;gt;] fld_index_init+0x88/0x4d0 [fld]
 [&amp;lt;ffffffffa0a2b13d&amp;gt;] ? fld_cache_init+0x14d/0x430 [fld]
 [&amp;lt;ffffffffa0a26a3e&amp;gt;] fld_server_init+0x29e/0x450 [fld]
 [&amp;lt;ffffffffa0e961b6&amp;gt;] mdt_fld_init+0x126/0x430 [mdt]
 [&amp;lt;ffffffffa0e9b326&amp;gt;] mdt_init0+0x8c6/0x23f0 [mdt]
 [&amp;lt;ffffffffa0571be0&amp;gt;] ? cfs_alloc+0x30/0x60 [libcfs]
 [&amp;lt;ffffffffa0e9cf43&amp;gt;] mdt_device_alloc+0xf3/0x220 [mdt]
 [&amp;lt;ffffffffa06b60d7&amp;gt;] obd_setup+0x1d7/0x2f0 [obdclass]
 [&amp;lt;ffffffffa06b63f8&amp;gt;] class_setup+0x208/0x890 [obdclass]
 [&amp;lt;ffffffffa06be08c&amp;gt;] class_process_config+0xc0c/0x1c30 [obdclass]
 [&amp;lt;ffffffffa0571be0&amp;gt;] ? cfs_alloc+0x30/0x60 [libcfs]
 [&amp;lt;ffffffffa06b7eb3&amp;gt;] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
 [&amp;lt;ffffffffa06c015b&amp;gt;] class_config_llog_handler+0x9bb/0x1610 [obdclass]
 [&amp;lt;ffffffffa068e5f0&amp;gt;] ? llog_lvfs_next_block+0x2d0/0x650 [obdclass]
 [&amp;lt;ffffffffa0688970&amp;gt;] ? llog_process_thread+0x0/0xd00 [obdclass]
 [&amp;lt;ffffffffa06891f8&amp;gt;] llog_process_thread+0x888/0xd00 [obdclass]
 [&amp;lt;ffffffffa0688970&amp;gt;] ? llog_process_thread+0x0/0xd00 [obdclass]
 [&amp;lt;ffffffff8100c14a&amp;gt;] child_rip+0xa/0x20
 [&amp;lt;ffffffffa0688970&amp;gt;] ? llog_process_thread+0x0/0xd00 [obdclass]
 [&amp;lt;ffffffffa0688970&amp;gt;] ? llog_process_thread+0x0/0xd00 [obdclass]
 [&amp;lt;ffffffff8100c140&amp;gt;] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and on a terminal:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;essage from syslogd@cs04r-sc-mds02-03 at Nov 11 21:21:30 ...
 kernel:LustreError: 8822:0:(osd_handler.c:2720:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed: 

Message from syslogd@cs04r-sc-mds02-03 at Nov 11 21:21:30 ...
 kernel:LustreError: 8822:0:(osd_handler.c:2720:osd_index_try()) LBUG

Message from syslogd@cs04r-sc-mds02-03 at Nov 11 21:21:30 ...
 kernel:Kernel panic - not syncing: LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="71274" author="bogl" created="Mon, 11 Nov 2013 22:16:56 +0000"  >&lt;p&gt;I&apos;m wondering if this is an instance of known bug &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2842&quot; title=&quot;osd_handler.c:3183:osd_remote_fid()) lustre-MDT0000-osd: Can not lookup fld for [0x1100000002000004:0x2000010:0x3b000000]&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2842&quot;&gt;&lt;del&gt;LU-2842&lt;/del&gt;&lt;/a&gt;. I believe there is a fix already in the b2_4 branch, but it went in after the 2.4.1 release.&lt;/p&gt;</comment>
                            <comment id="71276" author="di.wang" created="Mon, 11 Nov 2013 22:54:44 +0000"  >&lt;p&gt;Are there any other error console messages before the LBUG? If the object that triggered this LBUG is an IGIF object (i.e. created in &amp;lt;= 1.8), then it is probably &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3934&quot; title=&quot;Directories gone missing after 2.4 update&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3934&quot;&gt;&lt;del&gt;LU-3934&lt;/del&gt;&lt;/a&gt;. &lt;a href=&quot;http://review.whamcloud.com/#/c/7625/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7625/&lt;/a&gt; should fix this problem.&lt;/p&gt;</comment>
                            <comment id="71305" author="ferner" created="Tue, 12 Nov 2013 09:53:54 +0000"  >&lt;p&gt;This is the full contents, starting with the new mount, found in /var/crash/ after one of the initial LBUGs; note this is after tunefs.lustre --writeconf on all clients.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;&amp;lt;4&amp;gt;LDISKFS-fs warning (device dm-9): ldiskfs_multi_mount_protect: MMP interval 42 higher than expected, please wait.
&amp;lt;4&amp;gt;LDISKFS-fs (dm-10): warning: maximal mount count reached, running e2fsck is recommended
&amp;lt;6&amp;gt;LDISKFS-fs (dm-10): recovery complete
&amp;lt;6&amp;gt;LDISKFS-fs (dm-10): mounted filesystem with ordered data mode. quota=off. Opts: 
&amp;lt;4&amp;gt;
&amp;lt;3&amp;gt;LustreError: 137-5: play01-MDT0000_UUID: not available for connect from 172.23.132.8@tcp (no target)
&amp;lt;3&amp;gt;LustreError: 137-5: play01-MDT0000_UUID: not available for connect from 172.23.132.23@tcp (no target)
&amp;lt;3&amp;gt;LustreError: 137-5: play01-MDT0000_UUID: not available for connect from 172.23.140.11@tcp (no target)
&amp;lt;4&amp;gt;Lustre: 8320:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1384193891/real 1384193891]  req@ffff880237a6c800 x1451432468283396/t0(0) o250-&amp;gt;MGC172.23.138.36@tcp@0@lo:26/25 lens 400/544 e 0 to 1 dl 1384193896 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
&amp;lt;3&amp;gt;LustreError: 137-5: play01-MDT0000_UUID: not available for connect from 172.23.138.78@tcp (no target)
&amp;lt;3&amp;gt;LustreError: Skipped 1 previous similar message
&amp;lt;3&amp;gt;LustreError: 137-5: play01-MDT0000_UUID: not available for connect from 172.23.146.12@tcp (no target)
&amp;lt;3&amp;gt;LustreError: Skipped 1 previous similar message
&amp;lt;3&amp;gt;LustreError: 8338:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880237a6c000 x1451432468283400/t0(0) o253-&amp;gt;MGC172.23.138.36@tcp@0@lo:26/25 lens 4768/4768 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
&amp;lt;3&amp;gt;LustreError: 8338:0:(obd_mount_server.c:1124:server_register_target()) play01-MDT0000: error registering with the MGS: rc = -5 (not fatal)
&amp;lt;3&amp;gt;LustreError: 8338:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880237a6c000 x1451432468283404/t0(0) o101-&amp;gt;MGC172.23.138.36@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
&amp;lt;3&amp;gt;LustreError: 8338:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880237a6c000 x1451432468283408/t0(0) o101-&amp;gt;MGC172.23.138.36@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
&amp;lt;4&amp;gt;Lustre: play01-MDT0000: used disk, loading
&amp;lt;3&amp;gt;LustreError: 8419:0:(sec_config.c:1115:sptlrpc_target_local_read_conf()) missing llog context
&amp;lt;4&amp;gt;Lustre: 8419:0:(mdt_handler.c:4948:mdt_process_config()) For interoperability, skip this mdt.quota_type. It is obsolete.
&amp;lt;4&amp;gt;Lustre: 8419:0:(mdt_handler.c:4948:mdt_process_config()) For interoperability, skip this mdt.group_upcall. It is obsolete.
&amp;lt;3&amp;gt;LustreError: 8338:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880237a6c000 x1451432468283412/t0(0) o101-&amp;gt;MGC172.23.138.36@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
&amp;lt;4&amp;gt;Lustre: 8320:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1384193916/real 1384193916]  req@ffff88042a57d000 x1451432468283416/t0(0) o250-&amp;gt;MGC172.23.138.36@tcp@0@lo:26/25 lens 400/544 e 0 to 1 dl 1384193926 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
&amp;lt;3&amp;gt;LustreError: 8338:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880237a6c000 x1451432468283420/t0(0) o101-&amp;gt;MGC172.23.138.36@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
&amp;lt;4&amp;gt;Lustre: 8320:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1384193926/real 1384193926]  req@ffff88043bb77400 x1451432468283424/t0(0) o38-&amp;gt;play01-MDT0000-lwp-MDT0000@172.23.138.19@tcp:12/10 lens 400/544 e 0 to 1 dl 1384193931 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
&amp;lt;4&amp;gt;LDISKFS-fs (dm-9): warning: maximal mount count reached, running e2fsck is recommended
&amp;lt;6&amp;gt;LDISKFS-fs (dm-9): recovery complete
&amp;lt;6&amp;gt;LDISKFS-fs (dm-9): mounted filesystem with ordered data mode. quota=off. Opts: 
&amp;lt;4&amp;gt;Lustre: play01-MDT0000: Will be in recovery for at least 5:00, or until 267 clients reconnect
&amp;lt;4&amp;gt;Lustre: MGS: Regenerating play01-OST0005 log by user request.
&amp;lt;6&amp;gt;Lustre: Setting parameter play01-OST0005.ost.quota_type in log play01-OST0005
&amp;lt;4&amp;gt;Lustre: MGS: Regenerating play01-OST0003 log by user request.
&amp;lt;6&amp;gt;Lustre: Setting parameter play01-OST0003.ost.quota_type in log play01-OST0003
&amp;lt;4&amp;gt;Lustre: MGS: Regenerating play01-OST0001 log by user request.
&amp;lt;6&amp;gt;Lustre: Setting parameter play01-OST0001.ost.quota_type in log play01-OST0001
&amp;lt;3&amp;gt;LustreError: 8446:0:(mdt_open.c:1497:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay/by_fid.  req@ffff88042a15c000 x1433645786902677/t0(167503724545) o101-&amp;gt;9b75b3ec-58d8-9e5e-4031-acd3744332ea@172.23.142.163@tcp:0/0 lens 664/1272 e 0 to 0 dl 1384193985 ref 1 fl Complete:/4/0 rc 0/0
&amp;lt;3&amp;gt;LustreError: 8446:0:(mdt_open.c:1497:mdt_reint_open()) @@@ OPEN &amp;amp; CREAT not in open replay/by_fid.  req@ffff880429f09000 x1433645786902678/t0(167503724546) o101-&amp;gt;9b75b3ec-58d8-9e5e-4031-acd3744332ea@172.23.142.163@tcp:0/0 lens 664/1272 e 0 to 0 dl 1384194007 ref 1 fl Complete:/4/0 rc 0/0
&amp;lt;0&amp;gt;LustreError: 8518:0:(mdt_open.c:1685:mdt_reint_open()) LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="71323" author="bogl" created="Tue, 12 Nov 2013 15:08:48 +0000"  >&lt;p&gt;Just to be sure I&apos;m interpreting correctly, the log shown above is from the LBUG seen when you tried to upgrade, not the one from when you tried to downgrade again from 2.4.1 to 2.3. Is that right?&lt;/p&gt;

&lt;p&gt;Have you tried the patch that Di Wang suggested, or is this still without any patches?&lt;/p&gt;</comment>
                            <comment id="71374" author="ferner" created="Tue, 12 Nov 2013 21:07:44 +0000"  >&lt;p&gt;Bob,&lt;/p&gt;

&lt;p&gt;you are correct, the additional errors from my previous comment were from the original LBUG running 2.4.1.&lt;/p&gt;

&lt;p&gt;We&apos;ve not had a chance to run with the patch; we&apos;ve been very busy today with other work. We&apos;ve not yet tried to bring the file system in question back today, though this will be a task for tomorrow.&lt;/p&gt;

&lt;p&gt;Is there a link to RPMs containing that patch that I could try, or do I need to build the server myself? The one linked to from the review page doesn&apos;t seem to exist anymore. Or should I just try a more recent build from the b2_4 branch in jenkins?&lt;/p&gt;
</comment>
                            <comment id="71379" author="bogl" created="Tue, 12 Nov 2013 22:57:40 +0000"  >&lt;p&gt;Frederik,&lt;/p&gt;

&lt;p&gt;If you use a more recent build from b2_4 it may already have the suggested patch in it; the most recent certainly does. However, I can&apos;t recommend that as the best course. The latest prebuilt rpms may not be for your exact kernel version; they may be for a later version. Also, the latest b2_4 has many patches in it on the way to our upcoming 2.4.2 release besides the one you want, and hasn&apos;t yet been through a full and complete test cycle for release.&lt;/p&gt;

&lt;p&gt;Your safest course is to pull the tagged 2.4.1 lustre source, install that, add just the patch recommended by Di Wang, and build the server yourself against the kernel you are already running on your Red Hat or CentOS servers.&lt;/p&gt;</comment>
                            <comment id="71527" author="ferner" created="Thu, 14 Nov 2013 14:25:06 +0000"  >&lt;p&gt;Bob,&lt;/p&gt;

&lt;p&gt;I&apos;ve now reproduced this using the jenkins build #47 for the b2_4 branch (&lt;a href=&quot;http://build.whamcloud.com/job/lustre-b2_4/47/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-b2_4/47/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/&lt;/a&gt;), as this is the first one with the patch included as far as I can see, and my own build didn&apos;t even boot.&lt;/p&gt;

&lt;p&gt;So with this build:&lt;/p&gt;

&lt;p&gt;Lustre: Lustre: Build Version: jenkins-arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel-47-g63f298d-PRISTINE-2.6.32-358.18.1.el6_lustre.g63f298d.x86_64&lt;/p&gt;

&lt;p&gt;it still crashes with an LBUG as soon as recovery is complete, with (nearly) the same LustreError:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 6172:0:(mdt_open.c:1692:mdt_reint_open()) LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="71553" author="ferner" created="Thu, 14 Nov 2013 18:39:04 +0000"  >&lt;p&gt;I have now managed to get the file system back using the latest jenkins build (#51) on the MDS. I&apos;m currently running the following on the MDT, and the LBUG directly after recovery has not happened.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[bnh65367@cs04r-sc-mds02-03 ~]$ cat /proc/fs/lustre/version 
lustre: 2.4.1
kernel: patchless_client
build:  jenkins-arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel-51-g5ee03f6-PRISTINE-2.6.32-358.18.1.el6_lustre.gf81b846.x86_64
[bnh65367@cs04r-sc-mds02-03 ~]$ 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="71581" author="pjones" created="Thu, 14 Nov 2013 21:25:55 +0000"  >&lt;p&gt;Frederik&lt;/p&gt;

&lt;p&gt;So are you comfortable that this is likely a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3934&quot; title=&quot;Directories gone missing after 2.4 update&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3934&quot;&gt;&lt;del&gt;LU-3934&lt;/del&gt;&lt;/a&gt;, and that if you defer your upgrade until the upcoming 2.4.2 you will avoid this issue?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="71583" author="di.wang" created="Thu, 14 Nov 2013 21:48:36 +0000"  >&lt;p&gt;Frederik, could you please provide us with a debug log for this LBUG? You need to:&lt;br/&gt;
1. set up the MDT, then&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lctl set_param panic_on_lbug=0
lctl set_param debug_mb=30
lctl set_param debug=-1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note: these should be done before the LBUG happens.&lt;br/&gt;
2. When the LBUG happens, it should dump a debug log somewhere (usually under /tmp); you can see this in the console messages. Could you please put this debug log somewhere (ftp.whamcloud.com)? Thanks!&lt;/p&gt;
</comment>
                            <comment id="71958" author="ferner" created="Wed, 20 Nov 2013 15:06:44 +0000"  >&lt;p&gt;I&apos;ve dropped our MDS in this file system back to 2.4.1, but can no longer reproduce the problem. Sorry, I didn&apos;t collect debug logs earlier.&lt;/p&gt;

&lt;p&gt;I&apos;m not fully convinced it is a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3934&quot; title=&quot;Directories gone missing after 2.4 update&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3934&quot;&gt;&lt;del&gt;LU-3934&lt;/del&gt;&lt;/a&gt;, as my initial attempt to run with the version that had the suggested patch included didn&apos;t fix the problem, but a later version did. Before then, it was happening on every attempt to mount the MDT.&lt;/p&gt;

&lt;p&gt;Unfortunately I now have a different problem: somehow my clients don&apos;t succeed in connecting to all OSTs anymore (note that each client seems to be able to connect to at least one OST for each OSS in this file system, but not all of them). I guess I&apos;ll open a separate ticket for that.&lt;/p&gt;</comment>
                            <comment id="76200" author="jfc" created="Tue, 4 Feb 2014 18:43:09 +0000"  >&lt;p&gt;Customer cannot reproduce and has opened another ticket for a related problem.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="22184">LU-4282</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw8m7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11548</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>