<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:29:04 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2888] After downgrade from 2.4 to 2.1.4, hit (osd_handler.c:2343:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed</title>
                <link>https://jira.whamcloud.com/browse/LU-2888</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Here is what I did:&lt;br/&gt;
1. Formatted the system as 2.1.4 and then upgraded to 2.4: success.&lt;br/&gt;
2. Shut down the filesystem and disabled quota.&lt;br/&gt;
3. Downgraded the system to 2.1.4 again; when mounting the MDS, hit the following errors.&lt;/p&gt;

&lt;p&gt;Here is the console of MDS:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: DEBUG MARKER: == upgrade-downgrade End == 18:53:45 (1362020025)
LDISKFS-fs warning (device sdb1): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts: 
LDISKFS-fs warning (device sdb1): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts: 
LDISKFS-fs warning (device sdb1): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts: 
Lustre: MGS MGS started
Lustre: 7888:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from 7306ea48-8511-52b2-40cf-6424fc417e41@0@lo t0 exp (null) cur 1362020029 last 0
Lustre: MGC10.10.4.132@tcp: Reactivating import
Lustre: MGS: Logs for fs lustre were removed by user request.  All servers must be restarted in order to regenerate the logs.
Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
Lustre: Setting parameter lustre-clilov.lov.stripesize in log lustre-client
Lustre: Enabling ACL
Lustre: Enabling user_xattr
LustreError: 7901:0:(osd_handler.c:2343:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed: 
LustreError: 7901:0:(osd_handler.c:2343:osd_index_try()) LBUG
Pid: 7901, comm: llog_process_th

Message from syslogd@fat-amd-1 at Feb 27 18:53:49 ...
 kernel:LustreError: 7901:0:(osd_handler.c:2343:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed: 

Message from syslogd@fat-amd-1 at Feb 27 18:53:49 ...
 kernel:LustreError: 7901:0:(osd_handler.c:2343:osd_index_try()) LBUG

Call Trace:
 [&amp;lt;ffffffffa03797f5&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [&amp;lt;ffffffffa0379e07&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
 [&amp;lt;ffffffffa0d6bd74&amp;gt;] osd_index_try+0x84/0x540 [osd_ldiskfs]
 [&amp;lt;ffffffffa04c1dfe&amp;gt;] dt_try_as_dir+0x3e/0x60 [obdclass]
 [&amp;lt;ffffffffa0c5eb3a&amp;gt;] orph_index_init+0x6a/0x1e0 [mdd]
 [&amp;lt;ffffffffa0c6ec45&amp;gt;] mdd_prepare+0x1d5/0x640 [mdd]
 [&amp;lt;ffffffffa0ccd23c&amp;gt;] ? mdt_process_config+0x6c/0x1030 [mdt]
 [&amp;lt;ffffffffa0da0499&amp;gt;] cmm_prepare+0x39/0xe0 [cmm]
 [&amp;lt;ffffffffa0ccfd7d&amp;gt;] mdt_device_alloc+0xe0d/0x2190 [mdt]
 [&amp;lt;ffffffffa04bdeff&amp;gt;] ? keys_fill+0x6f/0x1a0 [obdclass]
 [&amp;lt;ffffffffa04a2c87&amp;gt;] obd_setup+0x1d7/0x2f0 [obdclass]
 [&amp;lt;ffffffffa048ef3b&amp;gt;] ? class_new_export+0x72b/0x960 [obdclass]
 [&amp;lt;ffffffffa04a2fa8&amp;gt;] class_setup+0x208/0x890 [obdclass]
 [&amp;lt;ffffffffa04aac6c&amp;gt;] class_process_config+0xc3c/0x1c30 [obdclass]
 [&amp;lt;ffffffffa037a993&amp;gt;] ? cfs_alloc+0x63/0x90 [libcfs]
 [&amp;lt;ffffffffa04a5813&amp;gt;] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
 [&amp;lt;ffffffffa04acd0b&amp;gt;] class_config_llog_handler+0x9bb/0x1610 [obdclass]
 [&amp;lt;ffffffffa0637e3b&amp;gt;] ? llog_client_next_block+0x1db/0x4b0 [ptlrpc]
 [&amp;lt;ffffffffa0478098&amp;gt;] llog_process_thread+0x888/0xd00 [obdclass]
 [&amp;lt;ffffffffa0477810&amp;gt;] ? llog_process_thread+0x0/0xd00 [obdclass]
 [&amp;lt;ffffffff8100c14a&amp;gt;] child_rip+0xa/0x20
 [&amp;lt;ffffffffa0477810&amp;gt;] ? llog_process_thread+0x0/0xd00 [obdclass]
 [&amp;lt;ffffffffa0477810&amp;gt;] ? llog_process_thread+0x0/0xd00 [obdclass]
 [&amp;lt;ffffffff8100c140&amp;gt;] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
Pid: 7901, comm: llog_process_th Not tainted 2.6.32-279.14.1.el6_lustre.x86_64 #1

Message from syslogd@fat-amd-1 at Feb 27 18:53:49 ...
 kernel:Kernel panic - not syncing: LBUG

Call Trace:
 [&amp;lt;ffffffff814fdcba&amp;gt;] ? panic+0xa0/0x168
 [&amp;lt;ffffffffa0379e5b&amp;gt;] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [&amp;lt;ffffffffa0d6bd74&amp;gt;] ? osd_index_try+0x84/0x540 [osd_ldiskfs]
 [&amp;lt;ffffffffa04c1dfe&amp;gt;] ? dt_try_as_dir+0x3e/0x60 [obdclass]
 [&amp;lt;ffffffffa0c5eb3a&amp;gt;] ? orph_index_init+0x6a/0x1e0 [mdd]
 [&amp;lt;ffffffffa0c6ec45&amp;gt;] ? mdd_prepare+0x1d5/0x640 [mdd]
 [&amp;lt;ffffffffa0ccd23c&amp;gt;] ? mdt_process_config+0x6c/0x1030 [mdt]
 [&amp;lt;ffffffffa0da0499&amp;gt;] ? cmm_prepare+0x39/0xe0 [cmm]
 [&amp;lt;ffffffffa0ccfd7d&amp;gt;] ? mdt_device_alloc+0xe0d/0x2190 [mdt]
 [&amp;lt;ffffffffa04bdeff&amp;gt;] ? keys_fill+0x6f/0x1a0 [obdclass]
 [&amp;lt;ffffffffa04a2c87&amp;gt;] ? obd_setup+0x1d7/0x2f0 [obdclass]
 [&amp;lt;ffffffffa048ef3b&amp;gt;] ? class_new_export+0x72b/0x960 [obdclass]
 [&amp;lt;ffffffffa04a2fa8&amp;gt;] ? class_setup+0x208/0x890 [obdclass]
 [&amp;lt;ffffffffa04aac6c&amp;gt;] ? class_process_config+0xc3c/0x1c30 [obdclass]
 [&amp;lt;ffffffffa037a993&amp;gt;] ? cfs_alloc+0x63/0x90 [libcfs]
 [&amp;lt;ffffffffa04a5813&amp;gt;] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
 [&amp;lt;ffffffffa04acd0b&amp;gt;] ? class_config_llog_handler+0x9bb/0x1610 [obdclass]
 [&amp;lt;ffffffffa0637e3b&amp;gt;] ? llog_client_next_block+0x1db/0x4b0 [ptlrpc]
 [&amp;lt;ffffffffa0478098&amp;gt;] ? llog_process_thread+0x888/0xd00 [obdclass]
 [&amp;lt;ffffffffa0477810&amp;gt;] ? llog_process_thread+0x0/0xd00 [obdclass]
 [&amp;lt;ffffffff8100c14a&amp;gt;] ? child_rip+0xa/0x20
 [&amp;lt;ffffffffa0477810&amp;gt;] ? llog_process_thread+0x0/0xd00 [obdclass]
 [&amp;lt;ffffffffa0477810&amp;gt;] ? llog_process_thread+0x0/0xd00 [obdclass]
 [&amp;lt;ffffffff8100c140&amp;gt;] ? child_rip+0x0/0x20
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment>before upgrade, server and client: 2.1.4 RHEL6&lt;br/&gt;
after upgrade, server and client: lustre-master build# 1270 RHEL6</environment>
        <key id="17733">LU-2888</key>
            <summary>After downgrade from 2.4 to 2.1.4, hit (osd_handler.c:2343:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="sarah">Sarah Liu</reporter>
                        <labels>
                    </labels>
                <created>Wed, 27 Feb 2013 22:02:35 +0000</created>
                <updated>Thu, 25 Jul 2013 20:07:10 +0000</updated>
                            <resolved>Thu, 18 Apr 2013 06:28:51 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                    <version>Lustre 2.1.4</version>
                                    <fixVersion>Lustre 2.4.0</fixVersion>
                    <fixVersion>Lustre 2.1.6</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>14</watches>
                                                                            <comments>
                            <comment id="53211" author="bobijam" created="Fri, 1 Mar 2013 02:36:52 +0000"  >&lt;p&gt;I suspect it&apos;s related to the new OI index file change in 2.4.&lt;/p&gt;

&lt;p&gt;In 2.1.4, the calling path is mdd_prepare()-&amp;gt; orph_index_init() -&amp;gt; dt_store_open() // to open a dt_object named as &quot;PENDING&quot;&lt;br/&gt;
-&amp;gt;dt_reg_open() -&amp;gt;dt_locate() -&amp;gt;lu_object_find() -&amp;gt;lu_object_find_at() -&amp;gt;lu_object_find_try -&amp;gt;lu_object_alloc -&amp;gt;loo_object_init()       // Initialize &quot;PENDING&quot; dt_object&lt;/p&gt;

&lt;p&gt;-&amp;gt;osd_object_init() -&amp;gt;osd_fid_lookup()         // lookup &quot;PENDING&quot; dt_object&lt;/p&gt;

 &lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeHeader panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;osd_oi_lookup()&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        result = osd_oi_lookup(info, oi, fid, id); 
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (result == 0) {                         
           ......
        } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (result == -ENOENT)
                result = 0;

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If &quot;PENDING&quot; is not found in the OI index file (2.4 changed the OI index mechanism; could that be the reason?), osd_oi_lookup() returns -ENOENT, which the code above turns into 0. Then osd_object_init() won&apos;t call osd_object_init0(), and the &quot;PENDING&quot; dt_object does not have LOHA_EXISTS in its common attributes.&lt;/p&gt;</comment>
                            <comment id="53357" author="adilger" created="Tue, 5 Mar 2013 14:18:22 +0000"  >&lt;p&gt;&quot;PENDING&quot; should definitely exist, and I&apos;d think it would be in the special &quot;lookup by name&quot; list instead of needing the OI?&lt;/p&gt;</comment>
                            <comment id="53415" author="bobijam" created="Wed, 6 Mar 2013 02:50:54 +0000"  >&lt;p&gt;I added some logging, which partially fits my description. During orph_index_init(), it finds that the PENDING object has already been created.&lt;/p&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;00000004:00000001:0.0:1362551173.779945:0:4818:0:(mdd_orphans.c:470:orph_index_init()) Process entered&lt;br/&gt;
00000020:00000001:0.0:1362551173.779945:0:4818:0:(dt_object.c:377:dt_store_open()) Process entered&lt;br/&gt;
00000020:00000001:0.0:1362551173.779952:0:4818:0:(dt_object.c:352:dt_reg_open()) Process entered&lt;br/&gt;
00000020:00000001:0.0:1362551173.779952:0:4818:0:(dt_object.c:222:dt_lookup()) Process entered&lt;br/&gt;
00000004:00000001:0.0:1362551173.779952:0:4818:0:(osd_handler.c:3804:osd_index_ea_lookup()) Process entered&lt;br/&gt;
00000004:00000001:0.0:1362551173.779953:0:4818:0:(osd_handler.c:2753:osd_get_fid_from_dentry()) Process leaving (rc=18446744073709551555 : -61 : ffffffffffffffc3)&lt;br/&gt;
00000004:00000001:0.0:1362551173.779954:0:4818:0:(osd_handler.c:1903:osd_ea_fid_get()) Process entered&lt;br/&gt;
00000004:00000001:0.0:1362551173.779955:0:4818:0:(osd_handler.c:1939:osd_ea_fid_get()) Process leaving (rc=0 : 0 : 0)&lt;br/&gt;
00000004:00000001:0.0:1362551173.779955:0:4818:0:(osd_handler.c:3099:osd_ea_lookup_rec()) Process leaving (rc=0 : 0 : 0)&lt;br/&gt;
00000004:00000001:0.0:1362551173.779956:0:4818:0:(osd_handler.c:3816:osd_index_ea_lookup()) Process leaving (rc=1 : 1 : 1)&lt;br/&gt;
00000020:00000001:0.0:1362551173.779956:0:4818:0:(dt_object.c:232:dt_lookup()) Process leaving (rc=0 : 0 : 0)&lt;br/&gt;
00000020:00000002:0.0:1362551173.779957:0:4818:0:(dt_object.c:355:dt_reg_open()) dt_locate &lt;span class=&quot;error&quot;&gt;&amp;#91;0x200000001:0x7:0x0&amp;#93;&lt;/span&gt;   &lt;font color=&quot;red&quot;&gt;==&amp;gt; PENDING fid&lt;/font&gt;&lt;br/&gt;
00000020:00000001:0.0:1362551173.779957:0:4818:0:(dt_object.c:245:dt_locate()) Process entered&lt;br/&gt;
00000020:00000001:0.0:1362551173.779957:0:4818:0:(lu_object.c:621:lu_object_find_at()) Process entered&lt;br/&gt;
00000020:00000001:0.0:1362551173.779958:0:4818:0:(lu_object.c:550:lu_object_find_try()) Process entered&lt;br/&gt;
00000020:00000001:0.0:1362551173.779959:0:4818:0:(lu_object.c:573:lu_object_find_try()) Process leaving (rc=18446612133311027672 : -131940398523944 : ffff88003b6a19d8)  &lt;font color=&quot;red&quot;&gt;==&amp;gt; found it in the hash table&lt;/font&gt;&lt;br/&gt;
00000020:00000001:0.0:1362551173.779959:0:4818:0:(lu_object.c:625:lu_object_find_at()) Process leaving (rc=18446612133311027672 : -131940398523944 : ffff88003b6a19d8)&lt;br/&gt;
00000020:00000001:0.0:1362551173.779960:0:4818:0:(dt_object.c:253:dt_locate()) Process leaving (rc=18446612133330352192 : -131940379199424 : ffff88003c90f840)&lt;br/&gt;
00000020:00000001:0.0:1362551173.779961:0:4818:0:(dt_object.c:360:dt_reg_open()) Process leaving (rc=18446612133330352192 : -131940379199424 : ffff88003c90f840)&lt;br/&gt;
00000020:00000001:0.0:1362551173.779961:0:4818:0:(lustre_fid.h:375:fid_flatten32()) Process leaving (rc=4261420800 : 4261420800 : fe001f00)&lt;br/&gt;
00000020:00000001:0.0:1362551173.779962:0:4818:0:(dt_object.c:387:dt_store_open()) Process leaving (rc=18446612133330352192 : -131940379199424 : ffff88003c90f840)&lt;br/&gt;
00000004:00020000:0.0:1362551173.779963:0:4818:0:(osd_handler.c:2345:osd_index_try()) dt does not exists&lt;br/&gt;
00000004:00020000:0.0:1362551173.780032:0:4818:0:(mdd_orphans.c:478:orph_index_init()) &quot;PENDING&quot; is not an index! : rc = -20&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;
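
&lt;p&gt;A note on reading the trace above: each &quot;Process leaving&quot; line prints the return value three ways (unsigned decimal, signed decimal, hex). The following is a minimal sketch of that conversion, written in Python purely for illustration; the helper name to_signed64 is hypothetical and is not part of Lustre.&lt;/p&gt;

```python
def to_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a signed one (two's complement)."""
    if value >= 2**63:
        return value - 2**64
    return value


# Values copied from the trace above.
# A small negative result is an errno-style error code.
print(to_signed64(18446744073709551555))

# A huge magnitude means the "rc" is really a kernel pointer, not an error.
pointer_rc = 18446612133311027672
print(to_signed64(pointer_rc), hex(pointer_rc))
```

&lt;p&gt;Read this way, the rc of osd_get_fid_from_dentry() is a small negative error code, while the values returned by lu_object_find_try() and dt_locate() are object pointers.&lt;/p&gt;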

&lt;p&gt;So I looked back at when this object was created, and found that the PENDING dt_object was created in osd_prepare() =&amp;gt; llo_local_objects_setup() =&amp;gt; llo_store_create_index() =&amp;gt; llo_create_obj():&lt;/p&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;00000004:00000001:0.0:1362555707.896340:0:4934:0:(mdt_handler.c:4881:mdt_object_init()) Process leaving (rc=0 : 0 : 0)&lt;br/&gt;
00000004:00000001:0.0:1362555707.896341:0:4934:0:(cmm_object.c:187:cml_object_init()) Process entered&lt;br/&gt;
00000004:00000010:0.0:1362555707.896341:0:4934:0:(mdd_object.c:248:mdd_object_alloc()) kmalloced &apos;mdd_obj&apos;: 136 at ffff88003d75cb40.&lt;br/&gt;
00000004:00000001:0.0:1362555707.896341:0:4934:0:(cmm_object.c:209:cml_object_init()) Process leaving (rc=0 : 0 : 0)&lt;br/&gt;
00000004:00000001:0.0:1362555707.896342:0:4934:0:(mdd_object.c:271:mdd_object_init()) Process entered&lt;br/&gt;
00000004:00000010:0.0:1362555707.896342:0:4934:0:(osd_handler.c:298:osd_object_alloc()) kmalloced &apos;mo&apos;: 192 at ffff88003d75cc00.&lt;br/&gt;
00000004:00000001:0.0:1362555707.896343:0:4934:0:(mdd_object.c:282:mdd_object_init()) Process leaving (rc=0 : 0 : 0)&lt;br/&gt;
00000004:00000001:0.0:1362555707.896343:0:4934:0:(osd_handler.c:444:osd_object_init()) Process entered&lt;br/&gt;
00000004:00000001:0.0:1362555707.896344:0:4934:0:(osd_handler.c:386:osd_fid_lookup()) Process entered&lt;br/&gt;
00000004:00000002:0.0:1362555707.896344:0:4934:0:(osd_handler.c:397:osd_fid_lookup()) osd_oi_lookup &lt;span class=&quot;error&quot;&gt;&amp;#91;0x200000001:0x7:0x0&amp;#93;&lt;/span&gt; returns -2 &lt;font color=&quot;red&quot;&gt;==&amp;gt; oi lookup cannot find the obj&lt;/font&gt;&lt;br/&gt;
00000004:00000001:0.0:1362555707.896344:0:4934:0:(osd_handler.c:421:osd_fid_lookup()) Process leaving (rc=0 : 0 : 0)&lt;br/&gt;
00000004:00000001:0.0:1362555707.896345:0:4934:0:(osd_handler.c:453:osd_object_init()) Process leaving (rc=0 : 0 : 0)&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Check out the b2_1 version of osd_oi_lookup():&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeHeader panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;osd_oi_lookup&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (fid_seq(fid) == FID_SEQ_LOCAL_FILE)
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; -ENOENT;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
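
&lt;p&gt;To make the chain of events explicit, here is a simplified model of the two snippets quoted in this ticket, in Python for illustration only. It is a sketch under stated assumptions, not the real code: the function names oi_lookup_b2_1 and fid_lookup, the dictionary standing in for the OI, and the boolean standing in for LOHA_EXISTS are all hypothetical. The value of FID_SEQ_LOCAL_FILE is taken from the sequence of the PENDING FID in the logs.&lt;/p&gt;

```python
import errno

# Sequence of the PENDING FID [0x200000001:0x7:0x0] seen in the logs.
FID_SEQ_LOCAL_FILE = 0x200000001


def oi_lookup_b2_1(oi, fid):
    """Model of the b2_1 check: local-file FIDs are never looked up in the OI."""
    seq, _oid, _ver = fid
    if seq == FID_SEQ_LOCAL_FILE:
        return -errno.ENOENT, None
    if fid in oi:
        return 0, oi[fid]
    return -errno.ENOENT, None


def fid_lookup(oi, fid):
    """Model of the caller: -ENOENT is swallowed and the object stays uninitialized."""
    result, _inode = oi_lookup_b2_1(oi, fid)
    exists = False  # hypothetical stand-in for the LOHA_EXISTS attribute
    if result == 0:
        exists = True
    elif result == -errno.ENOENT:
        result = 0
    return result, exists


PENDING = (FID_SEQ_LOCAL_FILE, 0x7, 0x0)
NORMAL = (0x200000400, 0x1, 0x0)  # made-up FID outside the local-file sequence
oi = {PENDING: 1001, NORMAL: 1002}  # made-up inode numbers; 2.4 put PENDING in the OI

print(fid_lookup(oi, PENDING))
print(fid_lookup(oi, NORMAL))
```

&lt;p&gt;In this model the lookup of PENDING reports success but leaves the object marked as non-existent, which is the state that the dt_object_exists() assertion in osd_index_try() rejects.&lt;/p&gt;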

&lt;p&gt;2.1 does not use the OI for local file lookup; that&apos;s the root cause.&lt;/p&gt;</comment>
                            <comment id="53491" author="yong.fan" created="Wed, 6 Mar 2013 21:45:44 +0000"  >&lt;p&gt;Try to remove the check &quot;if (fid_seq(fid) == FID_SEQ_LOCAL_FILE)&quot; from &quot;osd_oi_lookup()&quot;.&lt;/p&gt;</comment>
                            <comment id="53493" author="bobijam" created="Wed, 6 Mar 2013 22:19:18 +0000"  >&lt;p&gt;For the record, my test shows that removing the check works in this scenario, but panics on a 2.1.4-formatted system.&lt;/p&gt;</comment>
                            <comment id="53848" author="adilger" created="Wed, 13 Mar 2013 00:35:29 +0000"  >&lt;p&gt;To clarify, in 2.4 the &quot;PENDING&quot; FID is stored in the OI file, but in 2.1 it is not?  And 2.1 does not like this?&lt;/p&gt;

&lt;p&gt;Is it still possible to get a patch into 2.1.5 to fix this?  Is this caused by the initial LFSCK scan added in 2.4 to fix the OI for local objects?&lt;/p&gt;</comment>
                            <comment id="53849" author="yong.fan" created="Wed, 13 Mar 2013 00:52:57 +0000"  >&lt;p&gt;In 2.4, the &quot;PENDING&quot; FID is added into local file, but it is NOT in 2.1. We can backport the OI scrub related patches (including the initial OI scrub) to b2_1; part of the backport is as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#change,4620&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4620&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,4621&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4621&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,4623&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4623&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,4624&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4624&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,4626&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4626&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,4627&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4627&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,4628&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4628&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,4629&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4629&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,4630&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4630&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I do not think the backport can be ready in a short time, so the better way is to fix b2_1 by removing some checks. If other bugs show up after the removal, we can resolve them one by one. I think that will be faster than the whole backport.&lt;/p&gt;</comment>
                            <comment id="54084" author="green" created="Fri, 15 Mar 2013 02:06:43 +0000"  >&lt;p&gt;Well, we should not really crash unfixed 2.1 servers.&lt;br/&gt;
And 2.1.5 should be released any moment now too, so there&apos;s no time for any development to fit there.&lt;br/&gt;
We need to think up something in 2.4, and fast, I think.&lt;/p&gt;</comment>
                            <comment id="54085" author="bobijam" created="Fri, 15 Mar 2013 02:10:49 +0000"  >&lt;p&gt;Just posted a 2.1 patch that ports only the necessary ldiskfs-based OI implementation at &lt;a href=&quot;http://review.whamcloud.com/5731&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5731&lt;/a&gt; &lt;/p&gt;</comment>
                            <comment id="54516" author="jlevi" created="Wed, 20 Mar 2013 21:47:47 +0000"  >&lt;p&gt;Will there also be a patch for master? &lt;/p&gt;</comment>
                            <comment id="54883" author="bobijam" created="Wed, 27 Mar 2013 03:42:37 +0000"  >&lt;p&gt;Jodi,&lt;/p&gt;

&lt;p&gt;No, master does not need a patch; this patch makes the 2.1.4 server OI implementation align with that of master.&lt;/p&gt;</comment>
                            <comment id="54940" author="green" created="Wed, 27 Mar 2013 17:54:38 +0000"  >&lt;p&gt;Can you please rebase this on top of current b2_1?&lt;/p&gt;</comment>
                            <comment id="54985" author="bobijam" created="Thu, 28 Mar 2013 04:26:40 +0000"  >&lt;p&gt;ok, and done&lt;/p&gt;</comment>
                            <comment id="55267" author="yujian" created="Tue, 2 Apr 2013 10:39:23 +0000"  >&lt;blockquote&gt;&lt;p&gt;just posted a 2.1 patch to port only necessary ldiskfs based OI implementation at &lt;a href=&quot;http://review.whamcloud.com/5731&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5731&lt;/a&gt;&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Here are the test configuration and result:&lt;/p&gt;

&lt;p&gt;Lustre b2_1 build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-reviews/14375/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-reviews/14375/&lt;/a&gt;&lt;br/&gt;
Lustre master build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-master/1369/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-master/1369/&lt;/a&gt;&lt;br/&gt;
Distro/Arch: RHEL6.3/x86_64&lt;/p&gt;

&lt;p&gt;Clean upgrade and downgrade path: b2_1-&amp;gt;master-&amp;gt;b2_1&lt;/p&gt;

&lt;p&gt;After downgrading from master to b2_1, mounting the server targets and clients succeeded. However, on the MDS node, &quot;lctl dl&quot; showed that:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@wtm-83 ~]# lctl dl
  0 UP mgs MGS MGS 15
  1 UP mgc MGC10.10.19.8@tcp ac763461-679d-82b7-e00e-7e3d7f5e6234 5
  2 UP lov lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 4
  3 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 9
  4 UP mds mdd_obd-lustre-MDT0000 mdd_obd_uuid-lustre-MDT0000 3
  5 IN osc lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
  6 IN osc lustre-OST0001-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Dmesg on the MDS node showed that:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: MGS MGS started
Lustre: 10437:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from ac763461-679d-82b7-e00e-7e3d7f5e6234@0@lo t0 exp (null) cur 1365015074 last 0
Lustre: MGC10.10.19.8@tcp: Reactivating import
Lustre: Enabling ACL
Lustre: Enabling user_xattr
Lustre: lustre-MDT0000: used disk, loading
Lustre: 10441:0:(mdt_lproc.c:416:lprocfs_wr_identity_upcall()) lustre-MDT0000: identity upcall set to /usr/sbin/l_getidentity
LustreError: 10441:0:(llog_lvfs.c:199:llog_lvfs_read_header()) bad log / header magic: 0x2e (expected 0x10645539)
LustreError: 10441:0:(llog_obd.c:220:llog_setup_named()) obd lustre-OST0000-osc-MDT0000 ctxt 2 lop_setup=ffffffffa0562ca0 failed -5
LustreError: 10441:0:(osc_request.c:4231:__osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT
LustreError: 10441:0:(osc_request.c:4248:__osc_llog_init()) osc &apos;lustre-OST0000-osc-MDT0000&apos; tgt &apos;mdd_obd-lustre-MDT0000&apos; catid ffff880c24f6b8b0 rc=-5
LustreError: 10441:0:(osc_request.c:4250:__osc_llog_init()) logid 0x2:0x0
LustreError: 10441:0:(osc_request.c:4278:osc_llog_init()) rc: -5
LustreError: 10441:0:(lov_log.c:248:lov_llog_init()) error osc_llog_init idx 0 osc &apos;lustre-OST0000-osc-MDT0000&apos; tgt &apos;mdd_obd-lustre-MDT0000&apos; (rc=-5)
LustreError: 10441:0:(llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile 0x4:0x0: rc -116
LustreError: 10441:0:(llog_obd.c:220:llog_setup_named()) obd lustre-OST0001-osc-MDT0000 ctxt 2 lop_setup=ffffffffa0562ca0 failed -116
LustreError: 10441:0:(osc_request.c:4231:__osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT
LustreError: 10441:0:(osc_request.c:4248:__osc_llog_init()) osc &apos;lustre-OST0001-osc-MDT0000&apos; tgt &apos;mdd_obd-lustre-MDT0000&apos; catid ffff880c24f6b8b0 rc=-116
LustreError: 10441:0:(osc_request.c:4250:__osc_llog_init()) logid 0x4:0x0
LustreError: 10441:0:(osc_request.c:4278:osc_llog_init()) rc: -116
LustreError: 10441:0:(lov_log.c:248:lov_llog_init()) error osc_llog_init idx 1 osc &apos;lustre-OST0001-osc-MDT0000&apos; tgt &apos;mdd_obd-lustre-MDT0000&apos; (rc=-116)
Lustre: 10566:0:(debug.c:326:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
Lustre: 10567:0:(debug.c:326:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
Lustre: 10437:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from 4447b0f0-2f2f-51f4-cd9f-aa011ef3eb77@10.10.19.17@tcp t0 exp (null) cur 1365015077 last 0
Lustre: 10209:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1365015074/real 1365015074]  req@ffff8805fb180800 x1431322036797450/t0(0) o8-&amp;gt;lustre-OST0000-osc-MDT0000@10.10.19.17@tcp:28/4 lens 368/512 e 0 to 1 dl 1365015079 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 10437:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from a8451e2f-8501-14bd-8ab3-88e0df1b7640@10.10.19.26@tcp t0 exp (null) cur 1365015079 last 0
Lustre: 10209:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1365015075/real 1365015075]  req@ffff880626de8800 x1431322036797451/t0(0) o8-&amp;gt;lustre-OST0001-osc-MDT0000@10.10.19.26@tcp:28/4 lens 368/512 e 0 to 1 dl 1365015080 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 10437:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from a2dec252-dec3-5b33-4395-9b26a06a9dac@10.10.18.253@tcp t0 exp (null) cur 1365015081 last 0
Lustre: lustre-MDT0000: Temporarily refusing client connection from 10.10.18.253@tcp
Lustre: lustre-MDT0000: Temporarily refusing client connection from 10.10.18.253@tcp
Lustre: lustre-MDT0000: Temporarily refusing client connection from 10.10.18.253@tcp
Lustre: lustre-MDT0000: Temporarily refusing client connection from 10.10.18.253@tcp
LustreError: 10599:0:(lov_log.c:160:lov_llog_origin_connect()) error osc_llog_connect tgt 1 (-107)
LustreError: 10598:0:(mds_lov.c:832:__mds_lov_synchronize()) lustre-OST0000_UUID failed at llog_origin_connect: -107
LustreError: 10598:0:(mds_lov.c:861:__mds_lov_synchronize()) sync lustre-OST0000_UUID failed -107
LustreError: 10598:0:(mds_lov.c:865:__mds_lov_synchronize()) deactivating lustre-OST0000_UUID
LustreError: 10599:0:(lov_log.c:160:lov_llog_origin_connect()) Skipped 1 previous similar message
LustreError: 10599:0:(mds_lov.c:832:__mds_lov_synchronize()) lustre-OST0001_UUID failed at llog_origin_connect: -107
LustreError: 10599:0:(mds_lov.c:861:__mds_lov_synchronize()) sync lustre-OST0001_UUID failed -107
LustreError: 10599:0:(mds_lov.c:865:__mds_lov_synchronize()) deactivating lustre-OST0001_UUID
Lustre: 10454:0:(ldlm_lib.c:952:target_handle_connect()) lustre-MDT0000: connection from d03c55d9-e816-05ed-ff50-0ba89f2504bb@10.10.18.253@tcp t0 exp (null) cur 1365015101 last 0
Lustre: DEBUG MARKER: Using TIMEOUT=20
Lustre: DEBUG MARKER: 2 OST are inactive after 20 seconds, give up
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Test report is still in Maloo import queue.&lt;/p&gt;</comment>
                            <comment id="55478" author="mdiep" created="Thu, 4 Apr 2013 14:42:31 +0000"  >&lt;p&gt;Here is the test report: &lt;a href=&quot;https://maloo.whamcloud.com/test_sessions/f77f5032-9cd5-11e2-802d-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sessions/f77f5032-9cd5-11e2-802d-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="55756" author="green" created="Mon, 8 Apr 2013 15:34:30 +0000"  >&lt;p&gt;So is my reading right that the test actually failed because all OSCs are in the inactive state, so it won&apos;t be possible to create any new files on such a filesystem? I do not see any attempt by the test to actually create anything post-downgrade, so perhaps it&apos;s a case we are missing?&lt;/p&gt;</comment>
                            <comment id="55815" author="bobijam" created="Tue, 9 Apr 2013 00:55:52 +0000"  >&lt;p&gt;I don&apos;t know about the test case, but the latest error has something to do with the CATALOGS file changing in 2.4.&lt;/p&gt;

&lt;p&gt;The CATALOGS file written by 2.1 looks as follows (logid is i_ino + __u64 0x0 + i_generation):&lt;/p&gt;
&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;# od -x /mnt/mds1/CATALOGS&lt;br/&gt;
0000000 0021 0000 0000 0000 0000 0000 0000 0000&lt;br/&gt;
0000020 2a9d 1d4f 0000 0000 0000 0000 0000 0000&lt;br/&gt;
0000040 0022 0000 0000 0000 0000 0000 0000 0000&lt;br/&gt;
0000060 2a9e 1d4f 0000 0000 0000 0000 0000 0000&lt;br/&gt;
0000100 0023 0000 0000 0000 0000 0000 0000 0000&lt;br/&gt;
0000120 2a9f 1d4f 0000 0000 0000 0000 0000 0000&lt;br/&gt;
0000140&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;
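
&lt;p&gt;For readers who want to decode these dumps by hand, here is a minimal Python sketch. It makes several assumptions that are inferred from the dumps rather than taken from the Lustre headers: a little-endian host (the ticket reports x86_64), one entry per 32 bytes, and three leading fields of 8, 8 and 4 bytes. The names decode_catalogs, WRITTEN_BY_2_1 and AFTER_2_4_MOUNT are hypothetical. The first word list is copied from the dump above, the second from the dump taken after 2.4 mounted the device.&lt;/p&gt;

```python
RECORD_SIZE = 32  # assumption: one CATALOGS entry per 32 bytes, as the dumps suggest


def words_to_bytes(words):
    # od -x prints 16-bit words; on a little-endian host the low byte is first on disk.
    return b"".join(int(word, 16).to_bytes(2, "little") for word in words)


def decode_catalogs(dump):
    """Return one (first u64, second u64, following u32) tuple per entry."""
    raw = words_to_bytes(dump.split())
    entries = []
    for offset in range(0, len(raw), RECORD_SIZE):
        record = raw[offset:offset + RECORD_SIZE]
        first = int.from_bytes(record[0:8], "little")
        second = int.from_bytes(record[8:16], "little")
        third = int.from_bytes(record[16:20], "little")
        entries.append((first, second, third))
    return entries


# Data words only; the offset column of od has been dropped.
WRITTEN_BY_2_1 = """
0021 0000 0000 0000 0000 0000 0000 0000
2a9d 1d4f 0000 0000 0000 0000 0000 0000
0022 0000 0000 0000 0000 0000 0000 0000
2a9e 1d4f 0000 0000 0000 0000 0000 0000
0023 0000 0000 0000 0000 0000 0000 0000
2a9f 1d4f 0000 0000 0000 0000 0000 0000
"""

AFTER_2_4_MOUNT = """
0002 0000 0000 0000 0001 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0004 0000 0000 0000 0001 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0006 0000 0000 0000 0001 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
"""

for label, dump in (("2.1", WRITTEN_BY_2_1), ("2.4", AFTER_2_4_MOUNT)):
    for first, second, third in decode_catalogs(dump):
        print(label, hex(first), hex(second), hex(third))
```

&lt;p&gt;Under these assumptions the 2.1 entries decode to an inode number, zero and a generation, while the entries rewritten by 2.4 carry a small id, the value 1 and a zero generation. A 2.1 server that reads the first field as an inode number and the third as a generation would see 0x2:0x0 and 0x4:0x0, which appears to match the logids in the dmesg errors reported earlier in this ticket.&lt;/p&gt;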

&lt;p&gt;After 2.4 mounted it, the CATALOGS logid array changes to:&lt;/p&gt;
&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;# od -x /mnt/mds1/CATALOGS&lt;br/&gt;
0000000 0002 0000 0000 0000 0001 0000 0000 0000&lt;br/&gt;
0000020 0000 0000 0000 0000 0000 0000 0000 0000&lt;br/&gt;
0000040 0004 0000 0000 0000 0001 0000 0000 0000&lt;br/&gt;
0000060 0000 0000 0000 0000 0000 0000 0000 0000&lt;br/&gt;
0000100 0006 0000 0000 0000 0001 0000 0000 0000&lt;br/&gt;
0000120 0000 0000 0000 0000 0000 0000 0000 0000&lt;br/&gt;
0000140&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="55816" author="bobijam" created="Tue, 9 Apr 2013 01:44:38 +0000"  >&lt;p&gt;Unless the patches that introduce on-disk changes are ported backward, the upgrade-then-downgrade test will be a headache.&lt;/p&gt;</comment>
                            <comment id="55821" author="yujian" created="Tue, 9 Apr 2013 03:45:51 +0000"  >&lt;blockquote&gt;&lt;p&gt;So is my reading right that the test actually failed because all OSCs are in inactive state, so it won&apos;t be possible to create any new files on such a filesystem?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Right, touching a new file on Lustre client failed as follows:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# touch /mnt/lustre/file
touch: cannot touch `/mnt/lustre/file&apos;: Input/output error
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;blockquote&gt;&lt;p&gt;I do not see any attempts for the test to actually create anything post-downgrade so perhaps it&apos;s a case we are missing?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The upgrade/downgrade testing needs to be performed as per wiki page &lt;a href=&quot;https://wiki.hpdd.intel.com/display/ENG/Upgrade+and+Downgrade+Testing&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/display/ENG/Upgrade+and+Downgrade+Testing&lt;/a&gt;. As we can see, the data creating/verifying steps are included in the post-downgrade phases. However, these test cases are not covered in one test script currently. The script which detected the issue in this ticket was upgrade-downgrade.sh, which covered extra quotas and OST pools testing along the upgrade/downgrade path.&lt;/p&gt;

&lt;p&gt;The issue is that when this script was run for downgrade testing, the extra quotas testing was disabled (due to the new way of setting quotas on the master branch). Only OST pools testing was covered along the downgrade path, and that only verified that the existing files/directories could be accessed and that the striping info was correct; it did not create new files.&lt;/p&gt;

&lt;p&gt;So, in order to cover all of the test cases in the wiki page, we need to improve upgrade-downgrade.sh to make the quota code work on the master branch, and we also need to run the other two test scripts, {clean,rolling}-upgrade-downgrade.sh, until the test cases covered by them are added to upgrade-downgrade.sh.&lt;/p&gt;</comment>
                            <comment id="55891" author="bobijam" created="Tue, 9 Apr 2013 16:49:07 +0000"  >&lt;p&gt;This log shows that the 2.4 MDT cannot find the old llog objects at startup and creates new ones the 2.4 way (using llog_osd_ops), which the 2.1 code (using llog_lvfs_ops) cannot recognise.&lt;/p&gt;</comment>
                            <comment id="55954" author="bobijam" created="Wed, 10 Apr 2013 02:24:27 +0000"  >&lt;p&gt;Actually, I found that when a 2.1-formatted disk is upgraded to 2.4, all files&apos; sizes become 0 and their contents are lost.&lt;/p&gt;</comment>
                            <comment id="55957" author="adilger" created="Wed, 10 Apr 2013 04:56:39 +0000"  >&lt;p&gt;Bobijam, that sounds like a very critical problem. Does conf-sanity test 32 not detect this problem during 2.1 to 2.4 upgrade?  Is that problem repeatable?&lt;/p&gt;

&lt;p&gt;Please file a separate bug for that problem and make it a 2.4 blocker until it is better understood and fixed. &lt;/p&gt;</comment>
                            <comment id="55961" author="bzzz" created="Wed, 10 Apr 2013 05:40:41 +0000"  >&lt;p&gt;this is rather strange.. I do remember Li Wei improved conf-sanity/32 to verify actual data with md5sum.&lt;/p&gt;</comment>
                            <comment id="55962" author="bobijam" created="Wed, 10 Apr 2013 05:51:54 +0000"  >&lt;p&gt;Is it possible that the disk2_1-ldiskfs.tar.bz2 was created with an outdated compatible b2_1 version? I just tested with the current b2_1 and master branches; the procedure is:&lt;/p&gt;

&lt;p&gt;1. create a lustre filesystem with b2_1 version&lt;br/&gt;
2. copy /etc/* to this filesystem&lt;br/&gt;
3. umount it&lt;br/&gt;
4. mount the file system with master version, succeeded&lt;br/&gt;
5. &apos;ls -l&apos; the filesystem: all files are there, but all have size 0 and no content.&lt;/p&gt;</comment>
                            <comment id="55963" author="bzzz" created="Wed, 10 Apr 2013 05:55:12 +0000"  >&lt;p&gt;iirc, Li Wei did put a function to create such an image in conf-sanity.sh&lt;/p&gt;</comment>
                            <comment id="55973" author="bobijam" created="Wed, 10 Apr 2013 08:58:23 +0000"  >&lt;p&gt;Strangely, I don&apos;t even need the master version: with b2_1 alone, after format, mount, copy files, umount, and mount again, all files become size 0.&lt;/p&gt;

&lt;p&gt;btw, I&apos;m using current b2_1 lustre code with linux-2.6.32-279.22.1.el6 kernel code.&lt;/p&gt;</comment>
                            <comment id="55976" author="bobijam" created="Wed, 10 Apr 2013 09:16:38 +0000"  >&lt;p&gt;Filed a ticket (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3141&quot; title=&quot;umount lost all file contents&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3141&quot;&gt;&lt;del&gt;LU-3141&lt;/del&gt;&lt;/a&gt;) for the &apos;empty file&apos; issue.&lt;/p&gt;</comment>
                            <comment id="55980" author="bobijam" created="Wed, 10 Apr 2013 09:34:01 +0000"  >&lt;p&gt;It turns out that the &apos;empty file&apos; issue is not a bug: I was using llmountcleanup.sh, which calls &apos;umount -f&apos;, and that does not flush dirty data.&lt;/p&gt;

&lt;p&gt;I&apos;ll push another patch which can pass the upgrade &amp;amp; downgrade test in my VM environment.&lt;/p&gt;</comment>
                            <comment id="56126" author="bobijam" created="Thu, 11 Apr 2013 18:28:20 +0000"  >&lt;p&gt;Wang Di,&lt;/p&gt;

&lt;p&gt;The upgrade test shows these messages on the client side:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
Maloo report: https://maloo.whamcloud.com/test_sessions/ef5c83fe-a2be-11e2-81ba-52540035b04c

I found the following error messages in the console log of client wtm-81 after upgrading:

LustreError: 33998:0:(lustre_idl.h:705:ostid_to_fid()) bad MDT0 id, 0x6dec:1024 ost_idx:0
LustreError: 33998:0:(lustre_idl.h:705:ostid_to_fid()) bad MDT0 id, 0x6dec:1024 ost_idx:0
LustreError: 33998:0:(lustre_idl.h:705:ostid_to_fid()) bad MDT0 id, 0x6d78:1024 ost_idx:0
LustreError: 33998:0:(lustre_idl.h:705:ostid_to_fid()) Skipped 31 previous similar messages
LustreError: 33998:0:(lustre_idl.h:705:ostid_to_fid()) bad MDT0 id, 0x6d6d:1024 ost_idx:0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I noticed that it&apos;s from your recent patch, what the issue could it be? Is it potentially harmful?&lt;/p&gt;</comment>
                            <comment id="56133" author="bobijam" created="Thu, 11 Apr 2013 19:10:22 +0000"  >&lt;p&gt;Just an update on what I found. My local test shows the error message is from lov_attr_get_raid0():&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
00020000:00000001:0.0:1365706828.914809:0:18347:0:(lov_object.c:420:lov_attr_get_raid0()) Process entered
00020000:00020000:0.0:1365706828.914809:0:18347:0:(lustre_idl.h:705:ostid_to_fid()) bad MDT0 id, 0x54:1024 ost_idx:0
00020000:00000002:0.0:1365706828.914810:0:18347:0:(lov_merge.c:76:lov_merge_lvb_kms()) MDT FID [0x0:0x0:0x0] initial value: s=0 m=9223372036854775808 a=9223372036854775808 c=9223372036854775808 b=0
00020000:00000001:0.0:1365706828.914811:0:18347:0:(lov_offset.c:59:lov_stripe_size()) Process entered
00020000:00000001:0.0:1365706828.914811:0:18347:0:(lov_offset.c:74:lov_stripe_size()) Process leaving (rc=370 : 370 : 172)
00020000:00000001:0.0:1365706828.914812:0:18347:0:(lov_offset.c:59:lov_stripe_size()) Process entered
00020000:00000001:0.0:1365706828.914812:0:18347:0:(lov_offset.c:74:lov_stripe_size()) Process leaving (rc=370 : 370 : 172)
00020000:00000002:0.0:1365706828.914813:0:18347:0:(lov_merge.c:110:lov_merge_lvb_kms()) MDT FID [0x0:0x0:0x0] on OST[1]: s=370 m=1365706747 a=1365706747 c=1365706747 b=8
00020000:00000001:0.0:1365706828.914814:0:18347:0:(lov_merge.c:119:lov_merge_lvb_kms()) Process leaving (rc=0 : 0 : 0)
00000020:00000001:0.0:1365706828.914814:0:18347:0:(cl_object.c:1019:cl_lvb2attr()) Process entered
00000020:00000001:0.0:1365706828.914814:0:18347:0:(cl_object.c:1025:cl_lvb2attr()) Process leaving
00020000:00000001:0.0:1365706828.914815:0:18347:0:(lov_object.c:476:lov_attr_get_raid0()) Process leaving (rc=0 : 0 : 0)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;However, in lov_merge_lvb_kms() the stack variable &apos;fid&apos; has no effect except for displaying debug messages.&lt;/p&gt;

&lt;p&gt;WangDi, need your input.&lt;/p&gt;</comment>
                            <comment id="56137" author="adilger" created="Thu, 11 Apr 2013 20:02:47 +0000"  >&lt;p&gt;Di, it appears that this might relate to &lt;a href=&quot;http://review.whamcloud.com/5820&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5820&lt;/a&gt; incorrectly converting struct llog_logid to use struct ost_id.  The fields in that structure were already used a bit strangely, and I think that the ostid_to_fid() and fid_to_ostid() macros may be mangling the fields in this structure, since they were never really used as OST id/seq, but rather as MDS inode/0/generation.&lt;/p&gt;

&lt;p&gt;On 1.8 the CATALOGS file looks like oid=ino/ogr=0/ogen=gen:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;0000000 0000000000000021 0000000000000000
0000020 000000001d4f2a9d 0000000000000000
0000040 0000000000000022 0000000000000000
0000060 000000001d4f2a9e 0000000000000000
0000100 0000000000000023 0000000000000000
0000120 000000001d4f2a9f 0000000000000000
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On 2.1 (upgraded from 1.8, if it matters) the CATALOGS file looks like oid=ino/oseq=1/ogen=gen:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;000000 0000000000020002 0000000000000001
000010 00000000ec467621 0000000000000000
000020 0000000000020003 0000000000000001
000030 00000000ec467622 0000000000000000
000040 0000000000020004 0000000000000001
000050 00000000ec467623 0000000000000000
000060 0000000000020005 0000000000000001
000070 0000000095595788 0000000000000000
000080 000000000002005b 0000000000000001
000090 000000001ecc9141 0000000000000000
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On 2.4 (new) the CATALOGS file looks like (f_seq=1/f_oid=oid/ogen=0):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;000000 0000000000000001 0000000000000002
000010 0000000000000000 0000000000000000
000020 0000000000000001 0000000000000004
000030 0000000000000000 0000000000000000
000040 0000000000000001 0000000000000006
000050 0000000000000000 0000000000000000
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I suspect that the latter change is unintentional, since I think it breaks the on-disk and on-wire llog_logid protocol.  I&apos;m just running another test with a pre 6794d7654b4c459519a9e6d85ed439c8c594c2e7 build to see what the CATALOGS file looks like.&lt;/p&gt;

&lt;p&gt;Di, could you please work up a patch that reverts the changes to llog_logid and its usage, and we can see if that fixes this latest issue.&lt;/p&gt;</comment>
                            <comment id="56139" author="adilger" created="Thu, 11 Apr 2013 20:11:11 +0000"  >&lt;p&gt;Only the latest issue may be related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2684&quot; title=&quot;convert ost_id to lu_fid for FID_SEQ_NORMAL objects&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2684&quot;&gt;&lt;del&gt;LU-2684&lt;/del&gt;&lt;/a&gt;, not the original one.&lt;/p&gt;</comment>
                            <comment id="56140" author="adilger" created="Thu, 11 Apr 2013 20:13:24 +0000"  >&lt;p&gt;The pre-&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2684&quot; title=&quot;convert ost_id to lu_fid for FID_SEQ_NORMAL objects&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2684&quot;&gt;&lt;del&gt;LU-2684&lt;/del&gt;&lt;/a&gt; CATALOGS file looks like:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;000000 000000000000000a 0000000000000001
000010 0000000000000000 0000000000000000
000020 000000000000000c 0000000000000001
000030 0000000000000000 0000000000000000
000040 000000000000000e 0000000000000001
000050 0000000000000000 0000000000000000
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;so indeed this is an unintentional compatibility breakage with the new code.&lt;/p&gt;</comment>
                            <comment id="56165" author="di.wang" created="Fri, 12 Apr 2013 00:17:32 +0000"  >&lt;p&gt;Bobi: Could you please try this patch? &lt;a href=&quot;http://review.whamcloud.com/#change,6034&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,6034&lt;/a&gt;  It seems lmm_oi is being overwritten by some other thread. As Andreas said, it might be related to ostid_to_fid for the llog object, so this patch uses oi_id/oi_seq directly to identify the log object and avoids the ostid_to_fid conversion. Could you please try it a few times? I guess this problem cannot be reproduced often; at least I cannot reproduce it locally. Thanks. &lt;/p&gt;</comment>
                            <comment id="56172" author="bobijam" created="Fri, 12 Apr 2013 01:22:03 +0000"  >&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/6034&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6034&lt;/a&gt; does not solve the issue. The error message is not from llog handling; it comes from running stat/ls on a file created by the 2.1 system.&lt;/p&gt;

&lt;p&gt;Attached is the -1 log.&lt;/p&gt;</comment>
                            <comment id="56180" author="bobijam" created="Fri, 12 Apr 2013 02:11:51 +0000"  >&lt;p&gt;I have an example of ostid here extracted from lov_merge_lvb_kms()&lt;/p&gt;


&lt;p&gt;LustreError: 8277:0:(lustre_idl.h:705:ostid_to_fid()) bad MDT0 id, 0x51:1024 ost_idx:0&lt;br/&gt;
LustreError: 8277:0:(lustre_idl.h:706:ostid_to_fid()) 0x51:0x200000400&lt;/p&gt;


&lt;p&gt;The second log message format is as follows&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;                        CERROR(&lt;span class=&quot;code-quote&quot;&gt;&quot;bad MDT0 id, &quot;&lt;/span&gt;DOSTID&lt;span class=&quot;code-quote&quot;&gt;&quot; ost_idx:%u\n&quot;&lt;/span&gt;,
                                POSTID(ostid), ost_idx);
                        CERROR(LPX64&lt;span class=&quot;code-quote&quot;&gt;&quot;:&quot;&lt;/span&gt;LPX64&lt;span class=&quot;code-quote&quot;&gt;&quot;\n&quot;&lt;/span&gt;, ostid-&amp;gt;oi.oi_id, ostid-&amp;gt;oi.oi_seq);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="56184" author="di.wang" created="Fri, 12 Apr 2013 03:27:38 +0000"  >&lt;p&gt;Hmm, the problem is that in 2.1, we define the lsm/lmm like this:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;struct lov_mds_md_v1 {            /* LOV EA mds/wire data (little-endian) */
        __u32 lmm_magic;          /* magic number = LOV_MAGIC_V1 */
        __u32 lmm_pattern;        /* LOV_PATTERN_RAID0, LOV_PATTERN_RAID1 */
        __u64 lmm_object_id;      /* LOV object ID */
        __u64 lmm_object_seq;     /* LOV object seq number */
        __u32 lmm_stripe_size;    /* size of stripe in bytes */
        __u32 lmm_stripe_count;   /* num stripes in use for this object */
        struct lov_ost_data_v1 lmm_objects[0]; /* per-stripe data */
};        
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But lmm_object_id/lmm_object_seq actually hold a normal MDT FID, i.e. lmm_object_id/lmm_object_seq will be f_oid/normal_seq, and when unpacking lmm to lsm, 2.4 uses ostid_le_to_cpu():&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;static inline void ostid_le_to_cpu(struct ost_id *src_oi,
                                   struct ost_id *dst_oi)
{
        if (fid_seq_is_mdt0(ostid_seq(src_oi))) {
                dst_oi-&amp;gt;oi.oi_id = le64_to_cpu(src_oi-&amp;gt;oi.oi_id);
                dst_oi-&amp;gt;oi.oi_seq = le64_to_cpu(src_oi-&amp;gt;oi.oi_seq);
        } else {
                fid_le_to_cpu(&amp;amp;dst_oi-&amp;gt;oi_fid, &amp;amp;src_oi-&amp;gt;oi_fid);
        }
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This treats the ostid as a normal FID, which causes the problem.&lt;/p&gt;

&lt;p&gt;Sigh, it seems we do not have a better way to convert this special ostid to the real FID.&lt;/p&gt;


</comment>
                            <comment id="56191" author="di.wang" created="Fri, 12 Apr 2013 07:22:40 +0000"  >&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#change,6037&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,6037&lt;/a&gt; Bobi: please check this one. Thanks!&lt;/p&gt;</comment>
                            <comment id="56193" author="bobijam" created="Fri, 12 Apr 2013 08:41:29 +0000"  >&lt;p&gt;yes, with &lt;a href=&quot;http://review.whamcloud.com/5731&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5731&lt;/a&gt; on b2_1 and &lt;a href=&quot;http://review.whamcloud.com/6034&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6034&lt;/a&gt; and &lt;a href=&quot;http://review.whamcloud.com/6037&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6037&lt;/a&gt; on master, the downgrade and upgrade test passed with no noise.&lt;/p&gt;</comment>
                            <comment id="56224" author="jlevi" created="Fri, 12 Apr 2013 20:38:29 +0000"  >&lt;p&gt;Change/6034 and 6037 have been merged into &lt;a href=&quot;http://review.whamcloud.com/#change,6044&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,6044&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="56315" author="jlevi" created="Mon, 15 Apr 2013 12:39:27 +0000"  >&lt;p&gt;Landed for 2.4&lt;/p&gt;</comment>
                            <comment id="56317" author="bobijam" created="Mon, 15 Apr 2013 12:43:17 +0000"  >&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#change,5731&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,5731&lt;/a&gt; hasn&apos;t landed on b2_1 yet; we need that before this can be claimed as fixed.&lt;/p&gt;</comment>
                            <comment id="56318" author="jlevi" created="Mon, 15 Apr 2013 12:56:16 +0000"  >&lt;p&gt;Reducing from 2.4 blocker, but keeping open until it lands on b2_1.&lt;/p&gt;</comment>
                            <comment id="56380" author="sarah" created="Tue, 16 Apr 2013 05:38:14 +0000"  >&lt;p&gt;Please also make the changes to b1_8; I hit the same LBUG after downgrading from the latest tag-2.3.64 to 1.8.9:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: Enabling user_xattr
LustreError: 27458:0:(osd_handler.c:2343:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed: 
LustreError: 27458:0:(osd_handler.c:2343:osd_index_try()) LBUG
Pid: 27458, comm: llog_process_th
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="56381" author="bobijam" created="Tue, 16 Apr 2013 06:09:01 +0000"  >&lt;p&gt;Do we support 2.x to 1.8 downgrade?&lt;/p&gt;</comment>
                            <comment id="56414" author="sarah" created="Tue, 16 Apr 2013 17:58:49 +0000"  >&lt;p&gt;Sorry, I downgraded the system to b2_1 by mistake; 1.8.9&amp;lt;-&amp;gt;2.4 doesn&apos;t have this issue.&lt;/p&gt;</comment>
                            <comment id="56531" author="bobijam" created="Thu, 18 Apr 2013 06:28:51 +0000"  >&lt;p&gt;landed on b2_1 for 2.1.6&lt;/p&gt;</comment>
                            <comment id="58272" author="spitzcor" created="Mon, 13 May 2013 13:54:42 +0000"  >&lt;p&gt;Yes, I think that we should support 1.8.x-&amp;gt;2.4-&amp;gt;1.8.9-wc1 upgrade &amp;amp; downgrade.  Of course, w/o dirdata and large_xattr, etc. used.&lt;/p&gt;</comment>
                            <comment id="58898" author="spitzcor" created="Mon, 20 May 2013 16:26:07 +0000"  >&lt;p&gt;I also think that 2.3-&amp;gt;2.4-&amp;gt;2.3 upgrade &amp;amp; downgrade should be supported.  But this bug occurs there as well.  Is the appropriate response to reopen this ticket and mark it as affecting 2.3 (and I suppose 2.2 too)?  Or should we make a clone?&lt;/p&gt;

&lt;p&gt;Also, I think it is a pretty bad story that we&apos;d have to get a fix onto b2_3 to make the upgrade/downgrade scenario work.  Because even if we get the existing patch landed to b2_3 you can&apos;t revert to 2.3.0.&lt;/p&gt;
</comment>
                            <comment id="63001" author="sarah" created="Thu, 25 Jul 2013 20:06:25 +0000"  >&lt;p&gt;I hit this issue when downgrading from 2.5 to 2.3.0; I will create a new issue for that.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="19766">LU-3574</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="17315">LU-2684</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="18700">LU-3267</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="12495" name="2.4-start-mdt.log.tar.bz2" size="172696" author="bobijam" created="Tue, 9 Apr 2013 16:49:07 +0000"/>
                            <attachment id="12507" name="upgrade-ls.log.tar.bz2" size="164096" author="bobijam" created="Fri, 12 Apr 2013 01:22:03 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvk1b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6970</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>