<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:41:46 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11195] LBUG: (lod_lov.c:633:lod_free_comp_buffer()) ASSERTION( entry-&gt;llc_stripe == ((void *)0) )</title>
                <link>https://jira.whamcloud.com/browse/LU-11195</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The primary MDS and then its backup panic when replaying layout change requests on a PFL file.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[  842.061520] LustreError: 22966:0:(lod_lov.c:633:lod_free_comp_buffer()) ASSERTION( entry-&amp;gt;llc_stripe == ((void *)0) ) failed: 
[  842.075148] LustreError: 22966:0:(lod_lov.c:633:lod_free_comp_buffer()) LBUG 
[  842.083308] Pid: 22966, comm: mdt03_001

crash_x86_64&amp;gt; bt
PID: 22966  TASK: ffff881f0cbfdee0  CPU: 18  COMMAND: &quot;mdt03_001&quot;
 #0 [ffff881f0c4f7680] machine_kexec at ffffffff8105d77b
 #1 [ffff881f0c4f76e0] __crash_kexec at ffffffff81108742
 #2 [ffff881f0c4f77b0] panic at ffffffff816a863f
 #3 [ffff881f0c4f7830] lbug_with_loc at ffffffffc0b65854 [libcfs]
 #4 [ffff881f0c4f7850] lod_free_comp_buffer at ffffffffc151b62f [lod]
 #5 [ffff881f0c4f7898] lod_free_comp_entries at ffffffffc151ff57 [lod]
 #6 [ffff881f0c4f78b0] lod_parse_striping at ffffffffc1522140 [lod]
 #7 [ffff881f0c4f7920] lod_load_striping_locked at ffffffffc152262a [lod]
 #8 [ffff881f0c4f7958] lod_declare_layout_change at ffffffffc153d0bf [lod]
 #9 [ffff881f0c4f79c0] mdd_declare_layout_change at ffffffffc13e9102 [mdd]
#10 [ffff881f0c4f79e8] mdd_layout_change at ffffffffc13f110d [mdd]
#11 [ffff881f0c4f7a40] mdt_layout_change at ffffffffc145aaf8 [mdt]
#12 [ffff881f0c4f7a88] mdt_intent_layout at ffffffffc146220a [mdt]
#13 [ffff881f0c4f7b40] mdt_intent_opc at ffffffffc1458315 [mdt]
#14 [ffff881f0c4f7ba0] mdt_intent_policy at ffffffffc145ff63 [mdt]
#15 [ffff881f0c4f7be0] ldlm_lock_enqueue at ffffffffc0e92f0e [ptlrpc]
#16 [ffff881f0c4f7c48] ldlm_handle_enqueue0 at ffffffffc0ebb753 [ptlrpc]
#17 [ffff881f0c4f7cd8] tgt_enqueue at ffffffffc0f41b32 [ptlrpc]
#18 [ffff881f0c4f7cf8] tgt_request_handle at ffffffffc0f454da [ptlrpc]
#19 [ffff881f0c4f7d40] ptlrpc_server_handle_request at ffffffffc0eed733 [ptlrpc]
#20 [ffff881f0c4f7de0] ptlrpc_main at ffffffffc0ef0ee2 [ptlrpc]
#21 [ffff881f0c4f7ec8] kthread at ffffffff810b4031
#22 [ffff881f0c4f7f50] ret_from_fork at ffffffff816c155d
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;During recovery, the MDS cannot handle the fact that the file is already striped. The client is replaying a layout operation that had already completed. The server tries to free the component buffers, but the assertion fails when it finds striping info that it does not expect.&lt;/p&gt;

&lt;p&gt;We have a crash dump of the LBUG with full (-1) debug logs. The scenario in the dump is the replay of two layout change RPCs on a PFL file with four components:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;

Process A - pid 22961

00000020:00000040:4.0:1531845858.765709:000000:22961:0:(tgt_handler.c:578:tgt_handle_recovery()) @@@ Got new replay  req@ffff881f1041ce00 x1605550737582320/t0(1065152839981) o101-&amp;gt;5fa746bc-5d86-4a17-630c-889714734378@175@gni6:148/0 lens 848/0 e 0 to 0 dl 1531845938 ref 1 fl Interpret:/4/ffffffff rc 0/-1

00000004:00000040:4.0:1531845858.765745:000000:22961:0:(mdt_handler.c:3807:mdt_intent_layout()) [0x340077935:0x99b3:0x0]got layout change request from client: opc:2 flags:0x0 extent [0x1000000, 0x1100000)

Process B - pid 22966

00000020:00000040:18.0:1531845858.766672:000000:22966:0:(tgt_handler.c:578:tgt_handle_recovery()) @@@ Got new replay  req@ffff881e83a8bf00 x1605550737582560/t0(1065152849881) o101-&amp;gt;5fa746bc-5d86-4a17-630c-889714734378@175@gni6:148/0 lens 920/0 e 0 to 0 dl 1531845938 ref 1 fl Interpret:/4/ffffffff rc 0/-1

00000004:00000040:18.0:1531845858.766710:000000:22966:0:(mdt_handler.c:3807:mdt_intent_layout()) [0x340077935:0x99b3:0x0]got layout change request from client: opc:2 flags:0x0 extent [0x2000000, 0x2100000)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Process A runs first. It:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;finds object O for &lt;span class=&quot;error&quot;&gt;&amp;#91;0x340077935:0x99b3:0x0&amp;#93;&lt;/span&gt; in the lu_site object cache&lt;/li&gt;
	&lt;li&gt;follows the normal layout request path and loads the component definitions from disk into O&lt;/li&gt;
	&lt;li&gt;detects that the request is a replay, deletes all components, and loads the component definitions from the RPC buffers into O; only the first 2 of the 4 components are flagged as initialized&lt;/li&gt;
	&lt;li&gt;defines striping info for those two components&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Process A leaves O in the cache in an unexpected state: some components have striping info and some don&apos;t, yet the O.ldo_comp_cached flag is NOT set.&lt;/p&gt;

&lt;p&gt;Process B then runs and tries to follow the same path as Process A. It:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;finds O in the object cache&lt;/li&gt;
	&lt;li&gt;tries to remove the components of O before loading them from disk&lt;/li&gt;
	&lt;li&gt;finds the llc_stripe info set on the first component and LBUGs&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;If the O.ldo_comp_cached flag had been set, Process B would not have tried to load the components from disk. Instead, it would have removed the components defined for O and replaced them with the layout from the RPC buffer, which is the correct behavior.&lt;/p&gt;
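&lt;p&gt;The Process A / Process B sequence above can be reproduced with a small user-space model. This is a minimal sketch under stated assumptions, not the Lustre source: model_obj, model_comp, model_load_striping() and the other names are hypothetical stand-ins for the lod object fields and functions named in this report, O is assumed to have two initialized components both on disk and in the RPC, and the assertion in lod_free_comp_buffer() is modelled as a -1 return so the failing path can be observed without an LBUG. The fix flag applies the change proposed at the end of this description.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```c
/*
 * User-space model of the cached-object state (hypothetical names; not the
 * Lustre source). The assertion in lod_free_comp_buffer() is modelled as a
 * -1 return so the failing path can be observed without an LBUG.
 */
#define NCOMPS 4	/* the PFL file in the dump has four components */

static int fake_stripes[NCOMPS];	/* stand-ins for allocated stripe arrays */

struct model_comp {
	int *llc_stripe;	/* non-NULL once striping is defined */
	int initialized;	/* component flagged as initialized */
};

struct model_obj {
	struct model_comp comps[NCOMPS];
	int comp_cached;	/* models ldo_comp_cached */
};

/* One-slot stand-in for the lu_site cache: O stays here between requests. */
static struct model_obj lu_cache[1];

/* Models lod_free_comp_buffer(): every llc_stripe must already be NULL. */
static int model_free_comp_buffer(struct model_obj *o)
{
	int i;

	for (i = 0; i != NCOMPS; i++)
		if (o->comps[i].llc_stripe != (void *)0)
			return -1;	/* ASSERTION(entry->llc_stripe == NULL) */
	return 0;
}

/* Define all components; the first n_init are initialized and striped. */
static void model_define_comps(struct model_obj *o, int n_init)
{
	int i;

	for (i = 0; i != NCOMPS; i++) {
		o->comps[i].initialized = n_init > i;
		o->comps[i].llc_stripe = n_init > i ? fake_stripes + i : (void *)0;
	}
}

/* Models lod_load_striping_locked(): return early if cached, else reload. */
static int model_load_striping(struct model_obj *o, int n_init_on_disk)
{
	if (o->comp_cached)
		return 0;
	if (model_free_comp_buffer(o) != 0)	/* via lod_parse_striping() */
		return -1;
	model_define_comps(o, n_init_on_disk);
	o->comp_cached = 1;
	return 0;
}

/* Models the replay branch of lod_declare_update_plain() ending in -EALREADY. */
static void model_replay_update(struct model_obj *o, int n_init_in_rpc, int fix)
{
	int i;

	for (i = 0; i != NCOMPS; i++)	/* lod_object_free_striping() */
		o->comps[i].llc_stripe = (void *)0;
	o->comp_cached = 0;
	model_define_comps(o, n_init_in_rpc);	/* lod_use_defined_striping() */
	if (fix)
		o->comp_cached = 1;	/* the proposed fix */
	/* nothing to instantiate: return -EALREADY with the flag as-is */
}

/* Replay two layout change requests on O, as Process A then Process B do. */
static int model_two_replays(int fix)
{
	static const struct model_obj empty;
	struct model_obj *o = lu_cache;
	int req;

	*o = empty;	/* O starts uncached with no components */
	for (req = 0; req != 2; req++) {
		if (model_load_striping(o, 2) != 0)
			return -1;	/* Process B's LBUG */
		model_replay_update(o, 2, fix);
	}
	return 0;
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Calling model_two_replays(0) follows the unfixed path and returns -1 when Process B tries to reload the components; model_two_replays(1) sets the flag after the replayed layout is installed, so Process B takes the early return in model_load_striping() and the sequence completes with 0.&lt;/p&gt;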

&lt;p&gt;The following notes outline the function flow involved in the bug. Process A and Process B both call lod_declare_layout_change(); the calls are sequential. &lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lod_declare_layout_change() - layout from rpc
    lod_load_striping_locked() - returns if obj striping is cached
                               - creates obj.comps; lbuf from lu_env
        lod_parse_striping() - allocs stripes according to obj   &amp;lt;--- Process B LBUGs
        sets obj.ldo_comp_cached
    layout_lock_data_init()

    lod_declare_update_plain() - comp info from mlc_buf(rpc)
        if replay
            lod_object_free_striping() - removes stripes from obj
                unsets obj.ldo_comp_cached
            lod_use_defined_striping() - obj, mlc_buf(rpc)
                                       - builds comps, stripes from rpc
                lod_alloc_comp_entries()
            lod_get_lov_ea() - gets ea from disk, puts it in lov_env_info
        loop over obj components
            if obj.e_start &amp;gt; rpc.e_end break from loop
            if replay skip comps already initialized on disk
            add comp to reconstruct list
        if no comps added to reconstruct list return -EALREADY   &amp;lt;--- Process A exits through this path
        lod_obj_inc_layout_gen()
        lod_declare_instantiate_components()
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
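&lt;p&gt;The component selection loop in lod_declare_update_plain() can be sketched as follows to show why Process A returns -EALREADY. This is a minimal sketch, not the Lustre source: the names are hypothetical, only the replay case is modelled, and the component extents in example_layout are invented for illustration because the actual layout of the file is not given in this report. Under those assumptions, Process A&apos;s request [0x1000000, 0x1100000) falls entirely within the initialized second component, so nothing is queued and the function returns -MODEL_EALREADY; Process B&apos;s request [0x2000000, 0x2100000) would queue the third component if Process B got that far.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```c
/*
 * Sketch of the component selection loop from the notes above (hypothetical
 * names; not the Lustre source). Only the replay case is modelled.
 */
#define MODEL_EALREADY 1	/* stand-in for the kernel's EALREADY */

struct model_extent {
	unsigned long long e_start;
	unsigned long long e_end;	/* exclusive, as in [start, end) */
};

struct model_layout_comp {
	struct model_extent ext;
	int init_on_disk;	/* component already initialized on disk */
};

/* Hypothetical 4-component PFL layout: only the first 2 are initialized. */
static const struct model_layout_comp example_layout[] = {
	{ { 0x0ULL, 0x100000ULL }, 1 },		/* [0, 1 MiB) */
	{ { 0x100000ULL, 0x2000000ULL }, 1 },	/* [1 MiB, 32 MiB) */
	{ { 0x2000000ULL, 0x40000000ULL }, 0 },	/* [32 MiB, 1 GiB) */
	{ { 0x40000000ULL, ~0ULL }, 0 },	/* [1 GiB, EOF) */
};

/* Count the components added to the reconstruct list for a replayed
 * request, or return -MODEL_EALREADY when there are none. */
static int model_pick_comps(const struct model_layout_comp *comps, int ncomps,
			    struct model_extent req)
{
	int i, queued = 0;

	for (i = 0; i != ncomps; i++) {
		if (comps[i].ext.e_start > req.e_end)
			break;		/* component starts past the request */
		if (comps[i].init_on_disk)
			continue;	/* replay: already initialized on disk */
		queued++;		/* add comp to reconstruct list */
	}
	return queued ? queued : -MODEL_EALREADY;
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;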
&lt;p&gt;The replay logic in lod_declare_update_plain() leaves the object in an unexpected state: components have llc_stripe set, but ldo_comp_cached is false. The simplest fix is to set &quot;obj.ldo_comp_cached = 1&quot; immediately after the call to lod_use_defined_striping().&lt;/p&gt;</description>
                <environment></environment>
        <key id="52860">LU-11195</key>
            <summary>LBUG: (lod_lov.c:633:lod_free_comp_buffer()) ASSERTION( entry-&gt;llc_stripe == ((void *)0) )</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="amk">Ann Koehler</assignee>
                                    <reporter username="amk">Ann Koehler</reporter>
                        <labels>
                    </labels>
                <created>Mon, 30 Jul 2018 20:28:17 +0000</created>
                <updated>Tue, 13 Apr 2021 05:10:53 +0000</updated>
                            <resolved>Tue, 4 Sep 2018 06:06:53 +0000</resolved>
                                    <version>Lustre 2.10.0</version>
                    <version>Lustre 2.11.0</version>
                                    <fixVersion>Lustre 2.12.0</fixVersion>
                    <fixVersion>Lustre 2.10.7</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="231109" author="amk" created="Mon, 30 Jul 2018 20:36:42 +0000"  >&lt;p&gt;I&apos;ve attached the full dklog along with extracts of the problematic replay requests. If you want the vmcore, let me know where to put it.&lt;/p&gt;</comment>
                            <comment id="231113" author="gerrit" created="Mon, 30 Jul 2018 21:15:11 +0000"  >&lt;p&gt;Ann Koehler (amk@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/32904&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32904&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11195&quot; title=&quot;LBUG: (lod_lov.c:633:lod_free_comp_buffer()) ASSERTION( entry-&amp;gt;llc_stripe == ((void *)0) )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11195&quot;&gt;&lt;del&gt;LU-11195&lt;/del&gt;&lt;/a&gt; lod: Mark comps cached on replay of layout change&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 6533ded780cacd36c7ecff86e0aba32c7efab18e&lt;/p&gt;</comment>
                            <comment id="231531" author="amk" created="Mon, 6 Aug 2018 15:59:04 +0000"  >&lt;p&gt;An additional requirement for triggering this bug is that the components on disk must already be flagged as initialized when Process A replays the layout change request, so that lod_declare_update_plain() returns -EALREADY. If lod_declare_instantiate_components() is called instead, obj.ldo_comp_cached is set and Process B will not LBUG on the next replay.&lt;/p&gt;</comment>
                            <comment id="232943" author="gerrit" created="Tue, 4 Sep 2018 03:48:44 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/32904/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32904/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11195&quot; title=&quot;LBUG: (lod_lov.c:633:lod_free_comp_buffer()) ASSERTION( entry-&amp;gt;llc_stripe == ((void *)0) )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11195&quot;&gt;&lt;del&gt;LU-11195&lt;/del&gt;&lt;/a&gt; lod: Mark comps cached on replay of layout change&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: e021026d0c37d8806d16dbaad6a9d4f47844c999&lt;/p&gt;</comment>
                            <comment id="232955" author="pjones" created="Tue, 4 Sep 2018 06:06:54 +0000"  >&lt;p&gt;Landed for 2.12&lt;/p&gt;</comment>
                            <comment id="232999" author="gerrit" created="Tue, 4 Sep 2018 17:55:23 +0000"  >&lt;p&gt;Minh Diep (mdiep@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/33110&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33110&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11195&quot; title=&quot;LBUG: (lod_lov.c:633:lod_free_comp_buffer()) ASSERTION( entry-&amp;gt;llc_stripe == ((void *)0) )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11195&quot;&gt;&lt;del&gt;LU-11195&lt;/del&gt;&lt;/a&gt; lod: Mark comps cached on replay of layout change&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: c93c7902048497a40dcb94977f7c47949dd36b3a&lt;/p&gt;</comment>
                            <comment id="240103" author="gerrit" created="Wed, 16 Jan 2019 07:31:05 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/33110/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33110/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11195&quot; title=&quot;LBUG: (lod_lov.c:633:lod_free_comp_buffer()) ASSERTION( entry-&amp;gt;llc_stripe == ((void *)0) )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11195&quot;&gt;&lt;del&gt;LU-11195&lt;/del&gt;&lt;/a&gt; lod: Mark comps cached on replay of layout change&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: de87598eaf5eea309faa845209148b787ed29c10&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="30676" name="dklog_plus_extracts.tgz" size="1511935" author="amk" created="Mon, 30 Jul 2018 20:30:53 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0001j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>