<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:32:22 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3263] llog_osd_next_block(): ASSERTION( last_rec-&gt;lrh_index == tail-&gt;lrt_index ) failed:</title>
                <link>https://jira.whamcloud.com/browse/LU-3263</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;On a SPARC machine, &quot;./llmount.sh&quot; hit this assertion failure:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 20452:0:(llog_osd.c:630:llog_osd_next_block()) ASSERTION( last_rec-&amp;gt;lrh_index == tail-&amp;gt;lrt_index ) failed:
LustreError: 20452:0:(llog_osd.c:630:llog_osd_next_block()) LBUG
Pid: 20452, comm: ll_mgs_0002

Call Trace:

Kernel panic - not syncing: LBUG
Call Trace:
 [00000000103a3194] lbug_with_loc+0x94/0xc0 [libcfs]
 [0000000010535fbc] llog_osd_next_block+0xb5c/0x1000 [obdclass]
 [00000000104f39d0] llog_process_thread+0x2b0/0x13a0 [obdclass]
 [00000000104f4cdc] llog_process_or_fork+0x21c/0x980 [obdclass]
 [000000001090a140] mgs_steal_llog_for_mdt_from_client+0x5e0/0xae0 [mgs]
 [000000001090b120] mgs_write_log_mdt+0xae0/0x3a60 [mgs]
 [00000000109262f8] mgs_write_log_target+0x798/0x20a0 [mgs]
 [00000000108ea624] mgs_handle_target_reg+0xd44/0x17c0 [mgs]
 [00000000108edab8] mgs_handle+0xd18/0x22a0 [mgs]
 [00000000106f5f60] ptlrpc_server_handle_request+0x980/0x16c0 [ptlrpc]
 [00000000106fc130] ptlrpc_main+0xa10/0x1680 [ptlrpc]
 [000000000042ad88] kernel_thread+0x30/0x48
 [00000000103aea44] cfs_create_thread+0x24/0x60 [libcfs]
Press Stop-A (L1-A) to return to the boot prom
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The llog_osd_next_block() lines in question are&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;                tail = (struct llog_rec_tail *)((&lt;span class=&quot;code-object&quot;&gt;char&lt;/span&gt; *)buf + rc -
                                                sizeof(struct llog_rec_tail));
                &lt;span class=&quot;code-comment&quot;&gt;/* get the last record in block */&lt;/span&gt;
                last_rec = (struct llog_rec_hdr *)((&lt;span class=&quot;code-object&quot;&gt;char&lt;/span&gt; *)buf + rc -
                                                   le32_to_cpu(tail-&amp;gt;lrt_len));

                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (LLOG_REC_HDR_NEEDS_SWABBING(last_rec))
                        lustre_swab_llog_rec(last_rec);
                LASSERT(last_rec-&amp;gt;lrh_index == tail-&amp;gt;lrt_index);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The le32_to_cpu() call above assumes the data to be little-endian.  That is not true, however, because configuration logs (as well as at least OSP logs) are actually written in host-endianness, which is big-endian on sparc Linux.&lt;/p&gt;
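
&lt;p&gt;The mismatch above can be reproduced outside the kernel. The following is a minimal sketch, not the Lustre code: the 16-byte header and 8-byte tail layout, the magic constants, and every helper name are assumptions made for illustration. It decodes the tail length in both byte orders and accepts whichever one lands on a header carrying the expected magic.&lt;/p&gt;

<![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants; assumed to resemble the llog record-type magic. */
#define REC_MAGIC 0x10600000u
#define REC_MASK  0xfff00000u

#define HDR_SIZE  16u  /* len, index, type, id: four 32-bit fields */
#define TAIL_SIZE 8u   /* len, index: two 32-bit fields */

enum byte_order { ORDER_LE, ORDER_BE, ORDER_UNKNOWN };

/* Decode a 32-bit value from p in the given byte order. */
static uint32_t get32(const unsigned char *p, enum byte_order o)
{
    if (o == ORDER_LE)
        return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
               (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return (uint32_t)p[3] | (uint32_t)p[2] << 8 |
           (uint32_t)p[1] << 16 | (uint32_t)p[0] << 24;
}

/* Encode a 32-bit value at p in the given byte order. */
static void put32(unsigned char *p, uint32_t v, enum byte_order o)
{
    for (int i = 0; i < 4; i++) {
        int shift = (o == ORDER_LE) ? 8 * i : 8 * (3 - i);
        p[i] = (unsigned char)(v >> shift);
    }
}

/* Write one payload-free record: a header immediately followed by a tail. */
static void write_rec(unsigned char *p, uint32_t index, enum byte_order o)
{
    uint32_t len = HDR_SIZE + TAIL_SIZE;

    put32(p + 0, len, o);              /* header: length */
    put32(p + 4, index, o);            /* header: index */
    put32(p + 8, REC_MAGIC | 1u, o);   /* header: type */
    put32(p + 12, 0, o);               /* header: id */
    put32(p + 16, len, o);             /* tail: length */
    put32(p + 20, index, o);           /* tail: index */
}

/*
 * Guess the byte order of the last record in buf[0..size) by trying both
 * decodings of the tail length and checking the header each one points at.
 */
static enum byte_order last_rec_order(const unsigned char *buf, size_t size)
{
    static const enum byte_order candidates[] = { ORDER_LE, ORDER_BE };

    assert(size >= HDR_SIZE + TAIL_SIZE);
    for (size_t i = 0; i < 2; i++) {
        enum byte_order o = candidates[i];
        uint32_t len = get32(buf + size - TAIL_SIZE, o);

        if (len < HDR_SIZE + TAIL_SIZE || len > size)
            continue;   /* implausible length: wrong byte order */
        if ((get32(buf + size - len + 8, o) & REC_MASK) == REC_MAGIC)
            return o;
    }
    return ORDER_UNKNOWN;
}

/* Return 1 if the last record's header index matches its tail index. */
static int last_rec_consistent(const unsigned char *buf, size_t size)
{
    enum byte_order o = last_rec_order(buf, size);
    uint32_t len;

    if (o == ORDER_UNKNOWN)
        return 0;
    len = get32(buf + size - TAIL_SIZE, o);
    return get32(buf + size - len + 4, o) == get32(buf + size - 4, o);
}
```
]]>

&lt;p&gt;This only illustrates the probing idea. It assumes that the header and tail of one record share a byte order, and it says nothing about which rule the on-disk format should follow.&lt;/p&gt;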

&lt;p&gt;It is not clear what the endianness rule should be.  The comment above the definition of llog_rec_hdr requires little-endianness, while the LLOG_REC_HDR_NEEDS_SWABBING() calls and log-writing code suggest host-endianness (or adaptive-endianness).  Enforcing little-endianness requires more extensive changes, while host-endianness makes it impossible to find the index of the last record in a chunk in O(1) time, since the record header must be read first to determine endianness.&lt;/p&gt;</description>
                <environment></environment>
        <key id="18685">LU-3263</key>
            <summary>llog_osd_next_block(): ASSERTION( last_rec-&gt;lrh_index == tail-&gt;lrt_index ) failed:</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="liwei">Li Wei</reporter>
                        <labels>
                            <label>endianness</label>
                            <label>sparc</label>
                    </labels>
                <created>Thu, 2 May 2013 07:11:39 +0000</created>
                <updated>Sun, 10 Oct 2021 22:19:24 +0000</updated>
                            <resolved>Sun, 10 Oct 2021 22:19:24 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="57550" author="adilger" created="Thu, 2 May 2013 18:06:08 +0000"  >&lt;p&gt;I think the record header would always have been read initially, so it would be possible to save the endianness of the llog if it can&apos;t be determined in isolation, which shouldn&apos;t change from one call to the next.&lt;/p&gt;</comment>
                            <comment id="57599" author="liwei" created="Fri, 3 May 2013 01:41:41 +0000"  >&lt;p&gt;Strictly speaking, each record could have its own endianness, based on the adaptive-endianness scheme.  And, whether mixed endianness is possible is beyond llog_osd&apos;s knowledge.&lt;/p&gt;</comment>
                            <comment id="57779" author="jhammond" created="Mon, 6 May 2013 22:27:03 +0000"  >&lt;p&gt;For the time being, is it enough just to ensure that future llog records are written in LE?&lt;/p&gt;</comment>
                            <comment id="57790" author="liwei" created="Tue, 7 May 2013 02:02:09 +0000"  >&lt;p&gt;Yes, I think that would be excellent, although considerable work is required.&lt;/p&gt;</comment>
                            <comment id="57829" author="adilger" created="Tue, 7 May 2013 15:55:50 +0000"  >&lt;p&gt;I know there were efforts to that end at one time or another, maybe Mike will recall the details. We have never officially supported having big-endian servers, so this wouldn&apos;t impact existing systems except those SPARC systems from Fujitsu (AFAIK).&lt;/p&gt;</comment>
                            <comment id="57863" author="jhammond" created="Tue, 7 May 2013 21:37:11 +0000"  >&lt;p&gt;We also see misaligned accesses in the llog OSD code:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Kernel unaligned access at TPC[10325280] llog_osd_write_rec+0xca0/0x1c20 [obdclass]
Kernel unaligned access at TPC[10325298] llog_osd_write_rec+0xcb8/0x1c20 [obdclass]
Kernel unaligned access at TPC[103252b4] llog_osd_write_rec+0xcd4/0x1c20 [obdclass]
Kernel unaligned access at TPC[103252d0] llog_osd_write_rec+0xcf0/0x1c20 [obdclass]
Kernel unaligned access at TPC[103252f8] llog_osd_write_rec+0xd18/0x1c20 [obdclass]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="57876" author="liwei" created="Wed, 8 May 2013 01:07:48 +0000"  >&lt;p&gt;I checked each of these unaligned accesses.  All resulted from the last CDEBUG() in llog_osd_write_rec().  Although they need to be fixed, they shouldn&apos;t be harmful at the moment.&lt;/p&gt;

&lt;p&gt;The root cause is an interesting aspect of the semantics of the &quot;packed&quot; attribute.  From the GCC manual:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;`-Wpacked&apos;&lt;br/&gt;
     Warn if a structure is given the packed attribute, but the packed&lt;br/&gt;
     attribute has no effect on the layout or size of the structure.&lt;br/&gt;
     Such structures may be mis-aligned for little benefit.  For&lt;br/&gt;
     instance, in this code, the variable `f.x&apos; in `struct bar&apos; will be&lt;br/&gt;
     misaligned even though `struct bar&apos; does not itself have the&lt;br/&gt;
     packed attribute:&lt;/p&gt;

&lt;pre&gt;struct foo {
        int x;
        char a, b, c, d;
} __attribute__((packed));
struct bar {
        char z;
        struct foo f;
};&lt;/pre&gt;&lt;/blockquote&gt;
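
&lt;p&gt;The layout effect described in the manual excerpt can be checked with offsetof(). This is a minimal sketch assuming GCC or Clang attribute syntax; the *_natural structures and bar_read_x() are hypothetical names added for comparison and are not part of Lustre or the GCC manual.&lt;/p&gt;

<![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The two structures from the GCC manual excerpt. */
struct foo {
    int x;
    char a, b, c, d;
} __attribute__((packed));

struct bar {
    char z;
    struct foo f;   /* packed member has alignment 1, so it sits at offset 1 */
};

/* Hypothetical unpacked twins, for comparison only. */
struct foo_natural {
    int x;
    char a, b, c, d;
};

struct bar_natural {
    char z;
    struct foo_natural f;   /* padded up to the alignment of int */
};

/*
 * Read f.x without assuming it is aligned: copy the bytes out instead of
 * dereferencing an int pointer into the middle of the structure.
 */
static int bar_read_x(const struct bar *b)
{
    int x;

    assert(b != NULL);
    memcpy(&x, (const unsigned char *)b + offsetof(struct bar, f) +
               offsetof(struct foo, x), sizeof(x));
    return x;
}
```
]]>

&lt;p&gt;On common ABIs where int is 4 bytes, offsetof(struct bar, f) is 1 while offsetof(struct bar_natural, f) is 4, and packing does not change the size of struct foo at all, which is the &quot;little benefit&quot; case the manual warns about.&lt;/p&gt;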

&lt;p&gt;This made me wonder whether we should use &quot;packed&quot; for structure definitions at all.  But anyway, it seems this could be resolved a bit later.&lt;/p&gt;</comment>
                            <comment id="58733" author="green" created="Fri, 17 May 2013 04:28:54 +0000"  >&lt;p&gt;If we have anything like what&apos;s described in the gcc manual, that&apos;s our bug.&lt;/p&gt;

&lt;p&gt;We should NOT mix packed and non-packed structures in the same structure.&lt;/p&gt;

&lt;p&gt;Overall, packed structures are for things on the wire/on disk. Even then we can do without if we carefully do our own layout of the structures.&lt;/p&gt;

&lt;p&gt;For any bad structures like that we have right now (esp. in OSD), we need to fix them by yesterday so 2.4.0 has all of those changes and there is no protocol breakage going forward.&lt;br/&gt;
So I am expecting a patch real soon.&lt;/p&gt;</comment>
                            <comment id="58735" author="liwei" created="Fri, 17 May 2013 04:43:42 +0000"  >&lt;p&gt;The structure that caused the unaligned accesses above is llog_handle.  It is not packed, but contains a packed llog_logid lgh_id, which was misaligned on sparc.  llog_handle instances are not for wire or disk, AFAIK.  Hence, this defect doesn&apos;t affect protocol or disk format.&lt;/p&gt;</comment>
                            <comment id="120819" author="cliffw" created="Thu, 9 Jul 2015 14:25:15 +0000"  >&lt;p&gt;We are seeing this now on lola with 2.7.56&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Jul  8 15:18:27 lola-11 kernel: LustreError: 14265:0:(llog_osd.c:784:llog_osd_next_block()) ASSERTION( last_rec-&amp;gt;lrh_index == tail-&amp;gt;lrt_index ) failed:
Jul  8 15:18:27 lola-11 kernel: LustreError: 14265:0:(llog_osd.c:784:llog_osd_next_block()) LBUG
Jul  8 15:18:27 lola-11 kernel: Pid: 14265, comm: lod0006_rec0005
Jul  8 15:18:27 lola-11 kernel:
Jul  8 15:18:27 lola-11 kernel: Call Trace:
Jul  8 15:18:27 lola-11 kernel: [&amp;lt;ffffffffa0741875&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Jul  8 15:18:27 lola-11 kernel: [&amp;lt;ffffffffa0741e77&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
Jul  8 15:18:27 lola-11 kernel: [&amp;lt;ffffffffa085d397&amp;gt;] llog_osd_next_block+0xb37/0xbc0 [obdclass]
Jul  8 15:18:27 lola-11 kernel: [&amp;lt;ffffffffa084b0e6&amp;gt;] llog_process_thread+0x286/0xfd0 [obdclass]
Jul  8 15:18:27 lola-11 kernel: [&amp;lt;ffffffffa084d9d4&amp;gt;] ? llog_init_handle+0x104/0xbb0 [obdclass]
Jul  8 15:18:27 lola-11 kernel: [&amp;lt;ffffffffa12d2f20&amp;gt;] ? lod_process_recovery_updates+0x0/0x420 [lod]
Jul  8 15:18:27 lola-11 kernel: [&amp;lt;ffffffffa084bf6f&amp;gt;] llog_process_or_fork+0x13f/0x690 [obdclass]
Jul  8 15:18:27 lola-11 kernel: [&amp;lt;ffffffffa0850b68&amp;gt;] llog_cat_process_cb+0x458/0x600 [obdclass]
Jul  8 15:18:27 lola-11 kernel: [&amp;lt;ffffffffa084b9e2&amp;gt;] llog_process_thread+0xb82/0xfd0 [obdclass]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="125027" author="jamesanunez" created="Tue, 25 Aug 2015 15:13:54 +0000"  >&lt;p&gt;We&apos;ve seen this issue recently in the test results for a patch to master (pre-2.8). The patch is modifying some llog routines. The logs are at:&lt;br/&gt;
2015-08-24 22:36:54 - &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/fffdc95e-4ad0-11e5-b2ff-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/fffdc95e-4ad0-11e5-b2ff-5254006e85c2&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="125032" author="simmonsja" created="Tue, 25 Aug 2015 16:19:09 +0000"  >&lt;p&gt;The patch in question is for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6968&quot; title=&quot;Update the whole header in llog_cancel_rec()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6968&quot;&gt;&lt;del&gt;LU-6968&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="31391">LU-6968</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvq0n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8082</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>