<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:17:24 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8422] llog_osd.c:165:llog_osd_pad()) ASSERTION( len &gt;= (24) &amp;&amp; (len &amp; 0x7) == 0 )</title>
                <link>https://jira.whamcloud.com/browse/LU-8422</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;During 2.8.0 DNE testing (in particular tag 2.8.0_0.0.llnlpreview.25), we hit the following assertion on one of the MDS nodes:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2016-07-20 09:39:28 [87775.729342] LustreError: 28402:0:(llog_osd.c:165:llog_osd_pad()) ASSERTION( len &amp;gt;= (24) &amp;amp;&amp;amp; (len &amp;amp; 0x7) == 0 ) failed:
2016-07-20 09:39:28 [87775.741933] LustreError: 28402:0:(llog_osd.c:165:llog_osd_pad()) LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There is not much else happening in the console log except for regular occurrences of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6447&quot; title=&quot;mdt_identity_upcall calls sleeping function under rwlock&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6447&quot;&gt;&lt;del&gt;LU-6447&lt;/del&gt;&lt;/a&gt;.  The most recent hit of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6447&quot; title=&quot;mdt_identity_upcall calls sleeping function under rwlock&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6447&quot;&gt;&lt;del&gt;LU-6447&lt;/del&gt;&lt;/a&gt; was about two minutes earlier.&lt;/p&gt;

&lt;p&gt;There is no available crash dump.  No additional logs since the node LBUGs on assertion.&lt;/p&gt;</description>
                <environment></environment>
        <key id="38313">LU-8422</key>
            <summary>llog_osd.c:165:llog_osd_pad()) ASSERTION( len &gt;= (24) &amp;&amp; (len &amp; 0x7) == 0 )</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="tappro">Mikhail Pershin</assignee>
                                    <reporter username="morrone">Christopher Morrone</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Wed, 20 Jul 2016 22:20:42 +0000</created>
                <updated>Fri, 18 Aug 2017 20:19:16 +0000</updated>
                            <resolved>Thu, 18 Aug 2016 21:23:14 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="159543" author="morrone" created="Thu, 21 Jul 2016 20:40:54 +0000"  >&lt;p&gt;This is now hitting frequently, sometimes within minutes of recovery completing on the MDS.  This particular test system is hosed until we have a fix for this.&lt;/p&gt;</comment>
                            <comment id="159545" author="morrone" created="Thu, 21 Jul 2016 20:52:05 +0000"  >&lt;p&gt;And on the plus side, if you have any debug patches that you would like me to try, I can probably get you the results pretty quickly.&lt;/p&gt;</comment>
                            <comment id="159557" author="morrone" created="Fri, 22 Jul 2016 00:13:25 +0000"  >&lt;p&gt;We got crash dumps hacked into a basically working state.  The full backtrace of the latest hit of this assertion looks like this:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;PID: 14429  TASK: ffff887f23d90b80  CPU: 12  COMMAND: &quot;mdt03_004&quot;
 #0 [ffff887ee35c35d0] machine_kexec at ffffffff810545db
 #1 [ffff887ee35c3630] crash_kexec at ffffffff810f7ec2
 #2 [ffff887ee35c3700] panic at ffffffff816456b6
 #3 [ffff887ee35c3780] lbug_with_loc at ffffffffa0858dfb [libcfs]
 #4 [ffff887ee35c37a0] llog_osd_pad at ffffffffa093b171 [obdclass]
 #5 [ffff887ee35c37f8] llog_osd_write_rec at ffffffffa09407d0 [obdclass]
 #6 [ffff887ee35c3898] llog_write_rec at ffffffffa092f97a [obdclass]
 #7 [ffff887ee35c38e0] llog_cat_add_rec at ffffffffa0934108 [obdclass]
 #8 [ffff887ee35c3930] llog_add at ffffffffa092ca8a [obdclass]
 #9 [ffff887ee35c3970] sub_updates_write at ffffffffa0c508cf [ptlrpc]
#10 [ffff887ee35c3a08] top_trans_stop at ffffffffa0c3eb96 [ptlrpc]
#11 [ffff887ee35c3a60] lod_trans_stop at ffffffffa0f27439 [lod]
#12 [ffff887ee35c3ad8] mdd_trans_stop at ffffffffa0fb70ca [mdd]
#13 [ffff887ee35c3ae8] mdd_unlink at ffffffffa0fa6b7a [mdd]
#14 [ffff887ee35c3bb0] mdt_reint_unlink at ffffffffa0e9a470 [mdt]
#15 [ffff887ee35c3c48] mdt_reint_rec at ffffffffa0e9d970 [mdt]
#16 [ffff887ee35c3c70] mdt_reint_internal at ffffffffa0e80b01 [mdt]
#17 [ffff887ee35c3ca8] mdt_reint at ffffffffa0e8a3c7 [mdt]
#18 [ffff887ee35c3cd8] tgt_request_handle at ffffffffa0c2aa65 [ptlrpc]
#19 [ffff887ee35c3d20] ptlrpc_server_handle_request at ffffffffa0bd61ab [ptlrpc]
#20 [ffff887ee35c3de8] ptlrpc_main at ffffffffa0bda2f0 [ptlrpc]
#21 [ffff887ee35c3ec8] kthread at ffffffff810a997f
#22 [ffff887ee35c3f50] ret_from_fork at ffffffff8165d658
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This time there was also some additional console traffic right before the assertion:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2016-07-21 16:45:44 [ 2587.821304] LustreError: 14086:0:(llog_cat.c:712:llog_cat_cancel_records()) lquake-MDT000f-osp-MDT0005: fail to cancel 1 of 1 llog-records: rc = -116
2016-07-21 16:45:49 [ 2592.856881] LustreError: 14086:0:(llog_cat.c:712:llog_cat_cancel_records()) lquake-MDT000a-osp-MDT0005: fail to cancel 1 of 1 llog-records: rc = -116
2016-07-21 16:45:54 [ 2597.877414] LustreError: 14086:0:(llog_cat.c:712:llog_cat_cancel_records()) lquake-MDT0001-osp-MDT0005: fail to cancel 1 of 1 llog-records: rc = -116
2016-07-21 16:45:54 [ 2597.893545] LustreError: 14086:0:(llog_cat.c:712:llog_cat_cancel_records()) Skipped 3 previous similar messages
2016-07-21 16:45:59 [ 2602.962942] LustreError: 14086:0:(llog_cat.c:712:llog_cat_cancel_records()) lquake-MDT000c-osp-MDT0005: fail to cancel 1 of 1 llog-records: rc = -116
2016-07-21 16:45:59 [ 2602.979124] LustreError: 14086:0:(llog_cat.c:712:llog_cat_cancel_records()) Skipped 6 previous similar messages
2016-07-21 16:46:04 [ 2607.966714] LustreError: 14086:0:(llog_cat.c:712:llog_cat_cancel_records()) lquake-MDT000c-osp-MDT0005: fail to cancel 1 of 1 llog-records: rc = -116
2016-07-21 16:46:04 [ 2607.982948] LustreError: 14086:0:(llog_cat.c:712:llog_cat_cancel_recor
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="159722" author="tappro" created="Mon, 25 Jul 2016 14:01:11 +0000"  >&lt;p&gt;I think I&apos;ve found the reason, but there is no simple fix for that. I am thinking about a proper solution.&lt;/p&gt;</comment>
                            <comment id="159852" author="gerrit" created="Tue, 26 Jul 2016 07:00:54 +0000"  >&lt;p&gt;Mike Pershin (mike.pershin@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/21509&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/21509&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8422&quot; title=&quot;llog_osd.c:165:llog_osd_pad()) ASSERTION( len &amp;gt;= (24) &amp;amp;&amp;amp; (len &amp;amp; 0x7) == 0 )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8422&quot;&gt;&lt;del&gt;LU-8422&lt;/del&gt;&lt;/a&gt; llog: extended debug info&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 955096bddc204c5f2a34143e351ccf4cbbd9da8d&lt;/p&gt;</comment>
                            <comment id="159853" author="tappro" created="Tue, 26 Jul 2016 07:01:50 +0000"  >&lt;p&gt;This is a debug patch to confirm the cause of the issue.&lt;/p&gt;</comment>
                            <comment id="159975" author="morrone" created="Tue, 26 Jul 2016 20:49:35 +0000"  >&lt;p&gt;Thanks, we reproduced the assertion with the 21509 debug patch applied.  We hit the expanded LASSERTF, of course:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ 1465.793319] LustreError: 13881:0:(llog_osd.c:167:llog_osd_pad()) ASSERTION( len &amp;gt;= LLOG_MIN_REC_SIZE &amp;amp;&amp;amp; (len &amp;amp; 0x7) == 0 ) failed: wrong pad len 8, off 65528, index 3
[ 1465.793320] LustreError: 13881:0:(llog_osd.c:167:llog_osd_pad()) LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I do &lt;em&gt;not&lt;/em&gt; see the new &quot;Failed write llog index&quot; CERROR message.&lt;/p&gt;</comment>
                            <comment id="160254" author="morrone" created="Thu, 28 Jul 2016 23:16:35 +0000"  >&lt;p&gt;Was that what you expected to see?&lt;/p&gt;</comment>
                            <comment id="160313" author="tappro" created="Fri, 29 Jul 2016 16:41:36 +0000"  >&lt;p&gt;Chris, yes, the extended info in the assertion was helpful. I&apos;ve just updated this patch with more debugging and a few workarounds. Could you try it?&lt;/p&gt;</comment>
                            <comment id="160334" author="morrone" created="Fri, 29 Jul 2016 17:57:13 +0000"  >&lt;p&gt;Yes, I will get that on the testbed today.  Thanks.&lt;/p&gt;</comment>
                            <comment id="160342" author="morrone" created="Fri, 29 Jul 2016 18:17:57 +0000"  >&lt;p&gt;It looks like there is a fair bit of conflict between your patch and the latest in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8370&quot; title=&quot;ASSERTION( lur-&amp;gt;lur_hdr.lrh_len &amp;lt;= ctxt-&amp;gt;loc_chunk_size )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8370&quot;&gt;&lt;del&gt;LU-8370&lt;/del&gt;&lt;/a&gt;.  Do you know that you two are working in the same area?&lt;/p&gt;

&lt;p&gt;I&apos;ll need direction on which set of patches to try.&lt;/p&gt;</comment>
                            <comment id="160375" author="gerrit" created="Sat, 30 Jul 2016 07:08:15 +0000"  >&lt;p&gt;Mike Pershin (mike.pershin@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/21609&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/21609&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8422&quot; title=&quot;llog_osd.c:165:llog_osd_pad()) ASSERTION( len &amp;gt;= (24) &amp;amp;&amp;amp; (len &amp;amp; 0x7) == 0 )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8422&quot;&gt;&lt;del&gt;LU-8422&lt;/del&gt;&lt;/a&gt; update: add more debug info for the ticket&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 0b3881152f6b95f1fb04e559f6905c6f5636f230&lt;/p&gt;</comment>
                            <comment id="160376" author="tappro" created="Sat, 30 Jul 2016 07:12:13 +0000"  >&lt;p&gt;Chris, I&apos;ve just updated the patch so that it applies on top of the one from the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8370&quot; title=&quot;ASSERTION( lur-&amp;gt;lur_hdr.lrh_len &amp;lt;= ctxt-&amp;gt;loc_chunk_size )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8370&quot;&gt;&lt;del&gt;LU-8370&lt;/del&gt;&lt;/a&gt; ticket. Try the patch in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8370&quot; title=&quot;ASSERTION( lur-&amp;gt;lur_hdr.lrh_len &amp;lt;= ctxt-&amp;gt;loc_chunk_size )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8370&quot;&gt;&lt;del&gt;LU-8370&lt;/del&gt;&lt;/a&gt; first, then apply mine, or apply both at once.&lt;/p&gt;</comment>
                            <comment id="160833" author="morrone" created="Thu, 4 Aug 2016 18:00:22 +0000"  >&lt;p&gt;Building power and cooling issues have been delaying work on the testbed, but I think those problems are worked out now.&lt;/p&gt;

&lt;p&gt;We hit a NULL pointer dereference when running with just the updated &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8370&quot; title=&quot;ASSERTION( lur-&amp;gt;lur_hdr.lrh_len &amp;lt;= ctxt-&amp;gt;loc_chunk_size )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8370&quot;&gt;&lt;del&gt;LU-8370&lt;/del&gt;&lt;/a&gt; patch (patch set 5).  More info in that issue.&lt;/p&gt;

&lt;p&gt;I can try with both patches applied if that seems worthwhile.  What do you think?&lt;/p&gt;</comment>
                            <comment id="160868" author="tappro" created="Fri, 5 Aug 2016 00:52:49 +0000"  >&lt;p&gt;Yes, please, let&apos;s try both patches. If &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8370&quot; title=&quot;ASSERTION( lur-&amp;gt;lur_hdr.lrh_len &amp;lt;= ctxt-&amp;gt;loc_chunk_size )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8370&quot;&gt;&lt;del&gt;LU-8370&lt;/del&gt;&lt;/a&gt; causes issues, then it will be better to run just patch &lt;a href=&quot;http://review.whamcloud.com/21509&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/21509&lt;/a&gt;; that will be helpful too.&lt;/p&gt;</comment>
                            <comment id="162452" author="morrone" created="Thu, 18 Aug 2016 21:21:05 +0000"  >&lt;p&gt;We haven&apos;t seen this in a while, since running the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7800&quot; title=&quot;Panic during recovery of soak-test.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7800&quot;&gt;&lt;del&gt;LU-7800&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8370&quot; title=&quot;ASSERTION( lur-&amp;gt;lur_hdr.lrh_len &amp;lt;= ctxt-&amp;gt;loc_chunk_size )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8370&quot;&gt;&lt;del&gt;LU-8370&lt;/del&gt;&lt;/a&gt; patches starting at 2.8.0_0.0.llnlpreview.32.&lt;/p&gt;</comment>
                            <comment id="162455" author="pjones" created="Thu, 18 Aug 2016 21:23:14 +0000"  >&lt;p&gt;Good news! Thanks for the update, Chris.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="47719">LU-9848</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzyi2f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>