<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:14:00 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
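
A minimal sketch of building such a request URL, assuming Python's standard urllib.parse module. The helper name restrict_fields and the export URL below are illustrative assumptions, not taken from this document; substitute the URL you actually requested.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def restrict_fields(url, fields):
    """Return url with one 'field=<name>' query parameter per requested field."""
    parts = urlsplit(url)
    # Keep any query parameters that are already present on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(("field", name) for name in fields)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical export URL for this issue (an assumption for illustration only).
base = "https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-14932/LU-14932.xml"
print(restrict_fields(base, ["key", "summary"]))
```

Repeating the 'field' parameter once per name matches the 'field=key&field=summary' form described above.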
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14932] runtests: test_1 llog_cat_cleanup()) ASSERTION( index ) on MDS</title>
                <link>https://jira.whamcloud.com/browse/LU-14932</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This issue was created by maloo for Andreas Dilger &amp;lt;adilger@whamcloud.com&amp;gt;&lt;/p&gt;

&lt;p&gt;This issue relates to the following test suite run: &lt;a href=&quot;https://testing.whamcloud.com/test_sets/1ff8d9a5-c3da-4835-8739-9f790d3c2491&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/1ff8d9a5-c3da-4835-8739-9f790d3c2491&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;test_1 crashed on the MDS with the following error:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;onyx-44vm9 crashed during runtests test_1

LustreError: 138526:0:(llog_cat.c:1162:llog_cat_cleanup()) ASSERTION( index ) failed: 
LustreError: 138526:0:(llog_cat.c:1162:llog_cat_cleanup()) LBUG
Pid: 138526, comm: lod0001_rec0000 4.18.0-240.22.1.el8_lustre.x86_64 #1 SMP Fri Jul 30 19:47:15 UTC 2021
Call Trace TBD:
libcfs_call_trace+0x6f/0x90 [libcfs]
lbug_with_loc+0x43/0x80 [libcfs]
llog_cat_cleanup+0x391/0x3d0 [obdclass]
llog_cat_close+0x193/0x210 [obdclass]
lod_sub_recovery_thread+0x1e3/0xb40 [lod]
kthread+0x112/0x130

LustreError: 143361:0:(llog.c:1149:llog_write_rec()) lustre-MDT0000-osp-MDT0001: loghandle 0000000062d00541 with no header
LustreError: 143361:0:(llog_cat.c:602:llog_cat_add_rec()) llog_write_rec -71: lh=0000000062d00541
LustreError: 143361:0:(update_trans.c:1062:top_trans_stop()) lustre-MDT0000-osp-MDT0001: write updates failed: rc = -71
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A second test had a similar MDS crash with a slightly different stack:&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/366c2ba7-795e-4856-b4c4-9f2cce973618&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/366c2ba7-795e-4856-b4c4-9f2cce973618&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;general protection fault: 0000 [#1] SMP PTI
CPU: 0 PID: 139728 Comm: mdt00_002  4.18.0-240.22.1.el8_lustre.x86_64 #1
RIP: 0010:__list_add_valid+0x10/0x50
Call Trace:
 llog_cat_prep_log+0x311/0x3c0 [obdclass]
 llog_cat_declare_add_rec+0xbe/0x220 [obdclass]
 llog_declare_add+0x187/0x1d0 [obdclass]
 top_trans_start+0x212/0x940 [ptlrpc]
 mdd_unlink+0x4a0/0xb30 [mdd]
 mdt_reint_unlink+0xb0c/0x12a0 [mdt]
 mdt_reint_rec+0x11f/0x250 [mdt]
 mdt_reint_internal+0x498/0x780 [mdt]
 mdt_reint+0x5e/0x100 [mdt]
 tgt_request_handle+0xc90/0x1940 [ptlrpc]
 ptlrpc_server_handle_request+0x323/0xbc0 [ptlrpc]
 ptlrpc_main+0xba2/0x1490 [ptlrpc]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A third test crashed the MDS with a different operation, but also in llog list handling:&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/b7099363-3b2c-4b7a-ad54-795ca4541ddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/b7099363-3b2c-4b7a-ad54-795ca4541ddc&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;general protection fault: 0000 [#1] SMP PTI
CPU: 0 PID: 138567 Comm: mdt00_002 4.18.0-240.22.1.el8_lustre.x86_64 #1
RIP: 0010:__list_add_valid+0x10/0x50
Call Trace:
 llog_cat_prep_log+0x311/0x3c0 [obdclass]
 llog_cat_declare_add_rec+0xbe/0x220 [obdclass]
 llog_declare_add+0x187/0x1d0 [obdclass]
 top_trans_start+0x212/0x940 [ptlrpc]
 mdd_create+0xb42/0x1870 [mdd]
 mdt_create+0x7a7/0xc20 [mdt]
 mdt_reint_create+0x30b/0x3c0 [mdt]
 mdt_reint_rec+0x11f/0x250 [mdt]
 mdt_reint_internal+0x498/0x780 [mdt]
 mdt_reint+0x5e/0x100 [mdt]
 tgt_request_handle+0xc90/0x1940 [ptlrpc]
 ptlrpc_server_handle_request+0x323/0xbc0 [ptlrpc]
 ptlrpc_main+0xba2/0x1490 [ptlrpc]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Searching back through the Maloo crashes of runtests to the start of the year, it appears this started failing with this ASSERTION on 2021-07-31 (though there are other, unrelated crashes in runtests due to bugs in under-development patches).&lt;/p&gt;
</description>
                <environment></environment>
        <key id="65610">LU-14932</key>
            <summary>runtests: test_1 llog_cat_cleanup()) ASSERTION( index ) on MDS</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="maloo">Maloo</reporter>
                        <labels>
                    </labels>
                <created>Wed, 11 Aug 2021 19:55:43 +0000</created>
                <updated>Thu, 15 Dec 2022 21:04:53 +0000</updated>
                            <resolved>Thu, 14 Oct 2021 22:25:58 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="309975" author="adilger" created="Wed, 11 Aug 2021 19:55:55 +0000"  >&lt;p&gt;It appears from the current failures that these are all happening with ZFS and after replay-single fails with &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10729&quot; title=&quot;replay-dual test_23d: FAIL: Remote creation failed 1 : mkdir: cannot create directory&amp;#39;: File exists&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10729&quot;&gt;&lt;del&gt;LU-10729&lt;/del&gt;&lt;/a&gt;.&lt;br/&gt;
While the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10729&quot; title=&quot;replay-dual test_23d: FAIL: Remote creation failed 1 : mkdir: cannot create directory&amp;#39;: File exists&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10729&quot;&gt;&lt;del&gt;LU-10729&lt;/del&gt;&lt;/a&gt; failure has been around for quite a while, the runtests crash is new and should be fixed.  Patches that landed on 2021-07-31 are:&lt;/p&gt;

&lt;p&gt;e9cffb256d &lt;a href=&quot;https://review.whamcloud.com/44373&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14880 libcfs: Use crypto/sha2.h if available&lt;/a&gt;&lt;br/&gt;
39e4c97530 &lt;a href=&quot;https://review.whamcloud.com/44363&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14093 gss: gcc10 fixes for GSS&lt;/a&gt;&lt;br/&gt;
db0b09018e &lt;a href=&quot;https://review.whamcloud.com/44150&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-13299 lnet: add &quot;stats reset&quot; to lnetctl&lt;/a&gt;&lt;br/&gt;
4668283cd1 &lt;a href=&quot;https://review.whamcloud.com/44139&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14806 o2iblnd: clear fatal error on successful failover&lt;/a&gt;&lt;br/&gt;
b9c4dc3c33 &lt;a href=&quot;https://review.whamcloud.com/44090&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14792 llite: enable filesystem-wide default LMV&lt;/a&gt; +&lt;br/&gt;
b7bd4e3422 &lt;a href=&quot;https://review.whamcloud.com/43366&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14621 mdd: fix lock-tx order in mdd_xattr_merge()&lt;/a&gt; &lt;b&gt;!&lt;/b&gt;&lt;br/&gt;
3e04b0fd6c &lt;a href=&quot;https://review.whamcloud.com/38553&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-13417 mdd: set default LMV on ROOT&lt;/a&gt; +&lt;/p&gt;

&lt;p&gt;The patch with the highest probability of having introduced this failure is marked with &quot;&lt;b&gt;!&lt;/b&gt;&quot;.  The default LMV changes marked with + do not really change the code as much as they change the behavior of the tests themselves to be more likely to use remote DNE directories, but that is largely driven by the client and &lt;em&gt;shouldn&apos;t&lt;/em&gt; cause the client to crash.&lt;/p&gt;

&lt;p&gt;The other change to the test environment on 2021-07-28 was the increase of VM RAM for ZFS from 2GB to 3GB, though it would be confusing if more memory &lt;b&gt;caused&lt;/b&gt; the MDS to crash.&lt;/p&gt;

&lt;p&gt;However, since the crash is only hit every 3-5 days, it may have been first introduced by a batch of landings on 2021-07-26, but none of these patches appear to modify any related code:&lt;/p&gt;

&lt;p&gt;adc1bbbf20 &lt;a href=&quot;https://review.whamcloud.com/40813&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-13602 pcc: add LCM_FL_PCC_RDONLY layout flag&lt;/a&gt;&lt;br/&gt;
6717c573ed &lt;a href=&quot;https://review.whamcloud.com/44152&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14814 osc: osc: Do not flush on lockless cancel&lt;/a&gt;&lt;br/&gt;
5ad00e36ec &lt;a href=&quot;https://review.whamcloud.com/44205&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14838 osc: Remove client contention support&lt;/a&gt;&lt;br/&gt;
6335dba839 &lt;a href=&quot;https://review.whamcloud.com/44204&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14838 osc: Remove lockless truncate&lt;/a&gt;&lt;br/&gt;
592d9a737b &lt;a href=&quot;https://review.whamcloud.com/44332&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-9859 libcfs: make lnet_debugfs_symlink_def local to libcfs/modules.c&lt;/a&gt;&lt;br/&gt;
4b52ea1d30 &lt;a href=&quot;https://review.whamcloud.com/44185&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14637 flr: get rid of excluding dom+flr support test&lt;/a&gt;&lt;br/&gt;
5a28f3bc4b &lt;a href=&quot;https://review.whamcloud.com/44184&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14789 tests: make sanity 133f and 133g working&lt;/a&gt;&lt;br/&gt;
449d046e55 &lt;a href=&quot;https://review.whamcloud.com/44091&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14788 lnet: check memdup_user_nul using IS_ERR&lt;/a&gt;&lt;br/&gt;
393885c027 &lt;a href=&quot;https://review.whamcloud.com/44022&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-13055 doc: update changelog manpages&lt;/a&gt;&lt;br/&gt;
66dcbd503f &lt;a href=&quot;https://review.whamcloud.com/43961&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14748 build: gcc9 fix address of packed member warning&lt;/a&gt;&lt;br/&gt;
3ffa5d680f &lt;a href=&quot;https://review.whamcloud.com/43939&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14740 llite: avoid project quota overflow&lt;/a&gt;&lt;br/&gt;
b1ed8e57da &lt;a href=&quot;https://review.whamcloud.com/43740&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14430 mdd: rename mti_fid to mdi_fid and friends&lt;/a&gt;&lt;br/&gt;
f18c87cb53 &lt;a href=&quot;https://review.whamcloud.com/43388&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-13717 sec: handle null algo for filename encryption&lt;/a&gt;&lt;br/&gt;
87c4535f7a &lt;a href=&quot;https://review.whamcloud.com/39482&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-13799 osc: Improve osc_queue_sync_pages&lt;/a&gt;&lt;br/&gt;
b855397878 &lt;a href=&quot;https://review.whamcloud.com/39448&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-13799 clio: Skip prep for transients&lt;/a&gt;&lt;br/&gt;
1e4d10af39 &lt;a href=&quot;https://review.whamcloud.com/39447&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-13799 llite: Adjust dio refcounting&lt;/a&gt;&lt;br/&gt;
d31647c017 &lt;a href=&quot;https://review.whamcloud.com/39446&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-13799 lov: Improve DIO submit&lt;/a&gt;&lt;br/&gt;
587e5aa834 &lt;a href=&quot;https://review.whamcloud.com/39441&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-13799 llite: Remove transient page counting&lt;/a&gt;&lt;br/&gt;
b3de247b76 &lt;a href=&quot;https://review.whamcloud.com/39442&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-13799 llite: Modify AIO/DIO reference counting&lt;/a&gt;&lt;br/&gt;
7a2ef25f1f &lt;a href=&quot;https://review.whamcloud.com/37798&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-13326 mds: remove MDS_SETATTR_PORTAL and service&lt;/a&gt;&lt;br/&gt;
618625af42 &lt;a href=&quot;https://review.whamcloud.com/44315&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-13417 test: mkdir_on_mdt0() in more tests&lt;/a&gt;&lt;br/&gt;
d87af24452 &lt;a href=&quot;https://review.whamcloud.com/43503&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;LU-14655 lnet: Protect lpni deref in lnet_health_check&lt;/a&gt;&lt;/p&gt;

</comment>
                            <comment id="310999" author="adilger" created="Tue, 24 Aug 2021 15:26:08 +0000"  >&lt;p&gt;+1 on master &lt;a href=&quot;https://testing.whamcloud.com/test_sets/3b075625-6d30-4bdf-a180-95dc1024dda8&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/3b075625-6d30-4bdf-a180-95dc1024dda8&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="311000" author="adilger" created="Tue, 24 Aug 2021 15:28:27 +0000"  >&lt;p&gt;Hit 6 other times in the past 4 weeks.&lt;/p&gt;</comment>
                            <comment id="314459" author="adilger" created="Thu, 30 Sep 2021 23:32:06 +0000"  >&lt;p&gt;May be fixed by patch: &lt;a href=&quot;https://review.whamcloud.com/44998&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/44998&lt;/a&gt; &quot;&lt;tt&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14474&quot; title=&quot;Oops in llog_cat_prep_log() in sanity-quota / recovery-small&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14474&quot;&gt;&lt;del&gt;LU-14474&lt;/del&gt;&lt;/a&gt; llog: reset pointer to the next llog&lt;/tt&gt;&quot;.&lt;/p&gt;</comment>
                            <comment id="315636" author="adilger" created="Thu, 14 Oct 2021 22:20:57 +0000"  >&lt;p&gt;One test failure was seen today, after patch 44998 landed 5 days ago as v2_14_55-16-g4521f6af35, but the failed patch was based on an old parent, v2_14_55-1-g1a409a3e6a, that did not include that fix:&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sessions/805cf1c4-4678-405e-91a1-2d94b53d345d&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sessions/805cf1c4-4678-405e-91a1-2d94b53d345d&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="63090">LU-14474</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="63086">LU-15139</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="65799">LU-14964</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="73627">LU-16398</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i021jr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>