<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:58:47 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13146] crash in lod_sub_recovery_thread</title>
                <link>https://jira.whamcloud.com/browse/LU-13146</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hit this oops in my boilpot:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[195498.474047] BUG: unable to handle kernel paging request at ffff8802fd732fc4
[195498.474922] IP: [&amp;lt;ffffffffa1283d70&amp;gt;] lod_sub_recovery_thread+0x4b0/0xd00 [lod]
[195498.474922] PGD 241c067 PUD 33e9f9067 PMD 33e80d067 PTE 80000002fd732060
[195498.474922] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[195498.474922] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) pcc_cpufreq zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) crc_t10dif crct10dif_generic sb_edac edac_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd virtio_balloon pcspkr virtio_console i2c_piix4 ip_tables rpcsec_gss_krb5 ata_generic drm_kms_helper pata_acpi ttm drm crct10dif_pclmul crct10dif_common ata_piix drm_panel_orientation_quirks crc32c_intel serio_raw virtio_blk i2c_core libata floppy [last unloaded: libcfs]
[195498.474922] CPU: 6 PID: 16186 Comm: lod0000_rec0002 Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.7-debug #1
[195498.474922] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[195498.474922] task: ffff88009f09c4c0 ti: ffff880323750000 task.ti: ffff880323750000
[195498.474922] RIP: 0010:[&amp;lt;ffffffffa1283d70&amp;gt;]  [&amp;lt;ffffffffa1283d70&amp;gt;] lod_sub_recovery_thread+0x4b0/0xd00 [lod]
[195498.474922] RSP: 0018:ffff880323753e10  EFLAGS: 00010286
[195498.474922] RAX: ffff8802fd732f00 RBX: 0000000000000000 RCX: 0000000000000000
[195498.474922] RDX: ffff8802c085b0b8 RSI: 000000000000006b RDI: 0000000000000286
[195498.474922] RBP: ffff880323753ea0 R08: 0000000000000010 R09: ffff8802b63aa900
[195498.474922] R10: 0000000000000000 R11: 000000000000000f R12: ffff8802edc34000
[195498.474922] R13: ffff8802dd5b2000 R14: ffff8802b63aa900 R15: 00000000fffffffb
[195498.474922] FS:  0000000000000000(0000) GS:ffff88033db80000(0000) knlGS:0000000000000000
[195498.474922] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[195498.474922] CR2: ffff8802fd732fc4 CR3: 00000002b6bb6000 CR4: 00000000001607e0
[195498.474922] Call Trace:
[195498.474922]  [&amp;lt;ffffffffa12838c0&amp;gt;] ? lod_trans_stop+0x340/0x340 [lod]
[195498.474922]  [&amp;lt;ffffffff810b8254&amp;gt;] kthread+0xe4/0xf0
[195498.474922]  [&amp;lt;ffffffff810b8170&amp;gt;] ? kthread_create_on_node+0x140/0x140
[195498.474922]  [&amp;lt;ffffffff817e0ddd&amp;gt;] ret_from_fork_nospec_begin+0x7/0x21
[195498.474922]  [&amp;lt;ffffffff810b8170&amp;gt;] ? kthread_create_on_node+0x140/0x140
[195498.474922] Code: f6 05 70 dd 5c ff 04 0f 85 f6 03 00 00 4c 89 f7 e8 86 3d f9 df 48 8b 45 88 48 8b 95 78 ff ff ff c7 40 18 01 00 00 00 48 8b 42 48 &amp;lt;f0&amp;gt; ff 88 c4 00 00 00 48 8b 7a 48 31 c9 ba 01 00 00 00 be 03 00 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The crash is in this code:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;0x2d9e is in lod_sub_recovery_thread (/home/green/git/lustre-release/lustre/lod/lod_dev.c:478).
473		EXIT;
474	
475	out:
476		OBD_FREE_PTR(lrd);
477		thread-&amp;gt;t_flags = SVC_STOPPED;
478		atomic_dec(&amp;amp;lut-&amp;gt;lut_tdtd-&amp;gt;tdtd_recovery_threads_count);
479		wake_up(&amp;amp;lut-&amp;gt;lut_tdtd-&amp;gt;tdtd_recovery_threads_waitq);
480		wake_up(&amp;amp;thread-&amp;gt;t_ctl_waitq);
481		lu_env_fini(&amp;amp;env);
482		return rc;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Examining the logs, it&apos;s a use-after-free:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;:
00000004:00000010:6.0:1578997461.530712:0:16186:0:(lod_dev.c:476:lod_sub_recovery_thread()) kfreed &apos;lrd&apos;: 32 at ffff8802b63aa900.
00000004:00000010:1.0:1578997461.538400:0:17416:0:(lod_dev.c:869:lod_fini_distribute_txn()) kfreed &apos;lut-&amp;gt;lut_tdtd&apos;: 256 at ffff8802fd732f00.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

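&lt;p&gt;The oops time, converted from time since boot to wall-clock time, is 1578997461.539770, about 1.4 ms after lut-&amp;gt;lut_tdtd was freed. The faulting address in CR2 (ffff8802fd732fc4) is 0xc4 bytes into that freed 256-byte allocation at ffff8802fd732f00.&lt;/p&gt;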

&lt;p&gt;The call to lod_fini_distribute_txn looks like this:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
                lod_sub_stop_recovery_threads(env, lod);
                lod_fini_distribute_txn(env, lod);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And lod_sub_stop_recovery_threads:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
        lod_getref(&amp;amp;lod-&amp;gt;lod_mdt_descs);
        lod_foreach_mdt(lod, mdt) {
                thread = mdt-&amp;gt;ltd_recovery_thread;
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (thread &amp;amp;&amp;amp; thread-&amp;gt;t_flags &amp;amp; SVC_RUNNING) {
                        thread-&amp;gt;t_flags = SVC_STOPPING;
                        wake_up(&amp;amp;thread-&amp;gt;t_ctl_waitq);
                        wait_event(thread-&amp;gt;t_ctl_waitq,
                                   thread-&amp;gt;t_flags &amp;amp; SVC_STOPPED);
                        OBD_FREE_PTR(mdt-&amp;gt;ltd_recovery_thread);
                        mdt-&amp;gt;ltd_recovery_thread = NULL;
                }
        }
        lod_putref(lod, &amp;amp;lod-&amp;gt;lod_mdt_descs);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

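&lt;p&gt;The logs show no freeing of mdt-&amp;gt;ltd_recovery_thread from lod_sub_stop_recovery_threads, so it appears that by the time we got to the flags check, lod_sub_recovery_thread had already set t_flags to SVC_STOPPED. In that case the SVC_RUNNING test fails, the wait is skipped, and lod_fini_distribute_txn can free lut-&amp;gt;lut_tdtd before the thread reaches the atomic_dec at line 478. It appears that the thread allocation would also be leaked in this case, as I don&apos;t see anything else that would free it.&lt;/p&gt;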

&lt;p&gt;Hm, the fix is probably going to be annoying if this code can be entered more than once.&lt;/p&gt;
</description>
                <environment></environment>
        <key id="57817">LU-13146</key>
            <summary>crash in lod_sub_recovery_thread</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="green">Oleg Drokin</reporter>
                        <labels>
                    </labels>
                <created>Thu, 16 Jan 2020 07:25:57 +0000</created>
                <updated>Thu, 16 Jan 2020 07:25:57 +0000</updated>
                                            <version>Lustre 2.14.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00s4f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>