<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:22:07 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8968] Use after free in osp_precreate_thread()</title>
                <link>https://jira.whamcloud.com/browse/LU-8968</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I am hitting this relatively frequently now:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[101711.727214] Lustre: DEBUG MARKER: == replay-dual test 19: resend of open request ======================================================= 15:24:11 (1482438251)
[101712.093258] Turning device loop0 (0x700000) read-only
[101712.111290] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
[101712.116120] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
[101712.627330] LustreError: 4778:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880074df8c40 x1554448289689696/t0(0) o13-&amp;gt;lustre-OST0001-osc-MDT0000@0@lo:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[101712.638417] BUG: unable to handle kernel paging request at ffff88006dfc7954
[101712.638943] IP: [&amp;lt;ffffffff8138ea39&amp;gt;] do_raw_spin_lock+0x9/0x150
[101712.639385] PGD 2e75067 PUD bcc1a067 PMD bcaaa067 PTE 800000006dfc7060
[101712.639786] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[101712.640160] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop mbcache jbd2 sha512_generic crypto_null rpcsec_gss_krb5 syscopyarea sysfillrect sysimgblt ttm drm_kms_helper ata_generic pata_acpi drm i2c_piix4 ata_piix serio_raw pcspkr i2c_core virtio_balloon virtio_console libata virtio_blk floppy nfsd ip_tables [last unloaded: libcfs]
[101712.645165] CPU: 3 PID: 15844 Comm: osp-pre-1-0 Tainted: G           OE  ------------   3.10.0-debug #1
[101712.646327] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[101712.646947] task: ffff88008f3f41c0 ti: ffff88007c3c4000 task.ti: ffff88007c3c4000
[101712.648175] RIP: 0010:[&amp;lt;ffffffff8138ea39&amp;gt;]  [&amp;lt;ffffffff8138ea39&amp;gt;] do_raw_spin_lock+0x9/0x150
[101712.649364] RSP: 0018:ffff88007c3c7cb0  EFLAGS: 00010096
[101712.649975] RAX: ffff88008f3f41c0 RBX: ffff88006dfc7950 RCX: 0000000000000000
[101712.651091] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006dfc7950
[101712.652193] RBP: ffff88007c3c7cc8 R08: 0000000000000001 R09: 0000000000000000
[101712.653453] R10: 0000000000000000 R11: 000000000000000f R12: 0000000000000296
[101712.654575] R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000
[101712.655989] FS:  0000000000000000(0000) GS:ffff8800bc6c0000(0000) knlGS:0000000000000000
[101712.657101] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[101712.657657] CR2: ffff88006dfc7954 CR3: 0000000001c0e000 CR4: 00000000000006e0
[101712.658676] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[101712.659685] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[101712.660691] Stack:
[101712.661165]  ffff88006dfc7950 0000000000000296 0000000000000003 ffff88007c3c7cf0
[101712.662216]  ffffffff81706b5c ffffffff810af503 ffff88006dfc7950 ffffffff00000000
[101712.663304]  ffff88007c3c7d28 ffffffff810af503 ffff88008f3f41c0 ffffffff00000000
[101712.664346] Call Trace:
[101712.664841]  [&amp;lt;ffffffff81706b5c&amp;gt;] _raw_spin_lock_irqsave+0x5c/0x70
[101712.665405]  [&amp;lt;ffffffff810af503&amp;gt;] ? __wake_up+0x23/0x50
[101712.665943]  [&amp;lt;ffffffff810af503&amp;gt;] __wake_up+0x23/0x50
[101712.666495]  [&amp;lt;ffffffffa0d00efe&amp;gt;] osp_precreate_thread+0x2be/0x1230 [osp]
[101712.667061]  [&amp;lt;ffffffff810af941&amp;gt;] ? finish_task_switch+0x81/0x180
[101712.667631]  [&amp;lt;ffffffff810b7ce0&amp;gt;] ? wake_up_state+0x20/0x20
[101712.668177]  [&amp;lt;ffffffffa0d00c40&amp;gt;] ? osp_init_pre_fid+0x5f0/0x5f0 [osp]
[101712.668787]  [&amp;lt;ffffffff810a2eda&amp;gt;] kthread+0xea/0xf0
[101712.669342]  [&amp;lt;ffffffff810a2df0&amp;gt;] ? kthread_create_on_node+0x140/0x140
[101712.669915]  [&amp;lt;ffffffff8170fbd8&amp;gt;] ret_from_fork+0x58/0x90
[101712.670530]  [&amp;lt;ffffffff810a2df0&amp;gt;] ? kthread_create_on_node+0x140/0x140
[101712.671095] Code: 48 89 03 48 c7 c0 ff ff ff ff 48 89 43 10 89 43 0c 5b 41 5c 41 5d 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 41 54 53 &amp;lt;81&amp;gt; 7f 04 ad 4e ad de 48 89 fb 0f 85 0b 01 00 00 65 48 8b 04 25 
[101712.674005] RIP  [&amp;lt;ffffffff8138ea39&amp;gt;] do_raw_spin_lock+0x9/0x150
[101712.674575]  RSP &amp;lt;ffff88007c3c7cb0&amp;gt;
[101712.675079] CR2: ffff88006dfc7954
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;(gdb) l *(osp_precreate_thread+0x2ba)
0x11f2a is in osp_precreate_thread (/home/green/git/lustre-release/lustre/osp/osp_precreate.c:1268).
1263			}
1264		}
1265	
1266		thread-&amp;gt;t_flags = SVC_STOPPED;
1267		lu_env_fini(&amp;amp;env);
1268		wake_up(&amp;amp;thread-&amp;gt;t_ctl_waitq);
1269	
1270		RETURN(0);
1271	}
1272
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This looks like a use-after-free: the osp device apparently got freed under us because the osp thread does not seem to hold a reference on it, or does it?&lt;br/&gt;
Would the lu env pin it, so that lu_env_fini() should be moved after the wake_up() call?&lt;br/&gt;
Or should refcounting be added, I wonder?&lt;/p&gt;</description>
                <environment></environment>
        <key id="42641">LU-8968</key>
            <summary>Use after free in osp_precreate_thread()</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="green">Oleg Drokin</reporter>
                        <labels>
                    </labels>
                <created>Fri, 23 Dec 2016 05:27:34 +0000</created>
                <updated>Tue, 12 Nov 2019 07:42:37 +0000</updated>
                                            <version>Lustre 2.10.0</version>
                    <version>Lustre 2.11.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="214904" author="green" created="Wed, 29 Nov 2017 05:57:23 +0000"  >&lt;p&gt;Still a regular problem.&lt;/p&gt;</comment>
                            <comment id="258090" author="gerrit" created="Mon, 11 Nov 2019 16:22:38 +0000"  >&lt;p&gt;Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/36730&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/36730&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8968&quot; title=&quot;Use after free in osp_precreate_thread()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8968&quot;&gt;LU-8968&lt;/a&gt; osp: protect t_flags at stopping&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 920400e965e2c969778402556bbc184c7851047c&lt;/p&gt;</comment>
                            <comment id="258141" author="bzzz" created="Tue, 12 Nov 2019 07:42:15 +0000"  >&lt;p&gt;While the patch above does seem to cure the problem, I doubt it&apos;s a correct solution.&lt;br/&gt;
At the moment I&apos;m not sure what the correct solution should look like.&lt;br/&gt;
Basically, the problem is that osp_statfs_fini(), running in the umount thread, finds t_flags=SVC_STOPPED before reaching schedule() in wait_event() and just keeps going, freeing the structure that contains opd_pre_thread.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
th1, umount                           th2, osp_precreate_thread()
t_flags=SVC_STOPPING
wake_up()
&amp;lt;--- e.g. interrupt ---&amp;gt;              t_flags=SVC_STOPPED
                                      &amp;lt;--- e.g. interrupt ---&amp;gt;
wait_event()
kfree(opd_pre_thread)
                                      wake_up(opd_pre_thread.t_ctl_waitq) -&amp;gt; use-after-free
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Something like a refcounter in opd_pre_thread, or kfree() via RCU, should help, I guess.&lt;br/&gt;
Any other options?&lt;/p&gt;

&lt;p&gt;Also note that OSP isn&apos;t the only place using this technique. In many places the problem seems to be masked by the cycles it takes to reach kfree().&lt;/p&gt;
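<![CDATA[
A minimal userspace sketch of the refcounter idea mentioned above, using POSIX threads and C11 atomics instead of kernel primitives. It is an illustration only and not the Lustre patch: demo_thread, demo_put(), demo_worker() and demo_run() are hypothetical names, the condition variable stands in for t_ctl_waitq, and the 'stopped' flag stands in for t_flags == SVC_STOPPED. The assumption is that both the stopping side and the worker hold one reference each, so the structure is freed only by whichever side finishes last.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the structure that embeds opd_pre_thread. */
struct demo_thread {
	atomic_int      refs;    /* one reference per party still using the struct */
	pthread_mutex_t lock;    /* protects 'stopped', pairs with 'waitq' */
	pthread_cond_t  waitq;   /* plays the role of t_ctl_waitq */
	bool            stopped; /* plays the role of t_flags == SVC_STOPPED */
};

static atomic_int demo_frees; /* counts frees so a caller can check for exactly one */

/* Drop one reference; only the holder of the last reference frees. */
static void demo_put(struct demo_thread *t)
{
	/* atomic_fetch_sub() returns the previous value: 1 means last reference. */
	if (atomic_fetch_sub(&t->refs, 1) == 1) {
		pthread_cond_destroy(&t->waitq);
		pthread_mutex_destroy(&t->lock);
		free(t);
		atomic_fetch_add(&demo_frees, 1);
	}
}

static void *demo_worker(void *arg)
{
	struct demo_thread *t = arg;

	pthread_mutex_lock(&t->lock);
	t->stopped = true;
	pthread_mutex_unlock(&t->lock);
	/* The stopper may already have seen 'stopped' and dropped its reference,
	 * but the worker's own reference keeps 't' valid for this wakeup. */
	pthread_cond_signal(&t->waitq);
	demo_put(t);
	return NULL;
}

/* Start a worker, wait for it to stop, return how many frees happened. */
static int demo_run(void)
{
	struct demo_thread *t = malloc(sizeof(*t));
	pthread_t tid;

	if (t == NULL)
		return -1;
	atomic_init(&t->refs, 2); /* stopper + worker */
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->waitq, NULL);
	t->stopped = false;
	atomic_store(&demo_frees, 0);

	if (pthread_create(&tid, NULL, demo_worker, t) != 0) {
		atomic_store(&t->refs, 1); /* no worker: stopper holds the only ref */
		demo_put(t);
		return -1;
	}

	pthread_mutex_lock(&t->lock);
	while (!t->stopped)
		pthread_cond_wait(&t->waitq, &t->lock);
	pthread_mutex_unlock(&t->lock);
	demo_put(t); /* stopper is done; it may or may not hold the last reference */

	pthread_join(tid, NULL);
	return atomic_load(&demo_frees);
}
```

Calling demo_run() from a main() built with pthread support should report exactly one free, whichever side drops its reference last. The point of the design is that the final wakeup is always issued while the worker still holds its own reference, so the ordering of the two puts no longer matters.
]]>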
</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzyz7j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>