<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:17:40 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8453] Crash in class_manual_cleanup() on use after free OBD.</title>
                <link>https://jira.whamcloud.com/browse/LU-8453</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I just recorded the following crash in my testing:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[127672.042005] Lustre: DEBUG MARKER: == recovery-small test 132: long punch =============================================================== 11:06:23 (1469977583)
[127672.479285] Lustre: Mounted lustre-client
[127672.479815] Lustre: Skipped 1 previous similar message
[127745.047388] LNet: Service thread pid 22124 was inactive for 72.05s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[127745.064157] Pid: 22124, comm: ll_ost_io01_006
[127745.064689] 
Call Trace:
[127745.065676]  [&amp;lt;ffffffff81704339&amp;gt;] schedule+0x29/0x70
[127745.066206]  [&amp;lt;ffffffff81700972&amp;gt;] schedule_timeout+0x162/0x2a0
[127745.066737]  [&amp;lt;ffffffff810879f0&amp;gt;] ? process_timeout+0x0/0x10
[127745.067287]  [&amp;lt;ffffffffa02401c3&amp;gt;] __cfs_fail_timeout_set+0xe3/0x170 [libcfs]
[127745.067844]  [&amp;lt;ffffffffa0d46c0e&amp;gt;] ofd_punch_hdl+0x11e/0xa10 [ofd]
[127745.080775]  [&amp;lt;ffffffffa05f7325&amp;gt;] tgt_request_handle+0x925/0x1330 [ptlrpc]
[127745.081379]  [&amp;lt;ffffffffa05a4921&amp;gt;] ptlrpc_server_handle_request+0x231/0xac0 [ptlrpc]
[127745.082402]  [&amp;lt;ffffffffa05a24d8&amp;gt;] ? ptlrpc_wait_event+0xb8/0x370 [ptlrpc]
[127745.082976]  [&amp;lt;ffffffffa05a8738&amp;gt;] ptlrpc_main+0xa58/0x1dd0 [ptlrpc]
[127745.084901]  [&amp;lt;ffffffffa05a7ce0&amp;gt;] ? ptlrpc_main+0x0/0x1dd0 [ptlrpc]
[127745.085451]  [&amp;lt;ffffffff810a2eda&amp;gt;] kthread+0xea/0xf0
[127745.085952]  [&amp;lt;ffffffff810a2df0&amp;gt;] ? kthread+0x0/0xf0
[127745.086495]  [&amp;lt;ffffffff8170fbd8&amp;gt;] ret_from_fork+0x58/0x90
[127745.087007]  [&amp;lt;ffffffff810a2df0&amp;gt;] ? kthread+0x0/0xf0
[127745.087522] 
[127745.087965] LustreError: dumping log to /tmp/lustre-log.1469977656.22124
[127792.994255] LustreError: 22124:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 236 awake
[127792.996636] LNet: Service thread pid 22124 completed after 120.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[127794.212798] LustreError: 6493:0:(obd_config.c:1262:class_process_config()) no device for: lustre-OST0000-osc-ffff8800655a9800
[127794.214028] BUG: unable to handle kernel paging request at ffff88000c218000
[127794.214770] IP: [&amp;lt;ffffffff8137fa54&amp;gt;] strnlen+0x24/0x40
[127794.215885] PGD 2e75067 PUD 2e76067 PMD bcdb9067 PTE 800000000c218060
[127794.216455] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[127794.216970] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop mbcache jbd2 sha512_generic crypto_null rpcsec_gss_krb5 syscopyarea sysfillrect sysimgblt ttm ata_generic pata_acpi drm_kms_helper i2c_piix4 drm ata_piix floppy virtio_console libata virtio_blk i2c_core virtio_balloon pcspkr serio_raw nfsd ip_tables [last unloaded: libcfs]
[127794.221635] CPU: 7 PID: 6493 Comm: umount Tainted: G           OE  ------------   3.10.0-debug #1
[127794.225176] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[127794.225765] task: ffff88008d3b0940 ti: ffff880066bd8000 task.ti: ffff880066bd8000
[127794.228638] RIP: 0010:[&amp;lt;ffffffff8137fa54&amp;gt;]  [&amp;lt;ffffffff8137fa54&amp;gt;] strnlen+0x24/0x40
[127794.229695] RSP: 0018:ffff880066bdba08  EFLAGS: 00010212
[127794.230509] RAX: ffff88000c218000 RBX: ffff88008d39d3fc RCX: fffffffffffffffe
[127794.231624] RDX: 0000000000001980 RSI: ffffffffffffffff RDI: ffff88000c216680
[127794.232753] RBP: ffff880066bdba08 R08: 000000000000ffff R09: 000000000000ffff
[127794.236251] R10: 0000000000000000 R11: ffff880066bdb9ce R12: ffff88000c216680
[127794.237295] R13: ffff88008d39e000 R14: 00000000ffffffff R15: 0000000000000000
[127794.238825] FS:  00007fc07d02a880(0000) GS:ffff8800bc7c0000(0000) knlGS:0000000000000000
[127794.239945] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[127794.240535] CR2: ffff88000c218000 CR3: 000000003e53e000 CR4: 00000000000006e0
[127794.241721] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[127794.242733] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[127794.243846] Stack:
[127794.244433]  ffff880066bdba40 ffffffff81381f0b ffff88008d39d3fc ffff88008d39e000
[127794.246021]  ffff880066bdbb50 ffffffffa03a6937 ffffffffa03a6937 ffff880066bdbab0
[127794.247081]  ffffffff813836a1 00000000579effff 000000001ab70886 0000000000000002
[127794.248371] Call Trace:
[127794.262160]  [&amp;lt;ffffffff81381f0b&amp;gt;] string.isra.7+0x3b/0xf0
[127794.263031]  [&amp;lt;ffffffff813836a1&amp;gt;] vsnprintf+0x201/0x6a0
[127794.263722]  [&amp;lt;ffffffffa024214b&amp;gt;] libcfs_debug_vmsg2+0x31b/0xc10 [libcfs]
[127794.264364]  [&amp;lt;ffffffffa0242a97&amp;gt;] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[127794.265366]  [&amp;lt;ffffffffa023f6e3&amp;gt;] ? libcfs_log_goto+0x23/0x30 [libcfs]
[127794.266180]  [&amp;lt;ffffffffa03634fc&amp;gt;] ? class_process_config+0x1b9c/0x3110 [obdclass]
[127794.267557]  [&amp;lt;ffffffffa0242a97&amp;gt;] libcfs_debug_msg+0x57/0x80 [libcfs]
[127794.268426]  [&amp;lt;ffffffffa0365099&amp;gt;] class_manual_cleanup+0x629/0xab0 [obdclass]
[127794.270864]  [&amp;lt;ffffffffa0dc34c8&amp;gt;] ll_put_super+0x128/0x990 [lustre]
[127794.271749]  [&amp;lt;ffffffff8112b4ed&amp;gt;] ? call_rcu_sched+0x1d/0x20
[127794.272278]  [&amp;lt;ffffffffa0ded2ec&amp;gt;] ? ll_destroy_inode+0x1c/0x20 [lustre]
[127794.272887]  [&amp;lt;ffffffff8120a3c8&amp;gt;] ? destroy_inode+0x38/0x60
[127794.274341]  [&amp;lt;ffffffff8120a4f6&amp;gt;] ? evict+0x106/0x170
[127794.275097]  [&amp;lt;ffffffff8120a59e&amp;gt;] ? dispose_list+0x3e/0x50
[127794.275868]  [&amp;lt;ffffffff8120b244&amp;gt;] ? evict_inodes+0x114/0x140
[127794.276390]  [&amp;lt;ffffffff811efa66&amp;gt;] generic_shutdown_super+0x56/0xe0
[127794.277036]  [&amp;lt;ffffffff811efe42&amp;gt;] kill_anon_super+0x12/0x20
[127794.277765]  [&amp;lt;ffffffffa03689b5&amp;gt;] lustre_kill_super+0x45/0x50 [obdclass]
[127794.278808]  [&amp;lt;ffffffff811f0349&amp;gt;] deactivate_locked_super+0x49/0x60
[127794.279437]  [&amp;lt;ffffffff811f0946&amp;gt;] deactivate_super+0x46/0x60
[127794.280373]  [&amp;lt;ffffffff8120f135&amp;gt;] mntput_no_expire+0xc5/0x120
[127794.281073]  [&amp;lt;ffffffff812102bf&amp;gt;] SyS_umount+0x9f/0x3c0
[127794.281674]  [&amp;lt;ffffffff8170fc89&amp;gt;] system_call_fastpath+0x16/0x1b
[127794.282515] Code: 1f 84 00 00 00 00 00 55 48 85 f6 48 8d 4e ff 48 89 e5 74 2a 80 3f 00 74 25 48 89 f8 31 d2 eb 10 0f 1f 80 00 00 00 00 48 83 c2 01 &amp;lt;80&amp;gt; 38 00 74 09 48 83 c0 01 48 39 d1 75 ee 48 29 f8 5d c3 31 c0 
[127794.284816] RIP  [&amp;lt;ffffffff8137fa54&amp;gt;] strnlen+0x24/0x40
[127794.285410]  RSP &amp;lt;ffff880066bdba08&amp;gt;
[127794.285886] CR2: ffff88000c218000
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The crash location is in:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;(gdb) l *(class_manual_cleanup+0x620)
0x42090 is in class_manual_cleanup (/home/green/git/lustre-release/lustre/obdclass/obd_config.c:1943).
1938		if (lcfg == NULL)
1939			RETURN(-ENOMEM);
1940	
1941	        rc = class_process_config(lcfg);
1942	        if (rc) {
1943	                CERROR(&quot;cleanup failed %d: %s\n&quot;, rc, obd-&amp;gt;obd_name);
1944	                GOTO(out, rc);
1945	        }
1946	
1947	        /* the lcfg is almost the same for both ops */
(gdb) quit
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So it appears that obd was still valid when we entered this function and was then freed by somebody else: the class_process_config() call made from inside it already cannot find this obd (see the &quot;no device for: lustre-OST0000-osc-ffff8800655a9800&quot; message logged just before the oops).&lt;/p&gt;

&lt;p&gt;The call site is in ll_put_super() and does not look safe:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        &lt;span class=&quot;code-keyword&quot;&gt;while&lt;/span&gt; ((obd = class_devices_in_group(&amp;amp;sbi-&amp;gt;ll_sb_uuid, &amp;amp;next)) !=NULL) {
                class_manual_cleanup(obd);
        }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So this walks the OBDs without any refcounting? That does not look good,&lt;br/&gt;
though I am not sure which other caller would drop the final obd refcount as things stand.&lt;/p&gt;

&lt;p&gt;Crash dump is available:&lt;br/&gt;
/exports/crashdumps/192.168.10.223-2016-07-31-11:08:32&lt;br/&gt;
tag in my source tree: master-20160731&lt;/p&gt;</description>
                <environment></environment>
        <key id="38510">LU-8453</key>
            <summary>Crash in class_manual_cleanup() on use after free OBD.</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="green">Oleg Drokin</reporter>
                        <labels>
                    </labels>
                <created>Sun, 31 Jul 2016 23:47:25 +0000</created>
                <updated>Sun, 31 Jul 2016 23:47:25 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                                                                                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzyj1b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>