<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:30:23 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3032] crash in ll_ping</title>
                <link>https://jira.whamcloud.com/browse/LU-3032</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While running conf-sanity.sh in a loop, I hit this crash:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ 6625.425296] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 6625.425755] last sysfs file: /sys/devices/system/cpu/possible
[ 6625.426029] CPU 1 
[ 6625.426068] Modules linked in: ptlrpc(-) obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: fld]
[ 6625.429023] 
[ 6625.429231] Pid: 4991, comm: ll_ping Not tainted 2.6.32-debug #6 Bochs Bochs
[ 6625.429247] RIP: 0010:[&amp;lt;ffffffff8104d2e6&amp;gt;]  [&amp;lt;ffffffff8104d2e6&amp;gt;] __wake_up_common+0x56/0x90
[ 6625.429247] RSP: 0018:ffff88008f9e5de0  EFLAGS: 00010082
[ 6625.429247] RAX: ffffffffa0d963fa RBX: ffff880098e26f68 RCX: 0000000000000000
[ 6625.429247] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffffa0d963fa
[ 6625.429247] RBP: ffff88008f9e5e20 R08: 0000000000000000 R09: 000000000000005c
[ 6625.429247] R10: 0000000000000001 R11: 0000000000000000 R12: c284e8fffb9889d0
[ 6625.429247] R13: 00000000ffffb502 R14: 0000000000000000 R15: 0000000000000000
[ 6625.429247] FS:  00007f7026720700(0000) GS:ffff880006240000(0000) knlGS:0000000000000000
[ 6625.429247] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6625.429247] CR2: 000000000089ee20 CR3: 00000000905c0000 CR4: 00000000000006e0
[ 6625.429247] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6625.429247] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 6625.429247] Process ll_ping (pid: 4991, threadinfo ffff88008f9e4000, task ffff8800af3ea5c0)
[ 6625.429247] Stack:
[ 6625.429247]  ffffffffffffffff 0000000300000001 0000000000000001 ffff880098e26f68
[ 6625.429247] &amp;lt;d&amp;gt; 0000000000000282 0000000000000003 0000000000000001 0000000000000000
[ 6625.429247] &amp;lt;d&amp;gt; ffff88008f9e5e60 ffffffff81051f68 ffff88008f9e5e40 0000000000000001
[ 6625.429247] Call Trace:
[ 6625.429247]  [&amp;lt;ffffffff81051f68&amp;gt;] __wake_up+0x48/0x70
[ 6625.429247]  [&amp;lt;ffffffffa0acf7fa&amp;gt;] cfs_waitq_signal+0x1a/0x20 [libcfs]
[ 6625.429247]  [&amp;lt;ffffffffa0d5f73f&amp;gt;] ptlrpc_pinger_main+0x5cf/0x7f0 [ptlrpc]
[ 6625.429247]  [&amp;lt;ffffffff81057d60&amp;gt;] ? default_wake_function+0x0/0x20
[ 6625.429247]  [&amp;lt;ffffffffa0d5f170&amp;gt;] ? ptlrpc_pinger_main+0x0/0x7f0 [ptlrpc]
[ 6625.429247]  [&amp;lt;ffffffff8100c14a&amp;gt;] child_rip+0xa/0x20
[ 6625.429247]  [&amp;lt;ffffffffa0d5f170&amp;gt;] ? ptlrpc_pinger_main+0x0/0x7f0 [ptlrpc]
[ 6625.429247]  [&amp;lt;ffffffffa0d5f170&amp;gt;] ? ptlrpc_pinger_main+0x0/0x7f0 [ptlrpc]
[ 6625.429247]  [&amp;lt;ffffffff8100c140&amp;gt;] ? child_rip+0x0/0x20
[ 6625.429247] Code: e8 18 48 39 c7 4c 8b 60 18 74 3d 49 83 ec 18 eb 0b 0f 1f 40 00 4c 89 e0 4c 8d 62 e8 44 8b 28 4c 89 f1 44 89 fa 8b 75 cc 48 89 c7 &amp;lt;ff&amp;gt; 50 10 85 c0 74 0c 41 83 e5 01 74 06 83 6d c8 01 74 0a 4c 39 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The test output was:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== conf-sanity test 22: start a client before osts (should return errs) == 03:20:55 (1364282455)
start mds service on centos6-14.localnet
Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
Started lustre-MDT0000
Client mount with ost in logs, but none running
start ost1 service on centos6-14.localnet
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
Started lustre-OST0000
error: list_param: /proc/{fs,sys}/{lnet,lustre}/osc/lustre-OST0000-osc-MDT0000/ost_server_uuid: Found no match
error: list_param: /proc/{fs,sys}/{lnet,lustre}/osc/lustre-OST0000-osc-MDT0000/ost_server_uuid: Found no match
Stopping client centos6-14.localnet /mnt/lustre (opts:)
PASS 
Client mount with a running ost
start ost1 service on centos6-14.localnet
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
Started lustre-OST0000
mount lustre on /mnt/lustre.....
Starting client: centos6-14.localnet: -o user_xattr,flock centos6-14.localnet@tcp:/lustre /mnt/lustre
centos6-14.localnet: osc.lustre-OST0000-osc-MDT0000.ost_server_uuid in FULL state after 14 sec
centos6-14.localnet: osc.lustre-OST0000-osc-MDT0000.ost_server_uuid in FULL state after 0 sec
setup single mount lustre success
PASS 
umount lustre on /mnt/lustre.....
Stopping client centos6-14.localnet /mnt/lustre (opts:)
stop ost1 service on centos6-14.localnet
Stopping /mnt/ost1 (opts:-f) on centos6-14.localnet
waited 0 for 10 ST ost OSS OSS_uuid 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The crash dump and modules are in /exports/crashdumps/192.168.10.224-2013-03-26-03\:21\:55/&lt;/p&gt;</description>
                <environment></environment>
        <key id="18094">LU-3032</key>
            <summary>crash in ll_ping</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="dmiter">Dmitry Eremin</assignee>
                                    <reporter username="green">Oleg Drokin</reporter>
                        <labels>
                    </labels>
                <created>Tue, 26 Mar 2013 14:46:04 +0000</created>
                <updated>Thu, 12 Sep 2013 23:50:50 +0000</updated>
                            <resolved>Thu, 6 Jun 2013 07:28:58 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.4.1</fixVersion>
                    <fixVersion>Lustre 2.5.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="54986" author="liwei" created="Thu, 28 Mar 2013 05:02:25 +0000"  >&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ 6625.426068] Modules linked in: ptlrpc(-) [...]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The ptlrpc kernel module was being removed, as the &quot;(-)&quot; marker shows, so ptlrpc_stop_pinger() must have been called.  The RIP was at:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;linux-2.6.32-279.2.1.el6-debug/kernel/sched.c: 6182
0xffffffff8104d2da &amp;lt;__wake_up_common+74&amp;gt;:       mov    %r14,%rcx
0xffffffff8104d2dd &amp;lt;__wake_up_common+77&amp;gt;:       mov    %r15d,%edx
0xffffffff8104d2e0 &amp;lt;__wake_up_common+80&amp;gt;:       mov    -0x34(%rbp),%esi
0xffffffff8104d2e3 &amp;lt;__wake_up_common+83&amp;gt;:       mov    %rax,%rdi
0xffffffff8104d2e6 &amp;lt;__wake_up_common+86&amp;gt;:       callq  *0x10(%rax)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From the &quot;0x10&quot; offset, I think RAX contains the address of &quot;curr&quot;, which seems to contain arbitrary data (e.g., flags):&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$4 = {
  flags = 4294948098, 
  private = 0x33e8fffdd6f8e8ff, 
  func = 0xfffc803ee8fff8ce, 
  task_list = {
    next = 0xc284e8fffb9889e8, 
    prev = 0xc9fffacfbfe8fffb
  }
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I suspect the following race happened, resulting in a use-after-free:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;~ ptlrpc_stop_pinger()                          ~ ptlrpc_pinger_main()
-----------------------------------------------------------------------------
thread_set_flags(SVC_STOPPING)
cfs_waitq_signal()
mutex_unlock()                                  ...
                                                thread_set_flags(SVC_STOPPED)
l_wait_event(thread_is_stopped): Did not sleep
OBD_FREE_PTR(pinger_thread)
                                                cfs_waitq_signal()
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
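
&lt;p&gt;As a rough illustration only (not the actual Lustre code, and not necessarily what any eventual fix does), the ordering problem above can be modelled in a few lines of Python. All names here (PingerCtl, pinger_main, stop_pinger, run_once) are hypothetical stand-ins: two events replace the SVC_STOPPING and SVC_STOPPED flags, and a &quot;freed&quot; flag replaces OBD_FREE_PTR(). The sketch assumes one possible way to close the window: the stopping side waits for the worker to exit completely before releasing the structure, so the worker&apos;s final wake-up can never land on freed memory.&lt;/p&gt;

```python
import threading


class PingerCtl:
    """Hypothetical stand-in for the shared pinger_thread structure."""

    def __init__(self):
        self.stopping = threading.Event()  # models SVC_STOPPING
        self.stopped = threading.Event()   # models SVC_STOPPED
        self.freed = False                 # models OBD_FREE_PTR()
        self.used_after_free = False

    def signal_stopped(self):
        # Models thread_set_flags(SVC_STOPPED) followed by the wake-up.
        # Both steps touch the structure, so record any late access.
        self.stopped.set()
        if self.freed:
            self.used_after_free = True


def pinger_main(ctl):
    ctl.stopping.wait()   # run until asked to stop
    ctl.signal_stopped()  # last access to ctl made by the worker


def stop_pinger(ctl, worker):
    ctl.stopping.set()
    ctl.stopped.wait()    # may return before the worker's wake-up step
    worker.join()         # assumption: wait for the worker to fully exit
    ctl.freed = True      # only now is it safe to release the structure


def run_once():
    ctl = PingerCtl()
    worker = threading.Thread(target=pinger_main, args=(ctl,))
    worker.start()
    stop_pinger(ctl, worker)
    return ctl.used_after_free
```

&lt;p&gt;In this model, run_once() reports whether the worker touched the structure after the simulated free. Removing the join() call would reintroduce the window shown in the diagram.&lt;/p&gt;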

&lt;p&gt;The memory used by pinger_thread might have been freed and reallocated to something else by the time ptlrpc_pinger_main() used it in cfs_waitq_signal().&lt;/p&gt;</comment>
                            <comment id="56201" author="dmiter" created="Fri, 12 Apr 2013 14:41:58 +0000"  >&lt;p&gt;I agree with the investigation. I&apos;d like to propose a patch &lt;a href=&quot;http://review.whamcloud.com/6040&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6040&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvmaf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7398</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>