<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:32:37 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10163] kernel NULL pointer dereference on heavy load</title>
                <link>https://jira.whamcloud.com/browse/LU-10163</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have experienced both client and server crashes when running lustre 2.10.0. I first noticed this after upgrading our servers to 2.10.0 and had a client crash a couple of times when doing some stress tests. At the time, I was still running a 2.9.0 client. I also found this thread, which appears related.&lt;/p&gt;


&lt;p&gt;&lt;a href=&quot;http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2017-August/014698.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2017-August/014698.html&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;Lately, our servers have started crashing too - we&apos;ve had about 4 or 5 crashes in the last week on various OSSs. Our clients are almost all updated to 2.10.0 or 2.10.1 now. I&apos;ve included an excerpt from the dmesg obtained from the latest server crash, and I&apos;ll upload the entire vmcore-dmesg.txt shortly. Prior to this, the server crash dumps weren&apos;t working properly, so this is the only server crash dump we have. I do have a couple of client crash dumps. Our current server configuration is:&lt;/p&gt;


&lt;p&gt;CentOS 7.3&lt;br/&gt;
kernel 3.10.0-514.26.2.el7.x86_64&lt;br/&gt;
lustre 2.10.0&lt;br/&gt;
zfs-0.7.1-1&lt;/p&gt;


&lt;p&gt;Let me know if you need any other info. We are upgrading to Lustre 2.10.1 now in the hope that this has already been found and fixed. I couldn&apos;t find a related LU, but my apologies if this is a duplicate of another LU.&lt;/p&gt;


&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.508837&amp;#93;&lt;/span&gt; BUG: unable to handle kernel NULL pointer dereference at (null)&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.510562&amp;#93;&lt;/span&gt; IP: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8168e99a&amp;gt;&amp;#93;&lt;/span&gt; _raw_spin_unlock+0xa/0x30&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.512269&amp;#93;&lt;/span&gt; PGD 0&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.513878&amp;#93;&lt;/span&gt; Oops: 0002 &lt;a href=&quot;#1&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;1&lt;/a&gt; SMP&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.515491&amp;#93;&lt;/span&gt; Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ko2iblnd(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core iTCO_wdt iTCO_vendor_support zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate intel_powerclamp coretemp kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mei_me mei pcspkr sb_edac i2c_i801 edac_core lpc_ich ioatdma dm_service_time ses enclosure sg ipmi_devintf&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.525584&amp;#93;&lt;/span&gt; ipmi_si ipmi_msghandler shpchp wmi dm_multipath nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx4_en mgag200 drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul sysimgblt crct10dif_common fb_sys_fops crc32c_intel ttm isci igb mlx4_core ahci megaraid_sas drm libsas libahci mpt3sas ptp pps_core libata dca raid_class i2c_algo_bit scsi_transport_sas i2c_core devlink fjes dm_mirror dm_region_hash dm_log dm_mod&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.534280&amp;#93;&lt;/span&gt; CPU: 14 PID: 22556 Comm: socknal_sd01_01 Tainted: P OE ------------ 3.10.0-514.26.2.el7.x86_64 #1&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.536043&amp;#93;&lt;/span&gt; Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 08/04/2015&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.537761&amp;#93;&lt;/span&gt; task: ffff882008e8ce70 ti: ffff88203ccbc000 task.ti: ffff88203ccbc000&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.539471&amp;#93;&lt;/span&gt; RIP: 0010:&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8168e99a&amp;gt;&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8168e99a&amp;gt;&amp;#93;&lt;/span&gt; _raw_spin_unlock+0xa/0x30&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.541242&amp;#93;&lt;/span&gt; RSP: 0018:ffff88203ccbfd08 EFLAGS: 00010202&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.542917&amp;#93;&lt;/span&gt; RAX: ffff8820362f1f30 RBX: ffff8820362f1dc0 RCX: 0000000000000000&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.544572&amp;#93;&lt;/span&gt; RDX: 000000000000ab0c RSI: 0000000000005587 RDI: 0000000000000000&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.546214&amp;#93;&lt;/span&gt; RBP: ffff88203ccbfd20 R08: ffff8820377d3e80 R09: 0000000000000001&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.547818&amp;#93;&lt;/span&gt; R10: 0000000000000400 R11: 0000000000000800 R12: 000000000002ac38&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.549427&amp;#93;&lt;/span&gt; R13: ffff882037691600 R14: ffff882007f6d074 R15: ffff881dace51810&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.550999&amp;#93;&lt;/span&gt; FS: 0000000000000000(0000) GS:ffff88207fd80000(0000) knlGS:0000000000000000&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.552552&amp;#93;&lt;/span&gt; CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.554081&amp;#93;&lt;/span&gt; CR2: 0000000000000000 CR3: 00000000019be000 CR4: 00000000001407e0&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.555593&amp;#93;&lt;/span&gt; DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.557079&amp;#93;&lt;/span&gt; DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.558527&amp;#93;&lt;/span&gt; Stack:&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.560007&amp;#93;&lt;/span&gt; ffffffffa098d8b6 ffff881f038c7000 ffff882007f6d000 ffff88203ccbfd60&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.561472&amp;#93;&lt;/span&gt; ffffffffa0a11841 ffff881dace51800 ffff881f038c7000 0000000000000001&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.562870&amp;#93;&lt;/span&gt; ffff88203cea2100 0000000000000000 ffff881f038c7010 ffff88203ccbfd90&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.564260&amp;#93;&lt;/span&gt; Call Trace:&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.565615&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa098d8b6&amp;gt;&amp;#93;&lt;/span&gt; ? cfs_percpt_unlock+0x36/0xc0 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.566966&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0a11841&amp;gt;&amp;#93;&lt;/span&gt; lnet_return_tx_credits_locked+0x211/0x480 &lt;span class=&quot;error&quot;&gt;&amp;#91;lnet&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.568305&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0a04770&amp;gt;&amp;#93;&lt;/span&gt; lnet_msg_decommit+0xd0/0x6c0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lnet&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.569604&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0a050f9&amp;gt;&amp;#93;&lt;/span&gt; lnet_finalize+0x1e9/0x690 &lt;span class=&quot;error&quot;&gt;&amp;#91;lnet&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.570876&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0a90f45&amp;gt;&amp;#93;&lt;/span&gt; ksocknal_tx_done+0x85/0x1c0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ksocklnd&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.572149&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0a95bb4&amp;gt;&amp;#93;&lt;/span&gt; ksocknal_scheduler+0x234/0x670 &lt;span class=&quot;error&quot;&gt;&amp;#91;ksocklnd&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.573381&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff810b1b20&amp;gt;&amp;#93;&lt;/span&gt; ? wake_up_atomic_t+0x30/0x30&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.574584&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0a95980&amp;gt;&amp;#93;&lt;/span&gt; ? ksocknal_recv+0x2a0/0x2a0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ksocklnd&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.575765&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff810b0a4f&amp;gt;&amp;#93;&lt;/span&gt; kthread+0xcf/0xe0&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.576947&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff810b0980&amp;gt;&amp;#93;&lt;/span&gt; ? kthread_create_on_node+0x140/0x140&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.578095&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81697758&amp;gt;&amp;#93;&lt;/span&gt; ret_from_fork+0x58/0x90&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.579209&amp;#93;&lt;/span&gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff810b0980&amp;gt;&amp;#93;&lt;/span&gt; ? kthread_create_on_node+0x140/0x140&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.580348&amp;#93;&lt;/span&gt; Code: 90 8d 8a 00 00 02 00 89 d0 f0 0f b1 0f 39 d0 75 ea b8 01 00 00 00 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 0f 1f 44 00 00 &amp;lt;66&amp;gt; 83 07 02 c3 90 8b 37 f0 66 83 07 02 f6 47 02 01 74 f1 55 48&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.582717&amp;#93;&lt;/span&gt; RIP &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8168e99a&amp;gt;&amp;#93;&lt;/span&gt; _raw_spin_unlock+0xa/0x30&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.583886&amp;#93;&lt;/span&gt; RSP &amp;lt;ffff88203ccbfd08&amp;gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;91954.585020&amp;#93;&lt;/span&gt; CR2: 0000000000000000&lt;/p&gt;</description>
                <environment></environment>
        <key id="48958">LU-10163</key>
            <summary>kernel NULL pointer dereference on heavy load</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="dvicker">Darby Vicker</reporter>
                        <labels>
                    </labels>
                <created>Wed, 25 Oct 2017 20:15:37 +0000</created>
                <updated>Thu, 1 Mar 2018 05:09:09 +0000</updated>
                            <resolved>Wed, 20 Dec 2017 22:05:07 +0000</resolved>
                                    <version>Lustre 2.10.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="212013" author="pjones" created="Wed, 25 Oct 2017 22:58:44 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9817&quot; title=&quot;Multi-Rail Crash on message free&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9817&quot;&gt;&lt;del&gt;LU-9817&lt;/del&gt;&lt;/a&gt; perhaps? If so it should disappear once you complete the upgrade from 2.10.0 to 2.10.1&lt;/p&gt;</comment>
                            <comment id="212023" author="dvicker" created="Thu, 26 Oct 2017 01:32:49 +0000"  >&lt;p&gt;Well, we completed the update to 2.10.1 this afternoon, so I hope so! Looking at the patch in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9817&quot; title=&quot;Multi-Rail Crash on message free&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9817&quot;&gt;&lt;del&gt;LU-9817&lt;/del&gt;&lt;/a&gt;, it touches code in lnet_return_tx_credits_locked, which is indeed in our stack trace too. Does it make sense that this bug could crash a client or a server? I think so, since it&apos;s in lnet, but I just wanted to make sure.&lt;/p&gt;</comment>
                            <comment id="216872" author="jhammond" created="Wed, 20 Dec 2017 18:57:09 +0000"  >&lt;p&gt;&amp;gt; Does it make sense that this bug could crash a client or a server?&lt;/p&gt;

&lt;p&gt;Yes, it does. Have you seen this crash since upgrading?&lt;/p&gt;</comment>
                            <comment id="216899" author="dvicker" created="Wed, 20 Dec 2017 22:02:35 +0000"  >&lt;p&gt;Yes, these crashes went away after upgrading to 2.10.1.&lt;/p&gt;</comment>
                            <comment id="216900" author="pjones" created="Wed, 20 Dec 2017 22:05:07 +0000"  >&lt;p&gt;good news!&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="47613">LU-9817</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="28545" name="vmcore-dmesg.txt" size="221265" author="dvicker" created="Wed, 25 Oct 2017 20:17:45 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzmhr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>