<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:13:57 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8022] LNet: BUG: unable to handle kernel NULL pointer dereference</title>
                <link>https://jira.whamcloud.com/browse/LU-8022</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The error occurred during soak testing of build &apos;20160413&apos; (see &lt;a href=&quot;https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160413&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160413&lt;/a&gt;).  DNE is enabled. OSTs have been formatted using &lt;em&gt;zfs&lt;/em&gt;, MDTs using &lt;em&gt;ldiskfs&lt;/em&gt;. OSS and MDT nodes are configured in an HA active-active failover configuration.&lt;/p&gt;

&lt;p&gt;During system boot, an MDS node that had been restarted crashed with the following error during LNet initialization:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: Lustre: Build Version: 2.8.51_28_gba2ac35
LNetError: 3247:0:(o2iblnd_cb.c:2310:kiblnd_passive_connect()) Can&apos;t accept conn from 192.168.1.108@o2ib10 on NA (ib0:0:192.168.1.110): bad dst nid 192.168.1.110@o2ib10
BUG: unable to handle kernel NULL pointer dereference
LNet: Added LNI 192.168.1.110@o2ib10 [8/256/0/180]
 at 0000000000000080
IP: [&amp;lt;ffffffffa0b861e6&amp;gt;] kiblnd_passive_connect+0x466/0x17e0 [ko2iblnd]
PGD 839067067 PUD 8383e1067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/module/lnet/initstate
CPU 0 
Modules linked in: ko2iblnd(U) ptlrpc(+)(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_round_robin dm_multipath microcode iTCO_wdt iTCO_vendor_support zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) sb_edac edac_core joydev lpc_ich mfd_core i2c_i801 ioatdma sg igb dca i2c_algo_bit i2c_core ext3 jbd mbcache sd_mod crc_t10dif ahci wmi isci libsas mpt2sas scsi_transport_sas raid_class mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en ptp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod scsi_dh_rdac [last unloaded: scsi_wait_scan]

Pid: 3247, comm: ib_cm/0 Tainted: P           -- ------------    2.6.32-573.22.1.el6_lustre.x86_64 #1 Intel Corporation SandyBridge Platform/To be filled by O.E.M.
RIP: 0010:[&amp;lt;ffffffffa0b861e6&amp;gt;]  [&amp;lt;ffffffffa0b861e6&amp;gt;] kiblnd_passive_connect+0x466/0x17e0 [ko2iblnd]
RSP: 0018:ffff8804318e7b20  EFLAGS: 00010246
RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000012
RBP: ffff8804318e7be0 R08: 000000000001b9c2 R09: 00000000fffffffb
R10: 0000000000000003 R11: 0000000000000000 R12: ffff880835a6dc20
R13: ffffffffa0b92263 R14: ffff880432df7800 R15: ffffffffa06a1020
FS:  0000000000000000(0000) GS:ffff880038600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000080 CR3: 0000000835bd0000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ib_cm/0 (pid: 3247, threadinfo ffff8804318e4000, task ffff880431015520)
Stack:
 ffff880835a6dc20 ffffffffa06a1020 0000000000000004 ffff8808369be5d0
&amp;lt;d&amp;gt; ffff8804318e7b80 00000000814e7731 0005000ac0a8016c ffff880800000012
&amp;lt;d&amp;gt; 000300120be91b91 0000000000000000 0000100000000008 ffffffffa011bcbc
Call Trace:
 [&amp;lt;ffffffffa011bcbc&amp;gt;] ? ib_find_cached_gid+0xec/0x110 [ib_core]
 [&amp;lt;ffffffffa0b87c3d&amp;gt;] kiblnd_cm_callback+0x6dd/0x20e0 [ko2iblnd]
 [&amp;lt;ffffffffa034a011&amp;gt;] cma_req_handler+0x371/0x640 [rdma_cm]
 [&amp;lt;ffffffffa011692b&amp;gt;] ? rdma_port_get_link_layer+0x1b/0x60 [ib_core]
 [&amp;lt;ffffffffa0322b27&amp;gt;] cm_process_work+0x27/0x110 [ib_cm]
 [&amp;lt;ffffffffa0323735&amp;gt;] cm_req_handler+0x6b5/0xac0 [ib_cm]
 [&amp;lt;ffffffffa0324140&amp;gt;] ? cm_work_handler+0x0/0x1206 [ib_cm]
 [&amp;lt;ffffffffa0324275&amp;gt;] cm_work_handler+0x135/0x1206 [ib_cm]
 [&amp;lt;ffffffffa0324140&amp;gt;] ? cm_work_handler+0x0/0x1206 [ib_cm]
 [&amp;lt;ffffffff8109ab40&amp;gt;] worker_thread+0x170/0x2a0
 [&amp;lt;ffffffff810a1820&amp;gt;] ? autoremove_wake_function+0x0/0x40
 [&amp;lt;ffffffff8109a9d0&amp;gt;] ? worker_thread+0x0/0x2a0
 [&amp;lt;ffffffff810a138e&amp;gt;] kthread+0x9e/0xc0
 [&amp;lt;ffffffff8100c28a&amp;gt;] child_rip+0xa/0x20
 [&amp;lt;ffffffff810a12f0&amp;gt;] ? kthread+0x0/0xc0
 [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20
Code: e8 90 ff a6 ff 0f b7 95 78 ff ff ff 8b bd 78 ff ff ff 48 89 de 66 89 55 84 e8 27 46 00 00 83 bd 78 ff ff ff 11 66 89 45 90 74 10 &amp;lt;48&amp;gt; 8b 83 80 00 00 00 8b 50 1c 85 d2 89 d0 75 05 b8 00 01 00 00 
RIP  [&amp;lt;ffffffffa0b861e6&amp;gt;] kiblnd_passive_connect+0x466/0x17e0 [ko2iblnd]
 RSP &amp;lt;ffff8804318e7b20&amp;gt;
CR2: 0000000000000080
---[ end trace 01db8c57e9900e3f ]---
Kernel panic - not syncing: Fatal exception
Pid: 3247, comm: ib_cm/0 Tainted: P      D    -- ------------    2.6.32-573.22.1.el6_lustre.x86_64 #1
Call Trace:
 [&amp;lt;ffffffff815394d1&amp;gt;] ? panic+0xa7/0x16f
 [&amp;lt;ffffffff8153e2d4&amp;gt;] ? oops_end+0xe4/0x100
 [&amp;lt;ffffffff8104e8cb&amp;gt;] ? no_context+0xfb/0x260
 [&amp;lt;ffffffff8104eb55&amp;gt;] ? __bad_area_nosemaphore+0x125/0x1e0
 [&amp;lt;ffffffff8104ec23&amp;gt;] ? bad_area_nosemaphore+0x13/0x20
 [&amp;lt;ffffffff8104f31c&amp;gt;] ? __do_page_fault+0x30c/0x500
 [&amp;lt;ffffffff81336a9f&amp;gt;] ? extract_buf+0x9f/0x130
 [&amp;lt;ffffffff815401fe&amp;gt;] ? do_page_fault+0x3e/0xa0
 [&amp;lt;ffffffff8153d5a5&amp;gt;] ? page_fault+0x25/0x30
 [&amp;lt;ffffffffa0b861e6&amp;gt;] ? kiblnd_passive_connect+0x466/0x17e0 [ko2iblnd]
 [&amp;lt;ffffffffa011bcbc&amp;gt;] ? ib_find_cached_gid+0xec/0x110 [ib_core]
 [&amp;lt;ffffffffa0b87c3d&amp;gt;] ? kiblnd_cm_callback+0x6dd/0x20e0 [ko2iblnd]
 [&amp;lt;ffffffffa034a011&amp;gt;] ? cma_req_handler+0x371/0x640 [rdma_cm]
 [&amp;lt;ffffffffa011692b&amp;gt;] ? rdma_port_get_link_layer+0x1b/0x60 [ib_core]
 [&amp;lt;ffffffffa0322b27&amp;gt;] ? cm_process_work+0x27/0x110 [ib_cm]
 [&amp;lt;ffffffffa0323735&amp;gt;] ? cm_req_handler+0x6b5/0xac0 [ib_cm]
 [&amp;lt;ffffffffa0324140&amp;gt;] ? cm_work_handler+0x0/0x1206 [ib_cm]
 [&amp;lt;ffffffffa0324275&amp;gt;] ? cm_work_handler+0x135/0x1206 [ib_cm]
 [&amp;lt;ffffffffa0324140&amp;gt;] ? cm_work_handler+0x0/0x1206 [ib_cm]
 [&amp;lt;ffffffff8109ab40&amp;gt;] ? worker_thread+0x170/0x2a0
 [&amp;lt;ffffffff810a1820&amp;gt;] ? autoremove_wake_function+0x0/0x40
 [&amp;lt;ffffffff8109a9d0&amp;gt;] ? worker_thread+0x0/0x2a0
 [&amp;lt;ffffffff810a138e&amp;gt;] ? kthread+0x9e/0xc0
 [&amp;lt;ffffffff8100c28a&amp;gt;] ? child_rip+0xa/0x20
 [&amp;lt;ffffffff810a12f0&amp;gt;] ? kthread+0x0/0xc0
 [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Unfortunately, no crash dump was written. The only error message available was extracted from the console log of the affected node (&lt;tt&gt;lola-10&lt;/tt&gt;). &lt;br/&gt;
Therefore, only the console log of the MDS has been attached.&lt;/p&gt;</description>
                <environment>lola&lt;br/&gt;
build: &lt;a href=&quot;https://build.hpdd.intel.com/job/lustre-master/3346&quot;&gt;https://build.hpdd.intel.com/job/lustre-master/3346&lt;/a&gt;</environment>
        <key id="36138">LU-8022</key>
            <summary>LNet: BUG: unable to handle kernel NULL pointer dereference</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="doug">Doug Oucharek</assignee>
                                    <reporter username="heckes">Frank Heckes</reporter>
                        <labels>
                            <label>soak</label>
                    </labels>
                <created>Thu, 14 Apr 2016 08:11:00 +0000</created>
                <updated>Wed, 27 Nov 2019 04:03:58 +0000</updated>
                            <resolved>Tue, 31 May 2016 12:52:48 +0000</resolved>
                                                    <fixVersion>Lustre 2.9.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                            <comments>
                            <comment id="149118" author="jgmitter" created="Fri, 15 Apr 2016 17:37:33 +0000"  >&lt;p&gt;Hi Doug,&lt;/p&gt;

&lt;p&gt;Could you have a look at this?&lt;/p&gt;

&lt;p&gt;Thanks.&lt;br/&gt;
Joe&lt;/p&gt;</comment>
                            <comment id="149121" author="doug" created="Fri, 15 Apr 2016 18:33:24 +0000"  >&lt;p&gt;Ok, I see two problems here:&lt;/p&gt;

&lt;p&gt;1- The network interface (NI) for the IB card seems to have &quot;disappeared&quot;.  Almost as if the device ib0 went down and was removed from our list of available NIs.  However, we still received a connection request to that NI and that is the failure being reported in the error log.&lt;br/&gt;
2- The failure path then tries to dereference the NULL NI pointer, which causes the kernel crash.  That, of course, must be fixed.&lt;/p&gt;

&lt;p&gt;I am going to use this ticket to fix problem 2 (dereferencing the NULL NI pointer) so we don&apos;t crash.  I don&apos;t have enough info to address the first problem, so that will have to wait until it is reproduced with this fix in place to prevent the crash.&lt;/p&gt;</comment>
                            <comment id="149123" author="gerrit" created="Fri, 15 Apr 2016 18:43:10 +0000"  >&lt;p&gt;Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/19614&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/19614&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8022&quot; title=&quot;LNet: BUG: unable to handle kernel NULL pointer dereference&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8022&quot;&gt;&lt;del&gt;LU-8022&lt;/del&gt;&lt;/a&gt; lnet: Don&apos;t access NULL NI on failure path&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7163df3dcd22199609539530a6a761acc6fd689e&lt;/p&gt;</comment>
                            <comment id="150347" author="heckes" created="Wed, 27 Apr 2016 07:57:51 +0000"  >&lt;p&gt;The patch has been included in build &apos;20160427&apos; (see &lt;a href=&quot;https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160427&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160427&lt;/a&gt;) and will be verified in the soak test session associated with this build.&lt;/p&gt;</comment>
                            <comment id="151453" author="heckes" created="Mon, 9 May 2016 09:16:50 +0000"  >&lt;p&gt;In the soak test session for build &apos;20160427&apos;, which includes patch set 1 of #19614, the error has not recurred. The soak test has now been running for 10 days.&lt;/p&gt;</comment>
                            <comment id="153330" author="ezell" created="Tue, 24 May 2016 14:17:57 +0000"  >&lt;p&gt;We hit this today on our LNET routers when upgrading a cluster to 2.8 with &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7101&quot; title=&quot;Lnet: Support per NI map-on-demand&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7101&quot;&gt;&lt;del&gt;LU-7101&lt;/del&gt;&lt;/a&gt;. Router pinger messages come in before all the NIs have been added, causing this failure.&lt;/p&gt;</comment>
                            <comment id="154013" author="gerrit" created="Tue, 31 May 2016 04:54:22 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/19614/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/19614/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8022&quot; title=&quot;LNet: BUG: unable to handle kernel NULL pointer dereference&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8022&quot;&gt;&lt;del&gt;LU-8022&lt;/del&gt;&lt;/a&gt; lnet: Don&apos;t access NULL NI on failure path&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: f5c7fec23cb26219d959290a4a311119747cc609&lt;/p&gt;</comment>
                            <comment id="154067" author="pjones" created="Tue, 31 May 2016 12:52:48 +0000"  >&lt;p&gt;Landed for 2.9&lt;/p&gt;</comment>
                            <comment id="157028" author="gerrit" created="Mon, 27 Jun 2016 17:37:26 +0000"  >&lt;p&gt;Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/21001&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/21001&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8022&quot; title=&quot;LNet: BUG: unable to handle kernel NULL pointer dereference&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8022&quot;&gt;&lt;del&gt;LU-8022&lt;/del&gt;&lt;/a&gt; lnet: Correct position of lnet_ni_decref()&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 046a485e69dc879bf112690c1434dee86292554b&lt;/p&gt;</comment>
                            <comment id="157719" author="gerrit" created="Tue, 5 Jul 2016 23:47:25 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/21001/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/21001/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8022&quot; title=&quot;LNet: BUG: unable to handle kernel NULL pointer dereference&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8022&quot;&gt;&lt;del&gt;LU-8022&lt;/del&gt;&lt;/a&gt; lnet: Correct position of lnet_ni_decref()&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: e8278552cfbcf518209a38f82548a16833686ae9&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="31920">LU-7101</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="21120" name="lola-10.log.bz2" size="51796" author="heckes" created="Thu, 14 Apr 2016 08:18:24 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzy80f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>