<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:31:33 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3166] (o2iblnd_cb.c:2831:kiblnd_cm_callback()) LBUG</title>
                <link>https://jira.whamcloud.com/browse/LU-3166</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A bonding configuration is set up with IPoIB on OFED-3.5 for an active/standby LNET configuration. ko2iblnd with bond0 works well, but once the active slave interface is changed to another slave interface, the Lustre servers crash with an LBUG in kiblnd_cm_callback(). This does not happen on OFED-1.5.x; it happens only on OFED-3.5.&lt;/p&gt;

&lt;p&gt;Here is the reproducer.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: ib0 (primary_reselect always)
Currently Active Slave: ib0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 5000
Down Delay (ms): 0

Slave Interface: ib0
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 80:00:00:48:fe:80
Slave queue ID: 0

Slave Interface: ib1
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 80:00:00:49:fe:80
Slave queue ID: 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Changing the active slave interface triggers the LBUG.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# ifenslave bond0 -c ib1

Message from syslogd@s15 at Apr 14 03:51:57 ...
 kernel:LNetError: 1627:0:(o2iblnd_cb.c:2831:kiblnd_cm_callback()) LBUG

Message from syslogd@s15 at Apr 14 03:51:57 ...
 kernel:Kernel panic - not syncing: LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here are the console messages and the backtrace from the crash dump.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# cat /var/crash/127.0.0.1-2013-04-14-03\:52\:04/vmcore-dmesg.txt 
--snip--
&amp;lt;6&amp;gt;bonding: bond0: making interface ib1 the new active one.
&amp;lt;6&amp;gt;RDMA CM addr change for ndev bond0 used by id ffff88044bc15400
&amp;lt;3&amp;gt;LNetError: 1627:0:(o2iblnd_cb.c:2830:kiblnd_cm_callback()) Unexpected event: 14, status: 0
&amp;lt;0&amp;gt;LNetError: 1627:0:(o2iblnd_cb.c:2831:kiblnd_cm_callback()) LBUG
&amp;lt;4&amp;gt;Pid: 1627, comm: rdma_cm
&amp;lt;4&amp;gt;
&amp;lt;4&amp;gt;Call Trace:
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa06e4895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa06e4e97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0b66bda&amp;gt;] kiblnd_cm_callback+0x9a/0x1140 [ko2iblnd]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa059da18&amp;gt;] cma_ndev_work_handler+0x48/0xa0 [rdma_cm]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa059d9d0&amp;gt;] ? cma_ndev_work_handler+0x0/0xa0 [rdma_cm]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8108b120&amp;gt;] worker_thread+0x170/0x2a0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff81090990&amp;gt;] ? autoremove_wake_function+0x0/0x40
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8108afb0&amp;gt;] ? worker_thread+0x0/0x2a0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff81090626&amp;gt;] kthread+0x96/0xa0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
&amp;lt;4&amp;gt; [&amp;lt;ffffffff81090590&amp;gt;] ? kthread+0x0/0xa0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
&amp;lt;4&amp;gt;
&amp;lt;0&amp;gt;Kernel panic - not syncing: LBUG
&amp;lt;4&amp;gt;Pid: 1627, comm: rdma_cm Not tainted 2.6.32-279.19.1.el6_lustre.x86_64 #1
&amp;lt;4&amp;gt;Call Trace:
&amp;lt;4&amp;gt; [&amp;lt;ffffffff814e9811&amp;gt;] ? panic+0xa0/0x168
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa06e4eeb&amp;gt;] ? lbug_with_loc+0x9b/0xb0 [libcfs]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0b66bda&amp;gt;] ? kiblnd_cm_callback+0x9a/0x1140 [ko2iblnd]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa059da18&amp;gt;] ? cma_ndev_work_handler+0x48/0xa0 [rdma_cm]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa059d9d0&amp;gt;] ? cma_ndev_work_handler+0x0/0xa0 [rdma_cm]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8108b120&amp;gt;] ? worker_thread+0x170/0x2a0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff81090990&amp;gt;] ? autoremove_wake_function+0x0/0x40
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8108afb0&amp;gt;] ? worker_thread+0x0/0x2a0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff81090626&amp;gt;] ? kthread+0x96/0xa0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c0ca&amp;gt;] ? child_rip+0xa/0x20
&amp;lt;4&amp;gt; [&amp;lt;ffffffff81090590&amp;gt;] ? kthread+0x0/0xa0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20

crash&amp;gt; bt
PID: 1627   TASK: ffff88046e13f500  CPU: 0   COMMAND: &quot;rdma_cm&quot;
 #0 [ffff880464155c08] machine_kexec at ffffffff81031f7b
 #1 [ffff880464155c68] crash_kexec at ffffffff810b8c22
 #2 [ffff880464155d38] panic at ffffffff814e9818
 #3 [ffff880464155db8] lbug_with_loc at ffffffffa06e4eeb [libcfs]
 #4 [ffff880464155dd8] kiblnd_cm_callback at ffffffffa0b66bda [ko2iblnd]
 #5 [ffff880464155e08] cma_ndev_work_handler at ffffffffa059da18 [rdma_cm]
 #6 [ffff880464155e38] worker_thread at ffffffff8108b120
 #7 [ffff880464155ee8] kthread at ffffffff81090626
 #8 [ffff880464155f48] kernel_thread at ffffffff8100c0ca
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment>OFED-3.5, CentOS6.3</environment>
        <key id="18401">LU-3166</key>
            <summary>(o2iblnd_cb.c:2831:kiblnd_cm_callback()) LBUG</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="mdiep">Minh Diep</assignee>
                                    <reporter username="ihara">Shuichi Ihara</reporter>
                        <labels>
                            <label>mn1</label>
                    </labels>
                <created>Sat, 13 Apr 2013 02:19:53 +0000</created>
                <updated>Mon, 18 Nov 2013 16:03:56 +0000</updated>
                            <resolved>Fri, 26 Jul 2013 15:50:12 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.5.0</fixVersion>
                    <fixVersion>Lustre 2.4.2</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="56239" author="pjones" created="Sat, 13 Apr 2013 03:20:52 +0000"  >&lt;p&gt;Ihara&lt;/p&gt;

&lt;p&gt;We have just moved up to RHEL 6.4 on master and OFED 3.5 is not supported for that release. Is this a combination that you are planning to use in production at a customer site? Could the version of OFED in the RHEL distribution meet your needs?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="56241" author="ihara" created="Sat, 13 Apr 2013 04:02:56 +0000"  >&lt;p&gt;Peter, &lt;br/&gt;
OK.. &lt;br/&gt;
Well, as far as I know, Mellanox is going to release its new OFED (called Mellanox OFED 2.0), which is based on OFED-3.x. There are several important improvements for SRP in this OFED.&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1468&quot; title=&quot;Support Compat RDMA for O2IB&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1468&quot;&gt;&lt;del&gt;LU-1468&lt;/del&gt;&lt;/a&gt; helped to build Lustre with this new Mellanox OFED, but the server also crashed on this Mellanox OFED. &lt;/p&gt;

&lt;p&gt;That&apos;s why I retested on OFED-3.5, as opposed to the new Mellanox OFED, to generalize the problem. We need to check whether Mellanox supports RHEL6.4&apos;s kernel, but even today there are potentially bugs somewhere when we use a 3.x-based OFED.&lt;/p&gt;</comment>
                            <comment id="56245" author="pjones" created="Sat, 13 Apr 2013 05:30:31 +0000"  >&lt;p&gt;OK, Ihara. Minh, can you please assist Ihara with this?&lt;/p&gt;</comment>
                            <comment id="56246" author="mdiep" created="Sat, 13 Apr 2013 05:43:02 +0000"  >&lt;p&gt;Hi Ihara,&lt;/p&gt;

&lt;p&gt;Did you apply the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2975&quot; title=&quot;Build fails on 2.6.32-279.22.1.el6 due to two functions redefined&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2975&quot;&gt;&lt;del&gt;LU-2975&lt;/del&gt;&lt;/a&gt; patch to make OFED-3.5 work on RHEL6.3? &lt;/p&gt;</comment>
                            <comment id="56258" author="liang" created="Sat, 13 Apr 2013 19:14:19 +0000"  >&lt;p&gt;kiblnd_cm_callback()) Unexpected event: 14, status: 0&lt;/p&gt;

&lt;p&gt;Hmm... I checked the OFED source code; event 14 is RDMA_CM_EVENT_ADDR_CHANGE.&lt;br/&gt;
We have had code to handle this event for a very long time, unless o2iblnd is built against an old OFED version...&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;#ifdef HAVE_OFED_RDMA_CMEV_ADDRCHANGE
        case RDMA_CM_EVENT_ADDR_CHANGE:
                LCONSOLE_INFO(&quot;Physical link changed (eg hca/port)\n&quot;);
                return 0;
#endif
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="56259" author="ihara" created="Sat, 13 Apr 2013 22:39:33 +0000"  >&lt;p&gt;Thanks, Liang. A header file was missing when these RDMA events were checked with OFED-3.5, so the checks failed.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;checking if OFED has ib_dma_map_single... yes
checking if OFED has RDMA_CM_EVENT_ADDR_CHANGE... no
checking if OFED has RDMA_CM_EVENT_TIMEWAIT_EXIT... no
checking if OFED has rdma_set_reuseaddr... no
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I pushed a patch so that these checks compile correctly. &lt;a href=&quot;http://review.whamcloud.com/6048&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6048&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="58445" author="ihara" created="Tue, 14 May 2013 13:58:00 +0000"  >&lt;p&gt;At this moment, we do not have any OFED option for RHEL6.4 except the RHEL in-kernel tree OFED.&lt;br/&gt;
However, these patches will be needed, since a new OFED has been released by Mellanox: MLNX_OFED_LINUX-2.0-2.0.5, which is based on the OFED-3.x compat rdma headers.&lt;/p&gt;</comment>
                            <comment id="63009" author="mdiep" created="Thu, 25 Jul 2013 22:20:23 +0000"  >&lt;p&gt;This landed on June 21st. Can I close this, Ihara?&lt;/p&gt;</comment>
                            <comment id="63015" author="ihara" created="Thu, 25 Jul 2013 23:31:35 +0000"  >&lt;p&gt;Minh, Yes, please.&lt;/p&gt;</comment>
                            <comment id="70983" author="mdiep" created="Thu, 7 Nov 2013 16:33:44 +0000"  >&lt;p&gt;patch for b2_1 &lt;a href=&quot;http://review.whamcloud.com/#/c/8207/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/8207/&lt;/a&gt;&lt;br/&gt;
patch for b2_4 &lt;a href=&quot;http://review.whamcloud.com/#/c/8205/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/8205/&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvo0n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7719</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>