Details
-
Bug
-
Resolution: Fixed
-
Major
-
None
-
Lustre 1.8.7
-
1
-
6099
Description
We are having problems installing Lustre with Myrinet support at a customer site.
The build process seems fine, and the MX drivers work standalone (we can load the drivers, bring up interfaces, set IP addresses, and communicate with other servers). We can also build Lustre with no warnings or error messages.
However, when we install the RPMs, a series of warnings appears saying that kmxlnd.ko needs unknown symbols, all of the form mx_*.
This is the process we are following:
1) Files we are using:
kernel-headers-2.6.18-274.3.1.el5_lustre.g9500ebf.x86_64.rpm
kernel-2.6.18-274.3.1.el5_lustre.g9500ebf.x86_64.rpm
lustre-source-1.8.7-wc1_2.6.18_274.3.1.el5_lustre.g9500ebf.x86_64.rpm
kernel-debuginfo-common-2.6.18-274.3.1.el5_lustre.g9500ebf.x86_64.rpm
kernel-devel-2.6.18-274.3.1.el5_lustre.g9500ebf.x86_64.rpm
mx_1.2.12.tar.gz
2) Install the kernel and lustre source, and reboot
rpm -Uvh --nodeps kernel* lustre-source-*
reboot
3) Build the MX driver
./configure --enable-kernel-lib --enable-10g --enable-ether-mode
make rpm
rpm -Uvh mx-1.2.12-1.x86_64.rpm
4) Build Lustre
./configure --enable-quota --with-server --disable-lru-resize --enable-ext4 --disable-health-write --with-mx=/root/mx/mx-1.2.12
make rpms
cd /usr/src/redhat/RPMS/x86_64/
rpm -Uvh lustre-1.8.7* lustre-ldiskfs* lustre-modules*
All Lustre packages get installed, but these warning messages appear:
... <snip>
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_get_endpoint_addr
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_open_endpoint
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_finalize
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_iconnect
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_strerror
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_set_endpoint_addr_context
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_kirecv
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_get_endpoint_addr_context
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_wait_any
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_nic_id_to_board_number
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_close_endpoint
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx__init_api
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_register_unexp_handler
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_kisend
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_test_any
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_decompose_endpoint_addr
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_strstatus
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_cancel
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_set_request_timeout
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_wakeup
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_connect
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_decompose_endpoint_addr2
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_disconnect
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_get_endpoint_addr
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_open_endpoint
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_finalize
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_iconnect
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_strerror
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_set_endpoint_addr_context
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_kirecv
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_get_endpoint_addr_context
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_wait_any
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_nic_id_to_board_number
WARNING: /lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko needs unknown symbol mx_close_endpoint
... </snip>
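The warning list above repeats several symbols. The helper below is a minimal sketch, not part of the original report, that reduces depmod-style warnings to the unique set of symbols one module is missing. The function name `unknown_symbols` is hypothetical, and it assumes the warning format shown above.

```shell
# Print the unique symbol names reported as unknown for one module.
# Reads depmod-style warnings on stdin, in the format
#   WARNING: <path>/<module>.ko needs unknown symbol <name>
# $1: module file name to filter on, e.g. kmxlnd.ko
unknown_symbols() {
    awk -v mod="$1" '
        /needs unknown symbol/ {
            # field 2 is the module path; the symbol is the last field
            n = split($2, parts, "/")
            if (parts[n] == mod) print $NF
        }' | sort -u
}
```

Saving the rpm output to a file and running `unknown_symbols kmxlnd.ko < saved-output.txt` (file name assumed) would print each missing mx_* symbol once.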
5) Bring up the MX driver:
/opt/mx/sbin/mx_start_stop start
Loading mx driver
Creating mx devices
6) Bringing up kmxlnd
modprobe kmxlnd
FATAL: Error inserting kmxlnd (/lib/modules/2.6.18-274.3.1.el5_lustre.g9500ebf/updates/kernel/net/lustre/kmxlnd.ko): Unknown symbol in module, or unknown parameter (see dmesg)
The same kind of messages are also logged in dmesg:
...<snip>
kmxlnd: Unknown symbol mx_get_endpoint_addr
kmxlnd: Unknown symbol mx_open_endpoint
kmxlnd: Unknown symbol mx_finalize
kmxlnd: Unknown symbol mx_iconnect
kmxlnd: Unknown symbol mx_strerror
kmxlnd: Unknown symbol mx_set_endpoint_addr_context
kmxlnd: Unknown symbol mx_kirecv
...</snip>
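One way to narrow this down is to check whether the symbols named in dmesg are present in the running kernel's symbol table once the MX driver is loaded. The function below is a sketch under stated assumptions: the name `missing_from_table` is hypothetical, and the input is assumed to be in `/proc/kallsyms` format. A symbol absent from the table cannot be resolved. A symbol that is present may still not be exported, so this is a necessary check, not a sufficient one.

```shell
# Print each symbol name read from stdin that does not appear in a
# kallsyms-format symbol table.
# $1: table to search (defaults to /proc/kallsyms of the running kernel)
missing_from_table() {
    table="${1:-/proc/kallsyms}"
    while read -r sym; do
        [ -n "$sym" ] || continue
        # kallsyms lines look like: <address> <type> <name> [module]
        awk -v s="$sym" '$3 == s { found = 1 } END { exit !found }' "$table" \
            || echo "$sym"
    done
}

# Possible usage, with the dmesg line format shown above:
#   dmesg | sed -n 's/^kmxlnd: Unknown symbol //p' | sort -u | missing_from_table
```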
Could anyone at WC help us figure out what is wrong here and how we can make this configuration work?
A sanity checklist or an install guide, maybe?
Thank you.
Issue Links
- Trackbacks
-
Lustre 1.8.x known issues tracker While testing against Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla https://bugzilla.lustre.org/. In order to move away from relying on Bugzilla, we would create a JIRA
Hi Liang. Yes, we did, and it looks all right.
We managed to get things working and finished the build process. As explained before, it seems there are some file conflicts when upgrading some RPMs or when adding the lustre source RPM. We have not yet figured out exactly what is causing the problem, since we are running late on this deployment. However, we plan to keep investigating to understand why it happens.
Another question for now concerns the MX compatibility mode. The servers (OSS and MDS) have MX cards and could run MX natively, but the clients are on Gbit Ethernet. As far as I understand, we should run Lustre in tcp mode, since that is the only transport the clients can use. However, when we build the MX driver with Ethernet support (--enable-ether-mode --enable-10g-mode), the network seems to stop responding. We know this is not a Lustre issue, but we are wondering whether anyone has suggestions on how to build the driver.
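For reference, a tcp-mode LNet setup of the kind described might look like the fragment below. This is only a sketch: it assumes the Myri-10G card in Ethernet mode appears as the interface myri0 (the name used in the modprobe.conf line in this comment), that clients use eth0 (an assumed name), and that both sides can reach each other over IP.

```
# modprobe.conf on the servers (OSS/MDS): LNet over TCP on the 10G interface
options lnet networks=tcp0(myri0)

# modprobe.conf on the Gbit Ethernet clients (eth0 is an assumption)
options lnet networks=tcp0(eth0)
```

With this layout kmxlnd is not used at all; LNet uses its socket LND on top of whatever IP interface is named in parentheses.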
It's also interesting that even when we build the driver without Ethernet mode or 10g mode support, we are still able to bring up the interface and assign an IP address. This is the output from mx_info when the driver is built with Ethernet mode and 10g mode support:
[root@oss05 to-install]# /opt/mx/bin/mx_info
MX Version: 1.2.12
MX Build: root@oss08:/root/mx_ether/mx-1.2.12 Wed Mar 14 11:39:25 CDT 2012
2 Myrinet boards installed.
The MX driver is configured to support a maximum of:
8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0: 364.4 MHz LANai, PCI-E x8, 2 MB SRAM
Status: Running, P0: Wrong Network
Network: Ethernet 10G
MAC Address: 00:60:dd:45:1a:20
Product code: 10G-PCIE2-8B2L-2QP
Part number: 09-04247
Serial number: 427870
Mapper: 00:60:dd:45:1a:21, version = 0x00000000, configured
Mapped hosts: 1
ROUTE COUNT
INDEX MAC ADDRESS HOST NAME P0
----- ----------- --------- ---
0) 00:60:dd:45:1a:20 oss05:0 1,0
1) 00:60:dd:45:1a:21 oss05:1 D 0,0
===================================================================
Instance #1: 364.4 MHz LANai, PCI-E x8, 2 MB SRAM
Status: Running, P0: Wrong Network
Network: Ethernet 10G
MAC Address: 00:60:dd:45:1a:21
Product code: 10G-PCIE2-8B2L-2QP
Part number: 09-04247
Serial number: 427870
Mapper: 00:60:dd:45:1a:21, version = 0x00000000, configured
Mapped hosts: 1
ROUTE COUNT
INDEX MAC ADDRESS HOST NAME P0
----- ----------- --------- ---
0) 00:60:dd:45:1a:20 oss05:0 D 0,0
1) 00:60:dd:45:1a:21 oss05:1 1,0
[root@oss05 to-install]#
We can't see the indexes, MAC addresses, or host names of the other hosts, and the status line clearly says "Wrong Network".
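When checking several servers, it can help to reduce mx_info output to one status line per board. The function below is a minimal sketch; the name `mx_status` is hypothetical, and it assumes the "Instance #N:" and "Status:" line layout shown in the output above.

```shell
# Print "<instance><TAB><status>" for every board in mx_info output
# read from stdin.
mx_status() {
    awk '
        /^[ \t]*Instance #[0-9]+:/ {
            inst = $2                 # e.g. "#0:"
            sub(/^#/, "", inst)
            sub(/:$/, "", inst)
        }
        /^[ \t]*Status:/ {
            sub(/^[ \t]*Status:[ \t]*/, "")
            print inst "\t" $0
        }
    '
}

# Possible usage:
#   /opt/mx/bin/mx_info | mx_status | grep 'Wrong Network'
```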
This is what my modprobe.conf line looks like:
options lnet networks=mx0(myri0)
PS: We are still able to lctl ping.
Thanks
Carlos