[LU-5348] OFED Working Created: 15/Jul/14  Updated: 15/Jul/14  Resolved: 15/Jul/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question/Request Priority: Major
Reporter: Atul Yadav Assignee: Dmitry Eremin (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS release 6.5 (Final)
lustre-2.5.2.tar.gz+ OFED-1.5.4.1.tgz+ OFED-3.5-2-MIC-beta1.tgz
lustre-2.5.2.tar.gz+ MLNX_OFED_LINUX-2.2-1.0.1-rhel6.5-x86_64.tgz


Epic/Theme: Lustre-2.5.2, Mellanox, OFED
Story Points: 11
Rank (Obsolete): 14914

 Description   

Dear Team,

In our HPC environment, we are using two sets of compute servers.
1: CPU Group
2: Phi Group

For both groups we are using lustre-2.5.2.tar.gz, but with different OFED stacks.
For the CPU group:
lustre-2.5.2.tar.gz+ MLNX_OFED_LINUX-2.2-1.0.1-rhel6.5-x86_64.tgz
[root@test1 ~]# cat /etc/redhat-release
CentOS release 6.5 (Final)
[root@test1 ~]# uname -r
2.6.32-431.el6.x86_64
[root@test1 ~]# rpm -qa | grep kernel
kmod-mlnx-ofa_kernel-2.2-OFED.2.2.1.0.0.1.gdf6fefb.rhel6u5.x86_64
libreport-plugin-kerneloops-2.0.9-19.el6.centos.x86_64
dracut-kernel-004-335.el6.noarch
kernel-firmware-2.6.32-431.el6.noarch
kernel-devel-2.6.32-431.el6.x86_64
kernel-headers-2.6.32-431.el6.x86_64
mlnx-ofa_kernel-2.2-OFED.2.2.1.0.0.1.gdf6fefb.rhel6u5.x86_64
mlnx-ofa_kernel-devel-2.2-OFED.2.2.1.0.0.1.gdf6fefb.rhel6u5.x86_64
abrt-addon-kerneloops-2.0.8-21.el6.centos.x86_64
kernel-2.6.32-431.el6.x86_64
kmod-kernel-mft-mlnx-3.6.0-24.rhel6u5.x86_64
[root@test1 ~]# rpm -qa | grep lustre
lustre-client-modules-2.5.2-2.6.32_431.el6.x86_64.x86_64
lustre-client-2.5.2-2.6.32_431.el6.x86_64.x86_64
[root@test1 ~]# ibv_devices
device node GUID
------ ----------------
mlx4_0 f452140300832610
[root@test1 opt]# modprobe -v lnet
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/net/lustre/libcfs.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/net/lustre/lnet.ko networks=o2ib(ib0)
[root@test1 opt]# modprobe -v lustre
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/lvfs.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/obdclass.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/ptlrpc.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/fid.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/mdc.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/osc.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/lov.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/lustre.ko

For the Phi node group:
lustre-2.5.2.tar.gz+ OFED-1.5.4.1.tgz+ OFED-3.5-2-MIC-beta1.tgz
[root@phi2 ~]# cat /etc/redhat-release
CentOS release 6.5 (Final)
[root@phi2 ~]# uname -r
2.6.32-431.el6.x86_64
[root@phi2 ~]# rpm -qa | grep kernel
libreport-plugin-kerneloops-2.0.9-19.el6.centos.x86_64
dracut-kernel-004-335.el6.noarch
kernel-firmware-2.6.32-431.el6.noarch
kernel-devel-2.6.32-431.el6.x86_64
kernel-headers-2.6.32-431.el6.x86_64
abrt-addon-kerneloops-2.0.8-21.el6.centos.x86_64
kernel-2.6.32-431.el6.x86_64
glibc2.12.2pkg-mpss-rasmm-kernel-3.2.3-1.glibc2.12.2.x86_64
[root@phi2 ~]# rpm -qa | grep lustre
lustre-client-2.5.2-2.6.32_431.el6.x86_64.x86_64
lustre-client-modules-2.5.2-2.6.32_431.el6.x86_64.x86_64
[root@phi2 ~]# ibv_devices
device node GUID
------ ----------------
scif0 4c79bafffe540097
mlx4_0 f45214030082d930
[root@phi2 ~]# modprobe -v lnet
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/net/lustre/libcfs.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/net/lustre/lnet.ko networks=o2ib(ib0)
[root@phi2 ~]# modprobe -v lustre
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/lvfs.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/obdclass.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/ptlrpc.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/fid.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/mdc.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/osc.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/lov.ko
insmod /lib/modules/2.6.32-431.el6.x86_64/extra/kernel/fs/lustre/lustre.ko

First query:
Can we use two different OFED stacks in a single Lustre file system?

Second query:
We are using a plain kernel in our setup. Is that advisable?

Please validate our setup and reply to us.

Thank You
Atul Yadav



 Comments   
Comment by Dmitry Eremin (Inactive) [ 15/Jul/14 ]

The Xeon Phi OFED stack depends on, and communicates closely with, the OFED stack on the Xeon host, so you cannot use both stacks simultaneously. You can, however, switch between them at reboot or by unloading and reloading the modules.

Comment by Atul Yadav [ 15/Jul/14 ]

Dear Sir,

In the current setup, we are using three types of OFED stack:

1. I/O nodes (MDS + OSS): CentOS 6.5 native IB support
2. Compute nodes: Mellanox IB driver
3. Phi compute nodes: OFED driver

Is this setup OK?
Can we use the OS-native IB stack on the I/O nodes and the Mellanox stack on the compute nodes?
Is it required to maintain the same OFED driver across the whole environment?

Which basic IB modules does Lustre require in order to work without problems?

Mellanox modules loaded in the OS:
rdma_ucm
rdma_cm
ib_addr
ib_ipoib
mlx4_core
mlx4_ib
mlx4_en
mlx5_core
mlx5_ib
ib_uverbs
ib_umad
ib_ucm
ib_sa
ib_cm
ib_mad
ib_core

OFED 1.5.4.1 + OFED-3.5-2-MIC-beta1 modules loaded in the OS:
rdma_ucm
rdma_cm
ib_addr
ib_ipoib
mlx4_core
mlx4_ib
mlx4_en
mlx5_core
mlx5_ib
ib_mthca
ib_uverbs
ib_umad
ib_sa
ib_cm
ib_mad
ib_core
iw_cxgb3
iw_cxgb4
iw_nes
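
To make the difference between the two module lists above easier to see, here is a minimal Python sketch that compares them. The module names are copied from the two lists in this comment; the function name `compare_module_lists` and the variable names are made up for illustration. It only shows which names appear in each list and says nothing about which modules Lustre itself needs.

```python
# Module names copied from the two lists in this comment.
MLNX_OFED_MODULES = """rdma_ucm rdma_cm ib_addr ib_ipoib mlx4_core mlx4_ib mlx4_en
mlx5_core mlx5_ib ib_uverbs ib_umad ib_ucm ib_sa ib_cm ib_mad ib_core""".split()

OFED_MIC_MODULES = """rdma_ucm rdma_cm ib_addr ib_ipoib mlx4_core mlx4_ib mlx4_en
mlx5_core mlx5_ib ib_mthca ib_uverbs ib_umad ib_sa ib_cm ib_mad ib_core
iw_cxgb3 iw_cxgb4 iw_nes""".split()


def compare_module_lists(first, second):
    """Return (common, only_in_first, only_in_second) as sorted lists."""
    a, b = set(first), set(second)
    return sorted(a & b), sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    common, only_mlnx, only_ofed = compare_module_lists(
        MLNX_OFED_MODULES, OFED_MIC_MODULES
    )
    print("common modules:", len(common))
    print("only in the Mellanox list:", only_mlnx)
    print("only in the OFED + MIC list:", only_ofed)
```

With the lists as posted, the ib_core, ib_mad, ib_cm, ib_sa, rdma_cm, and mlx4/mlx5 modules are common to both nodes. The Mellanox list additionally has ib_ucm, and the OFED + MIC list additionally has ib_mthca, iw_cxgb3, iw_cxgb4, and iw_nes.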

Please reply back

Thank You
Atul Yadav

Comment by Dmitry Eremin (Inactive) [ 15/Jul/14 ]

I am not sure I understood the question correctly, so let me describe the behavior. Lustre uses the kernel verbs API to communicate over IB. This capability is provided by the loadable kernel modules from the kernel-ib package. The particular version of those drivers is not critical, as long as they work. What matters is that the version of the drivers installed on a node corresponds to the version of the drivers that Lustre was built with. This combination can differ from node to node. The user-space libraries from the OFED stack are not used by Lustre at all.
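
A related sanity check can be made from the output already posted in this ticket. The sketch below compares the kernel string embedded in the `lustre-client-modules` package name with the `uname -r` output. It assumes the naming pattern seen above (`lustre-client-modules-<Lustre version>-<kernel release with "-" written as "_">.<arch>`), and the function name `lustre_rpm_matches_kernel` is made up for illustration. It checks only the kernel the Lustre modules were packaged for. It does not verify which IB driver version Lustre was built against.

```python
import re


def lustre_rpm_matches_kernel(package_name, running_kernel):
    """Return True if the package name embeds the running kernel release.

    Assumes the naming seen in this ticket: the kernel release appears in
    the package name with '-' replaced by '_', followed by the architecture.
    """
    match = re.fullmatch(
        r"lustre-client-modules-[\d.]+-(.+)\.[^.]+", package_name
    )
    if match is None:
        raise ValueError("unrecognized package name: " + package_name)
    built_for = match.group(1)
    # Convert the uname string to the package form rather than the reverse,
    # because the architecture (x86_64) itself contains an underscore.
    return built_for == running_kernel.replace("-", "_")


if __name__ == "__main__":
    # Values copied from the rpm -qa and uname -r output in the description.
    package = "lustre-client-modules-2.5.2-2.6.32_431.el6.x86_64.x86_64"
    kernel = "2.6.32-431.el6.x86_64"
    print(lustre_rpm_matches_kernel(package, kernel))
```

For the values posted in the description, the package and the running kernel agree on both the CPU node and the Phi node.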

Comment by Atul Yadav [ 15/Jul/14 ]

Thanks for the response.

Your reply answered my question.

Comment by Jodi Levi (Inactive) [ 15/Jul/14 ]

Question answered.

Generated at Sat Feb 10 01:50:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.