[LU-2614] Add build support for HCA virtualisation Created: 14/Jan/13  Updated: 01/Mar/19  Resolved: 01/Mar/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Story Priority: Critical
Reporter: Frank Heckes (Inactive) Assignee: Frank Heckes (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None

Issue Links:
Related
Rank (Obsolete): 6117

 Description   

Change the lbuild script to support compiling the Mellanox (alpha) OFED stack,
which provides SR-IOV virtualisation for IB HCAs.

NOTE: This work is needed to support the testing of IB virtualisation on the Jülich cluster.



 Comments   
Comment by James A Simmons [ 14/Jan/13 ]

Out of curiosity, is this their OFED-1.9 stack?

Comment by Frank Heckes (Inactive) [ 14/Jan/13 ]

No, not as far as I know. We are working with the pre-release version supporting SR-IOV for
Mellanox HCAs, which is available for RHEL/CentOS 6.2 only, just to prepare the
build system and autotest framework to make use of the feature quickly for 'production'
as soon as support for 6.3 and other distros is released.

Comment by Andreas Dilger [ 15/Jan/13 ]

Frank, any reason why this is for 2.2 and not 2.4?

Comment by Frank Heckes (Inactive) [ 16/Jan/13 ]

At the moment the Mellanox SR-IOV enabled OFED only
supports kernel version 2.6.32-220.X. (I tried this for the current
master branch, but a naive build failed and I would have ended up coding in the ofa kernel.)
This kernel seems to be available in b2_2 only (please correct me if I'm wrong).
I expect an OFED that supports RHEL/CentOS 6.3 to be shared with us
next Monday.
Anyway, I decided to use this branch to create a prototype. It looks
very promising: all builds succeeded (i.e. Lustre was built against
the SR-IOV OFED) and we're going to run autotest against it today.
Once that works, we can easily swap in the new OFED to run builds of the
master branch.

Comment by Frank Heckes (Inactive) [ 24/Jan/13 ]

Changed lbuild to compile the SR-IOV OFED stack and compile Lustre against it.
Software builds completed successfully (see: http://build.whamcloud.com/job/lustre-reviews-vib/arch=x86_64,build_type=server,distro=el6,ib_stack=vofa/)
Constraints: RHEL/CentOS 6.2, kernel 2.6.32-220.* for server
Autotest ran successfully, too (although test_123a failed, it proves the concept; see: https://maloo.whamcloud.com/test_sets/c20ab082-6633-11e2-a42b-52540035b04c )

Waiting for the RHEL/CentOS 6.3 OFED stack so that the framework changes (jenkins, lbuild) can be used for the master branch and the change can finally land.

Comment by Frank Heckes (Inactive) [ 08/Apr/13 ]

Received Mellanox (SR-IOV enabled) OFED stack for CentOS 6.3.
Made changes to compile Lustre against the Mellanox OFED stack (see gerrit:5295 patch 17)
Build has been started in test queue 'Frank_Heckes'; waiting for test results.

Comment by Frank Heckes (Inactive) [ 09/Apr/13 ]

The build tested almost successfully, except for test_0c, test_0d and test_73a of the 'replay-single' suite. These errors are related to bug LU-414.
Installation and all other tests ran successfully, which proves the concept.
See session:

https://maloo.whamcloud.com/test_sessions/0c32a460-a0dd-11e2-b429-52540035b04c

The build is currently available on the Whamcloud internal Jenkins node 'jenkins-staging'. I'll make it available more widely asap.

Comment by Jodi Levi (Inactive) [ 02/May/13 ]

http://review.whamcloud.com/#change,5295

Comment by Frank Heckes (Inactive) [ 10/Sep/14 ]

Pushed a change that modifies the RPM specfile to build the Mellanox kernel modules and kernel-devel RPM so that they fit into the Jenkins build environment:

http://review.whamcloud.com/#/c/11593/2

The change has been successfully tested on a node identical to the onyx Jenkins builder, on the Jülich autotest cluster Juliette.
The RPMs were installed, and a Lustre FS was created and tested successfully on a single SR-IOV enabled node of Juliette.

However, the change is too inflexible. I will supply a new change that only modifies the o2ib path flag (--with-o2ib=...) for Lustre configure.

Comment by Frank Heckes (Inactive) [ 02/Oct/14 ]
  • A new patch has been pushed that relies only on modifying the variable CONFIGURE_FLAGS to provide the correct
    path of the Mellanox OFED compat header files to Lustre configure (via --with-o2ib=...):
    http://review.whamcloud.com/#/c/11593/3
    The build ended successfully, but the code needs a correction and, additionally, an extension to download the Mellanox OFED tar archive
    automatically.
  • The change also needs to be tested in the Jenkins job 'lustre-reviews-vib', since the configuration of 'lustre-reviews'
    won't cover the execution of MLNX builds (the vib or mlnx label and the setting in the script file are missing in lustre-reviews).
    The queue has to be enabled as soon as the patch is reviewed +1 in lustre-reviews.
    So far, the code for the Mellanox build has been verified only on the community cluster Juliette (FZ Jülich).
  • Checked the execution of a normal Lustre 'inkernel' build in a VM that runs on top of a hypervisor with Mellanox OFED installed
    and SR-IOV enabled. It passes all basic tests (creation of FS, client file operations (create, remove, ...), small performance test).
    This would increase the test coverage for 'inkernel' and, if available, openfabric builds, too. However, some technical aspects (do
    all userland libs and programs execute well without runtime surprises, especially for all MPI applications?) have to be clarified
    and discussed.
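As a rough illustration of the CONFIGURE_FLAGS approach described in the first item above, here is a minimal bash sketch. The function name add_o2ib_flag and the directory passed to it are hypothetical and are not taken from the actual patch; only the CONFIGURE_FLAGS variable and the --with-o2ib option come from this ticket.

```shell
#!/bin/bash
# Hypothetical helper (not from the real lbuild patch): append the path of
# the OFED compat headers to CONFIGURE_FLAGS via --with-o2ib.
# Usage: add_o2ib_flag <ofed_header_dir>
add_o2ib_flag() {
    local o2ib_path="$1"

    # Fail early rather than handing configure a directory that does not exist.
    if [ ! -d "$o2ib_path" ]; then
        echo "OFED header directory not found: $o2ib_path" >&2
        return 1
    fi

    # Keep any flags already set; add a separating space only when needed.
    CONFIGURE_FLAGS="${CONFIGURE_FLAGS:+$CONFIGURE_FLAGS }--with-o2ib=$o2ib_path"
}
```

The idea is to call add_o2ib_flag with the location of the unpacked Mellanox OFED headers before configure is invoked. Touching only CONFIGURE_FLAGS leaves the RPM specfile alone, which is the flexibility the earlier specfile-based change lacked.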
Comment by Minh Diep [ 01/Mar/19 ]

Closing because we have had IB support in our lbuild for a while now.

Generated at Sat Feb 10 01:26:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.