[LU-14795] NVidia GDS support in lustre Created: 28/Jun/21  Updated: 09/Jan/22  Resolved: 11/Aug/21

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major
Reporter: Alexey Lyashkov Assignee: Alexey Lyashkov
Resolution: Won't Do Votes: 0
Labels: None
Environment:

GDS 1.0 + MOFED 5.3


Issue Links:
Related
is related to LU-14798 NVIDIA GPUDirect Storage Support Resolved
is related to LU-14212 DNE3: directory migration progress mo... Open
Rank (Obsolete): 9223372036854775807

 Description   

NVIDIA has released the final version of its GPUDirect Storage (GDS) project.
Let's add support for it in the Lustre code.



 Comments   
Comment by Gerrit Updater [ 28/Jun/21 ]

Alexey Lyashkov (alexey.lyashkov@hpe.com) uploaded a new patch: https://review.whamcloud.com/44093
Subject: LU-14795 lnet: use lnet_send_data as possible.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ec044b859e20a19e59b15074f5ae98d1da3d6bbd

Comment by Gerrit Updater [ 28/Jun/21 ]

Alexey Lyashkov (alexey.lyashkov@hpe.com) uploaded a new patch: https://review.whamcloud.com/44094
Subject: LU-14795 lnet: add GDS configure
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d108819dbbfb58a980f2323906e38f510401c1d1

Comment by Gerrit Updater [ 28/Jun/21 ]

Alexey Lyashkov (alexey.lyashkov@hpe.com) uploaded a new patch: https://review.whamcloud.com/44095
Subject: LU-14795 lnet: export additional info
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 002f3c65b57aacf57bc23ca1a8fc4553ad0cdbe3

Comment by Gerrit Updater [ 28/Jun/21 ]

Alexey Lyashkov (alexey.lyashkov@hpe.com) uploaded a new patch: https://review.whamcloud.com/44096
Subject: LU-14795 llite: detect an GPU transfer
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 632892ba1c00f44c48a9125bd364e81cbff7dd5a

Comment by Gerrit Updater [ 28/Jun/21 ]

Alexey Lyashkov (alexey.lyashkov@hpe.com) uploaded a new patch: https://review.whamcloud.com/44097
Subject: LU-14795 ptlrpc: GDS bulk transfer support
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 647ce2dfab23de67b4b20b3187911db3419ff4ff

Comment by Gerrit Updater [ 28/Jun/21 ]

Alexey Lyashkov (alexey.lyashkov@hpe.com) uploaded a new patch: https://review.whamcloud.com/44098
Subject: LU-14795 lnet: LNet GDS support
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0013ef246d0c8b5e506cdf31a5a9879f3962fe6e

Comment by Gerrit Updater [ 28/Jun/21 ]

Alexey Lyashkov (alexey.lyashkov@hpe.com) uploaded a new patch: https://review.whamcloud.com/44099
Subject: LU-14795 lnet: add GPU <> IB device affinity handing.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 31e16fcdf48de3e59bd0156f9e7918d854380082

Comment by Cory Spitz [ 02/Jul/21 ]

This ticket "pre-dates" LU-14798.

Comment by Shuichi Ihara [ 07/Aug/21 ]

Checked out the head of LU-14795 https://review.whamcloud.com/#/c/44099, but the build failed as shown below.

# sh ./autogen.sh ; ./configure --with-o2ib=/usr/src/ofa_kernel/default --disable-server; make debs
- snip - 
checking if Linux kernel was built with CONFIG_MODULES... yes
checking if Linux kernel was built with CONFIG_MODVERSIONS... yes
checking if Linux kernel was built with CONFIG_KALLSYMS... yes
checking if Linux kernel module loading is possible... yes
checking for /usr/src/nvidia-470.57.02/nvidia//nv-p2p.h... yes
NVIDIA path is /usr/src/nvidia-470.57.02/nvidia/
checking for /usr/src/nvidia-fs-2.7.50/nvfs-dma.h... yes
checking for /usr/src/nvidia-fs-2.7.50/config-host.h... no
configure: error: GDS sources don't ready for build. run configure please
make: *** No rule to make target 'debs'.  Stop.

Testing GDS 1.0.1.3, which is part of the latest CUDA 11.4.1:

# /usr/local/cuda-11.4/gds/tools/gdscheck -v
 GDS release version: 1.0.1.3
 nvidia_fs version:  2.7 libcufile version: 2.4

Comment by Alexey Lyashkov [ 07/Aug/21 ]

Run the 'configure' script in the GDS sources; the files it generates (such as config-host.h) are removed after the DKMS build.
Alternatively, run 'make nv_configure' in /usr/src/nvidia-fs-2.7.50/.
(I use 'make all', but running configure is enough.)
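A minimal sketch of the workaround described in this comment: a pre-flight check that confirms the nvidia-fs source tree has been configured before re-running Lustre's configure. The function name check_gds_sources is hypothetical and not part of Lustre or GDS. The file names and the default path are taken from the build log above, and the hint assumes 'make nv_configure' exists in that tree, as the comment states.

```shell
# Hypothetical helper (not part of Lustre or GDS): check that the nvidia-fs
# source tree contains the headers Lustre's configure looks for.
# Default path is the one from the build log; pass another path as $1.
check_gds_sources() {
    nvfs_dir=${1:-/usr/src/nvidia-fs-2.7.50}
    missing=0
    for f in nvfs-dma.h config-host.h; do
        if [ ! -f "$nvfs_dir/$f" ]; then
            echo "missing: $nvfs_dir/$f"
            missing=1
        fi
    done
    if [ "$missing" -ne 0 ]; then
        # config-host.h is generated by the GDS configure step, so a missing
        # file usually means that step has not been run after the DKMS build.
        echo "hint: run 'make nv_configure' (or the GDS configure script) in $nvfs_dir"
        return 1
    fi
    echo "GDS sources look configured"
    return 0
}
```

Usage, assuming the same paths as in the failed build: check_gds_sources /usr/src/nvidia-fs-2.7.50 && ./configure --with-o2ib=/usr/src/ofa_kernel/default --disable-server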

Comment by Alexey Lyashkov [ 11/Aug/21 ]

The WC approach has already landed.

Generated at Sat Feb 10 03:12:51 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.