[LU-17467] Incorrect checks for Nvidia libraries in configure Created: 24/Jan/24  Updated: 26/Jan/24

Status: In Progress
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Jean-Baptiste Skutnik Assignee: Jean-Baptiste Skutnik
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Linux arcV 6.7.0-arch3-1 #1 SMP PREEMPT_DYNAMIC Sat, 13 Jan 2024 14:37:14 +0000 x86_64 GNU/Linux (Host)
Container image rockylinux:8
Nvidia GPU on machine - cuda not installed in container


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Hello all, I was compiling Lustre in a container and came across what I think is an error in the configure script.

I have an Nvidia GPU installed on the machine. I am trying to replicate a minimal build environment in a container, so I am not installing CUDA in it. When configuring, I set `--disable-server --disable-tests`. Configure failed with a CUDA error, so I tried to disable CUDA support.

I tried both `--without-cuda --without-gds` and `--with-cuda=no --with-gds=no`, since the help page says they are equivalent.

Turns out they are, because both gave me the same error:
```
checking for /no/nv-p2p.h... no
configure: error: CUDA sources don't found. nv-p2p.h don't exit
```

Looking at the code, it seems that disabling them sets their value to `no`, while the code only checks whether the value is non-empty. Since `no` is non-empty, the check passes and configure proceeds as if both paths were defined. From `lnet/autoconf/lustre-lnet.m4`:

```m4
AS_IF([test -n "${CUDA_PATH}" && test -n "${GDS_PATH}"],[
    LB_CHECK_FILE([$CUDA_PATH/nv-p2p.h],
        [
        AC_MSG_RESULT([CUDA path is $CUDA_PATH])
        AC_SUBST(CUDA_PATH)
        ],
        [AC_MSG_ERROR([CUDA sources not found. nv-p2p.h doesn't exist])]
    )
```

Note the `test -n`: it only verifies that the value is non-empty, so the string `no` passes. I thought that was worth reporting. If there are no objections, I would love to patch this; let me know how to proceed!
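To illustrate the behaviour described above, here is a minimal shell sketch. It is not the code from the Lustre tree and not the submitted patch: `path_is_enabled` is a hypothetical helper introduced only for this example, and the assumption is that `CUDA_PATH` holds the literal string `no` after `--without-cuda` or `--with-cuda=no`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): treats both an empty value
# and the literal string "no" as "feature disabled".
path_is_enabled() {
    test -n "$1" && test "x$1" != "xno"
}

# Assumed value after --without-cuda / --with-cuda=no
CUDA_PATH=no

# The existing style of check only asks "is the value non-empty?",
# so the string "no" is accepted.
if test -n "${CUDA_PATH}"; then
    echo "test -n: treated as enabled"
fi

# The stricter check rejects "no" as well as the empty string.
if path_is_enabled "${CUDA_PATH}"; then
    echo "path_is_enabled: treated as enabled"
else
    echo "path_is_enabled: treated as disabled"
fi
```

A guard along these lines in the `AS_IF` condition would presumably skip the `nv-p2p.h` lookup when either option is disabled, though the exact form of the fix is up to the maintainers.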



 Comments   
Comment by Jean-Baptiste Skutnik [ 24/Jan/24 ]

I meant to add that this is on a fresh pull of the git repository:

lustre-release> git describe
v2_15_60-25-gd30e1dc858

Comment by Gerrit Updater [ 26/Jan/24 ]

"Jean-Baptiste Skutnik <jb.skutnik@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53832
Subject: LU-17467 build: Expand CUDA source detection logic
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b4046d705c79ae9dc89865d9a4404242cc6abc19
