[LU-5498] Lustre client build failed with Mellanox OFED Created: 15/Aug/14  Updated: 20/Aug/14  Resolved: 20/Aug/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Jay Lan (Inactive) Assignee: Bob Glossman (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

sles11sp3 3.0.101-0.31.1
Mellanox OFED.2.2.1.0.0.1.gdf6fefb


Severity: 3
Rank (Obsolete): 15339

 Description   

I tried to compile the lustre-2.4.3 client against Mellanox OFED.2.2.1.0.0.1.gdf6fefb. The compilation failed at
/usr/src/ofa_kernel/default/include/linux/pm_qos_params.h:27: error: ‘LINUX_BACKPORT’ declared as function returning a function

I saw LU-5224 and cherry-picked the patch, but it still failed:

...
make[5]: Entering directory `/usr/src/linux-3.0.101-0.31.1.20140612nasa-obj/x86_64/nasa'
In file included from /usr/src/linux-3.0.101-0.31.1.20140612nasa/include/linux/netdevice.h:35,
from /usr/src/linux-3.0.101-0.31.1.20140612nasa/include/net/sock.h:51,
from /usr/src/packages/BUILD/lustre-2.4.3/libcfs/include/libcfs/linux/linux-tcpip.h:53,
from /usr/src/packages/BUILD/lustre-2.4.3/libcfs/include/libcfs/linux/libcfs.h:57,
from /usr/src/packages/BUILD/lustre-2.4.3/libcfs/include/libcfs/libcfs.h:48,
from /usr/src/packages/BUILD/lustre-2.4.3/libcfs/libcfs/linux/linux-tracefile.c:41:
/usr/src/ofa_kernel/nasa/include/linux/pm_qos_params.h:27: error: 'LINUX_BACKPORT' declared as function returning a function
cc1: warnings being treated as errors
/usr/src/ofa_kernel/nasa/include/linux/pm_qos_params.h:27: error: parameter names (without types) in function declaration
/usr/src/ofa_kernel/nasa/include/linux/pm_qos_params.h:29: error: 'LINUX_BACKPORT' declared as function returning a function
/usr/src/ofa_kernel/nasa/include/linux/pm_qos_params.h:29: error: parameter names (without types) in function declaration
/usr/src/ofa_kernel/nasa/include/linux/pm_qos_params.h:31: error: 'LINUX_BACKPORT' declared as function returning a function
/usr/src/ofa_kernel/nasa/include/linux/pm_qos_params.h:31: error: parameter names (without types) in function declaration
/usr/src/ofa_kernel/nasa/include/linux/pm_qos_params.h:34: error: 'LINUX_BACKPORT' declared as function returning a function
/usr/src/ofa_kernel/nasa/include/linux/pm_qos_params.h:34: error: parameter names (without types) in function declaration
/usr/src/ofa_kernel/nasa/include/linux/pm_qos_params.h:37: error: 'LINUX_BACKPORT' declared as function returning a function
/usr/src/ofa_kernel/nasa/include/linux/pm_qos_params.h:37: error: parameter names (without types) in function declaration
/usr/src/ofa_kernel/nasa/include/linux/pm_qos_params.h:39: error: 'LINUX_BACKPORT' declared as function returning a function
/usr/src/ofa_kernel/nasa/include/linux/pm_qos_params.h:39: error: parameter names (without types) in function declaration
In file included from /usr/src/linux-3.0.101-0.31.1.20140612nasa/include/net/sock.h:51,
from /usr/src/packages/BUILD/lustre-2.4.3/libcfs/include/libcfs/linux/linux-tcpip.h:53,
from /usr/src/packages/BUILD/lustre-2.4.3/libcfs/include/libcfs/linux/libcfs.h:57,
from /usr/src/packages/BUILD/lustre-2.4.3/libcfs/include/libcfs/libcfs.h:48,
from /usr/src/packages/BUILD/lustre-2.4.3/libcfs/libcfs/linux/linux-tracefile.c:41:
/usr/src/linux-3.0.101-0.31.1.20140612nasa/include/linux/netdevice.h:1064: error: field 'pm_qos_req' has incomplete type
make[10]: *** [/usr/src/packages/BUILD/lustre-2.4.3/libcfs/libcfs/linux/linux-tracefile.o] Error 1

The LU-5224 patch tried to address the compilation problem in o2iblnd.h. However, every C file that includes <net/sock.h> ends up including <linux/pm_qos_params.h> (via <linux/netdevice.h>, as the include chain above shows) and hits the same problem. Files in the Lustre code other than o2iblnd.c do that as well.
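To get a rough idea of how widespread the include is, a small shell helper along the following lines could list the files that pull in <net/sock.h> directly. This is only a sketch: the function name `find_sock_includers` is hypothetical, it assumes GNU grep, and it undercounts, because files that reach <net/sock.h> through another header (as linux-tracefile.c does through libcfs.h) are not reported.

```sh
# Hypothetical helper: list .c and .h files under a source tree that
# include <net/sock.h> directly. Indirect includers are not detected.
find_sock_includers() {
    src_dir="$1"
    grep -rlE --include='*.c' --include='*.h' \
        '^[[:space:]]*#[[:space:]]*include[[:space:]]*<net/sock\.h>' \
        "$src_dir" | sort
}
```

For the build tree in the log above, it would be called as `find_sock_includers /usr/src/packages/BUILD/lustre-2.4.3`.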

I think the correct fix is to add the define
-DCONFIG_COMPAT_PM_QOS
to
EXTRA_LNET_INCLUDE="$EXTRA_LNET_INCLUDE -DCONFIG_COMPAT_SLES_11_$SP -DCONFIG_COMPAT_PM_QOS"
in config/lustre-build-linux.m4.

I am not sure whether we need to qualify this by SLES 11 service pack or kernel version so that the fix does not break unaffected kernel versions. The LU-5224 patch was not a correct fix.
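One possible way to scope the flag, instead of keying on a service pack or kernel version, is to add it only when the OFED tree actually ships its own pm_qos_params.h. The helper below is a minimal sketch of that idea; the function name `pm_qos_compat_flags` and the choice of header presence as the condition are assumptions for illustration, not part of the LU-5224 patch or of config/lustre-build-linux.m4.

```sh
# Hypothetical helper: print -DCONFIG_COMPAT_PM_QOS only when the OFED
# source tree given as $1 contains include/linux/pm_qos_params.h.
# Prints nothing (and still returns 0) when the header is absent.
pm_qos_compat_flags() {
    ofa_dir="$1"
    if [ -f "$ofa_dir/include/linux/pm_qos_params.h" ]; then
        printf '%s\n' "-DCONFIG_COMPAT_PM_QOS"
    fi
}
```

With the OFED path from the first error message, the flag would then be appended as `EXTRA_LNET_INCLUDE="$EXTRA_LNET_INCLUDE $(pm_qos_compat_flags /usr/src/ofa_kernel/default)"`, leaving builds without that header unchanged.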



 Comments   
Comment by Peter Jones [ 15/Aug/14 ]

Bob

Could you please advise?

Thanks

Peter

Comment by Bob Glossman (Inactive) [ 15/Aug/14 ]

Jay, I think you are quite right. It appears the previous fix from LU-5224 doesn't do the job. Your suggestion about adding -DCONFIG_COMPAT_PM_QOS looks like the right approach to me. More extensive changes are needed to get that back into b2_4 for sles11sp3, though: the autoconf file in b2_4 doesn't have any -DCONFIG_COMPAT flags in it at all. Putting the change in master will be easier to start with.

Comment by Jay Lan (Inactive) [ 16/Aug/14 ]

Hi Bob,

Don't worry about a b2_4 patch. I have a simple local patch that works in my situation without having to deal with the autoconf changes.

Comment by Bob Glossman (Inactive) [ 18/Aug/14 ]

Jay, I haven't been able to reproduce the build failure you describe in b2_5 or current master; I only see it in b2_4. I am using the latest sles11sp3 and MLNX_OFED_LINUX-2.2-1.0.1-sles11sp3-x86_64.tar. Since you say you are OK with us not pushing the fix back into b2_4, and it's not really needed for anything later, I would like to close this as Won't Fix. Are you OK with that?

I can pursue getting it into b2_4 but it seems more trouble than it's worth.

Comment by Jay Lan (Inactive) [ 20/Aug/14 ]

Bob,

We have not decided whether to use Mellanox OFED, so it is not important to us whether you have a proper fix for this problem. However, if Intel plans to support Mellanox OFED, you need to put in a proper fix.

We plan to upgrade to 2.5.x from 2.4.3 in a few months. If we were to use Mellanox OFED with 2.5.x, we may hit this problem again. You understand the nature of this problem and I am sure you do not need a reproducer to come up with a correct fix.

It is a low priority for us. I can always carry my own simplified patch (i.e., one without changes to autoconf).

Comment by Bob Glossman (Inactive) [ 20/Aug/14 ]

Closing this issue since the problem only happens with old Lustre versions. We don't plan to go out of our way to support newer MLNX versions on older Lustre release branches.
