[LU-2662] Build: pthread build issue Created: 21/Jan/13  Updated: 15/Oct/13  Resolved: 24/Jul/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0, Lustre 2.5.0

Type: Bug Priority: Major
Reporter: Keith Mannthey (Inactive) Assignee: Minh Diep
Resolution: Fixed Votes: 0
Labels: None
Environment:

a patch pushed via git.


Severity: 3
Rank (Obsolete): 6212

 Description   

I have recreated the issue here:
http://review.whamcloud.com/5142

Basically it started when strlcpy was being used in lustre/utils/liblustreapi.c.

For some fun reason we get

...
gcc -g -Wall -fPIC -D_GNU_SOURCE -g -O2 -Werror   -o ll_dirstripe_verify ll_dirstripe_verify.o -L../../lustre/utils -llustreapi ../../libcfs/libcfs/libcfs.a 
../../libcfs/libcfs/libcfs.a(libcfs_a-user-prim.o): In function `cfs_create_thread':
/home/build/lustre-release/libcfs/libcfs/user-prim.c:241: undefined reference to `pthread_create'
...

If you drop the small change out the system builds fine.



 Comments   
Comment by Andreas Dilger [ 23/Jan/13 ]

Keith, what is the original source of this problem? I don't see strlcpy() being used in liblustreapi.c anywhere.

I can see the build failures in your patch, but we already use strlcpy() in a couple of places in the code without problems. The compile error in http://review.whamcloud.com/5142 is different than what is posted above:

Making all in mpi
../../../lustre/utils/liblustreapi.so: undefined reference to `strlcpy'
collect2: ld returned 1 exit status

yet we do use strlcpy() successfully in cfs_get_environ() and class_set_global() for the jobid code.

Comment by Keith Mannthey (Inactive) [ 23/Jan/13 ]

Sorry, the build test was just an example of a change Bull wants. I had linked the build to this LU but I didn't check whether the failures were the same.

I get the pthread issue in my local RHEL6.3 build environment without this strlcpy one. Bull sees this same issue as well, they are the ones who found it. Some complicated string handling code was submitted in a patch where strlcpy could have been used and it brought this issue out.

The test patch is just an example of the usage Bull would like to do, it is not real code and it seems to break the build (but not on RHEL5).

I wonder why my local build environment is different from the auto build server.

Comment by Keith Mannthey (Inactive) [ 24/Jan/13 ]

From my local build:

cc -g -Wall -fPIC -D_GNU_SOURCE -g -O2 -Werror   -o ll_dirstripe_verify ll_dirstripe_verify.o -L../../lustre/utils -llustreapi ../../libcfs/libcfs/libcfs.a 
../../libcfs/libcfs/libcfs.a(libcfs_a-user-prim.o): In function `cfs_create_thread':
/home/build/lustre-release/libcfs/libcfs/user-prim.c:241: undefined reference to `pthread_create'
collect2: ld returned 1 exit status

We get the error when trying to build libcfs.a

The autobuild may have failed because there is no generally usable strlcpy:

checking for strlcpy... no

I don't have one either but it seems the environment is different.

The non-glibc version of strlcpy is in user-prim.c. I am not quite sure how RHEL5 worked without including libcfs/user-prim.h, but I added it to the test build to see if the pthread issue will show itself in the build system.
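
For illustration only, here is a minimal sketch of what a fallback with strlcpy() semantics could look like on a system whose C library does not provide one. It is not the code from user-prim.c; the name sketch_strlcpy is hypothetical and was chosen so it cannot clash with a libc that does ship strlcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical fallback with strlcpy() semantics: copy at most size - 1
 * bytes, always NUL-terminate when size > 0, and return strlen(src) so
 * the caller can detect truncation (return value >= size).
 */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		/* Leave room for the terminating NUL byte. */
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

A caller would typically compare the return value against the buffer size to notice truncation, which is why this kind of helper is attractive as a replacement for hand-written string handling.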

For reference, adding the include of libcfs/user-prim.h did not change the pthread issue on my local build.

I diffed a good build and a bad build and I really don't see anything that stands out. Same flags and same files are being used.

I am going to look into this more tomorrow.

Comment by Keith Mannthey (Inactive) [ 25/Jan/13 ]

The real build that fails: http://build.whamcloud.com/job/lustre-reviews/12561/

Adding the libcfs/user-prim.h include moved our build system along, and it now fails here:

...
Making all in mpi
../../../lustre/utils/liblustreapi.so: undefined reference to `strlcpy'
collect2: ld returned 1 exit status
make[4]: *** [cascading_rw] Error 1
make[3]: *** [all-recursive] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.AU4Ljc (%build)
...
Comment by Keith Mannthey (Inactive) [ 25/Jan/13 ]

Ok so I know a little more.

The non-glibc strlxxx code resides in libcfs_a-user-prim. It seems that when we put together liblustreapi, we link in libcfs_a-user-prim.o (which requires pthread).

It seems we may be able to add -pthread to liblustreapi in this case. I will continue to look into this.

Comment by Keith Mannthey (Inactive) [ 25/Jan/13 ]

It seems we need to add the pthread libs to the test directory makefile for a few files.

http://review.whamcloud.com/5180

I am able to build locally with these changes. I am not sure how the build system will react.

Comment by Keith Mannthey (Inactive) [ 25/Jan/13 ]

I resubmitted the build test http://review.whamcloud.com/5142 with a dependency on the makefile change.

Comment by Keith Mannthey (Inactive) [ 29/Jan/13 ]

Current Status:

http://review.whamcloud.com/5180 allows me to build locally but does not allow the build system to work.

lustre/tests/Makefile.am:1: `:='-style assignments are not portable
lustre/tests/Makefile.am:81: `:='-style assignments are not portable

is seen in the build system and my local setup. Investigating further.

Comment by Keith Mannthey (Inactive) [ 29/Jan/13 ]

I submitted another set of changes. I converted the ':=' to "=". I don't get warnings on my local system with this but I am not sure it will help the build environment.

Comment by Keith Mannthey (Inactive) [ 31/Jan/13 ]

Well, the local build is still working and the build system is still failing on the unknown strlcpy. I spent some more time today looking at this and I don't quite see the next thing to try.

Comment by Andreas Dilger [ 01/Feb/13 ]

Keith, the ":=" warnings are spurious. Please start with http://review.whamcloud.com/3714 if you want to clean that up. That patch doesn't build for some reason, and I haven't had time to look into the details.

I think the root of the problem is mostly unrelated to strlcpy() itself, but instead the cfs_create_thread() function (also declared in user-prim.h) is the cause of the pthread dependency, which is only defined if HAVE_LIBPTHREAD is true. This is causing the libcfs/libcfs/user-prim.c file to reference pthread functions, but they are completely unnecessary for liblustreapi.

Somehow, HAVE_LIBPTHREAD is being enabled, and by virtue of some change having added user-prim.h to get access to strlcpy() it is also dragging in the cfs_create_thread() code. Not sure on the details, but hopefully this helps you.
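
The mechanism described above can be sketched as follows. A static linker pulls archive members in per object file, so when a string helper and a thread helper are compiled into the same object, a program that only wants the string helper still inherits the undefined reference to pthread_create. Everything below is a hypothetical stand-in, not Lustre code: SKETCH_HAVE_LIBPTHREAD plays the role of HAVE_LIBPTHREAD, and sketch_strnlen and sketch_create_thread stand in for the strlcpy() fallback and cfs_create_thread().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical switch; build with -DSKETCH_HAVE_LIBPTHREAD to enable threads. */
#ifdef SKETCH_HAVE_LIBPTHREAD
#include <pthread.h>
#endif

/* Stand-in for a string helper that needs nothing beyond plain C. */
size_t sketch_strnlen(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/*
 * Stand-in for a thread helper that lives in the same object file.
 * When the switch is on, this object references pthread_create, and
 * every program that links the object must also link the pthread library.
 */
int sketch_create_thread(void *(*fn)(void *), void *arg)
{
#ifdef SKETCH_HAVE_LIBPTHREAD
	pthread_t tid;
	int rc = pthread_create(&tid, NULL, fn, arg);

	if (rc == 0)
		rc = pthread_detach(tid);
	return -rc;
#else
	/* No thread support compiled in: no pthread symbols are referenced. */
	(void)fn;
	(void)arg;
	return -ENOSYS;
#endif
}
```

Under these assumptions there are two ways out: link the pthread library into every consumer of the object, or keep the thread helper in a separate object so that tools which only need the string helper never see the pthread reference.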

Comment by Keith Mannthey (Inactive) [ 06/Feb/13 ]

http://review.whamcloud.com/5180 allows me to build locally.

Using the strlxxx functions brings the user-prim.h code, which needs pthread support, into liblustreapi. I added PTHREAD_LIBS in two spots, as "$(LIBLUSTREAPI) $(PTHREAD_LIBS)", and this allows me to build locally.

The confusing part is that the build system (http://review.whamcloud.com/5142) still fails with name resolution for the strlxxx function.

Making all in mpi
../../../lustre/utils/liblustreapi.so: undefined reference to `strlcpy'
collect2: ld returned 1 exit status
make[4]: *** [cascading_rw] Error 1

The local build and the build system are behaving differently. I am a little stuck with this.

Comment by Keith Mannthey (Inactive) [ 27/Feb/13 ]

I am told I need to look into "lbuild".

Comment by Keith Mannthey (Inactive) [ 11/Mar/13 ]

Peter mentioned you were the right person to look at this.

Comment by Keith Mannthey (Inactive) [ 30/Apr/13 ]

Is there an outlook for this issue?

Comment by Minh Diep [ 01/May/13 ]

I don't have the bandwidth to look into this right now.

Comment by Minh Diep [ 07/May/13 ]

Hmm... I built using lbuild and did not hit the failure on the latest kernel. Let me rebase the patch and try again.

Comment by Keith Mannthey (Inactive) [ 07/May/13 ]

Minh can you submit a job to the build system? That is where the issue has been seen, it was thought to be a general lbuild issue, but perhaps it is more specific to the build system.

http://review.whamcloud.com/4154

A build from a month or so ago still fails:
http://build.whamcloud.com/job/lustre-reviews/14514/

Comment by Minh Diep [ 08/May/13 ]

passed the build system: http://build.whamcloud.com/job/lustre-reviews/15362/

Comment by Keith Mannthey (Inactive) [ 08/May/13 ]

Well, perhaps the build system has changed enough that it now works (http://build.whamcloud.com/job/lustre-reviews/15362/) with or without the patch?

Comment by Minh Diep [ 08/May/13 ]

Andreas,

Should we proceed to review and land this patch http://review.whamcloud.com/5180 ?

Comment by Andreas Dilger [ 08/May/13 ]

If that patch isn't strictly needed to build, I would rather abandon it. It adds pthread libraries to tools that shouldn't otherwise need them, so if the problem is gone then I'm happy to abandon this patch. It looks to me like there was some inconsistency between HAVE_LIBPTHREAD being defined in one place and not another, so it isn't surprising that some build changes cleaned it up.

Comment by Keith Mannthey (Inactive) [ 10/May/13 ]

I resubmitted the patch for LU-2074 to see if the issue without the patch still occurs.

http://review.whamcloud.com/4154

Comment by Keith Mannthey (Inactive) [ 10/May/13 ]

The build from last night failed with the issue, so some sort of fix to master is needed.

./libptlctl.a(debug.o): In function `jt_dbg_debug_kernel':
/var/lib/jenkins/workspace/lustre-reviews/arch/i686/build_type/server/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-2.3.65/lnet/utils/debug.c:605: undefined reference to `strlcpy'
../../libcfs/libcfs/libcfsutil.a(libcfsutil_a-parser.o): In function `Parser_help':
/var/lib/jenkins/workspace/lustre-reviews/arch/i686/build_type/server/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-2.3.65/libcfs/libcfs/util/parser.c:463: undefined reference to `strlcat'
collect2: ld returned 1 exit status
make[3]: *** [debugctl] Error 1
make[3]: *** Waiting for unfinished jobs....
../../libcfs/libcfs/libcfs.a(libcfs_a-user-prim.o): In function `cfs_create_thread':
/var/lib/jenkins/workspace/lustre-reviews/arch/i686/build_type/server/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-2.3.65/libcfs/libcfs/user-prim.c:241: undefined reference to `pthread_create'
Comment by Minh Diep [ 10/May/13 ]

Keith, are we working on this bug or LU-2074? Can we close this one and focus on the other?

Comment by Keith Mannthey (Inactive) [ 13/May/13 ]

This issue tracks the build issue with the patch from LU-2074. I think the context is here for dealing with the build system issue and I would rather not close this issue until the patch for LU-2074 can build.

Comment by Keith Mannthey (Inactive) [ 31/May/13 ]

http://review.whamcloud.com/5180 is needed to build the patch for LU-2074.

Comment by Keith Mannthey (Inactive) [ 29/Jun/13 ]

Well that didn't work...

LU-2074 was submitted right after this patch was merged but it didn't fix lbuild. http://build.whamcloud.com/job/lustre-reviews/16287/

Comment by Minh Diep [ 24/Jul/13 ]

The patch for this ticket has landed on master and a solution has been provided for LU-2074. This is no longer a blocker for that. Closing.
