[LU-2662] Build: pthread build issue Created: 21/Jan/13 Updated: 15/Oct/13 Resolved: 24/Jul/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0, Lustre 2.5.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Keith Mannthey (Inactive) | Assignee: | Minh Diep |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
a patch pushed via git. |
||
| Severity: | 3 |
| Rank (Obsolete): | 6212 |
| Description |
|
I have recreated the issue here: Basically it started when strlcpy was being used in lustre/utils/liblustreapi.c. For some fun reason we get
...
gcc -g -Wall -fPIC -D_GNU_SOURCE -g -O2 -Werror -o ll_dirstripe_verify ll_dirstripe_verify.o -L../../lustre/utils -llustreapi ../../libcfs/libcfs/libcfs.a
../../libcfs/libcfs/libcfs.a(libcfs_a-user-prim.o): In function `cfs_create_thread':
/home/build/lustre-release/libcfs/libcfs/user-prim.c:241: undefined reference to `pthread_create'
...
If you drop the small change out, the system builds fine. |
| Comments |
| Comment by Andreas Dilger [ 23/Jan/13 ] |
|
Keith, what is the original source of this problem? I don't see strlcpy() being used in liblustreapi.c anywhere. I can see the build failures in your patch, but we already use strlcpy() in a couple of places in the code without problems. The compile error in http://review.whamcloud.com/5142 is different than what is posted above:
Making all in mpi
../../../lustre/utils/liblustreapi.so: undefined reference to `strlcpy'
collect2: ld returned 1 exit status
yet we do use strlcpy() successfully in cfs_get_environ() and class_set_global() for the jobid code. |
| Comment by Keith Mannthey (Inactive) [ 23/Jan/13 ] |
|
Sorry, the build test was just an example of a change Bull is wanting. I had linked the build to this LU but I didn't check to see whether the failures were the same. I get the pthread issue in my local RHEL6.3 build environment without this strlcpy one. Bull sees this same issue as well; they are the ones who found it. Some complicated string handling code was submitted in a patch where strlcpy could have been used, and it brought this issue out. The test patch is just an example of the usage Bull would like to do. It is not real code and it seems to break the build (but not on RHEL5). I wonder why my local build environment is different from the autobuild server. |
| Comment by Keith Mannthey (Inactive) [ 24/Jan/13 ] |
|
From my local build:
cc -g -Wall -fPIC -D_GNU_SOURCE -g -O2 -Werror -o ll_dirstripe_verify ll_dirstripe_verify.o -L../../lustre/utils -llustreapi ../../libcfs/libcfs/libcfs.a
../../libcfs/libcfs/libcfs.a(libcfs_a-user-prim.o): In function `cfs_create_thread':
/home/build/lustre-release/libcfs/libcfs/user-prim.c:241: undefined reference to `pthread_create'
collect2: ld returned 1 exit status
We get the error when trying to build libcfs.a. The autobuild may have failed because there is no generally usable strlcpy:
checking for strlcpy... no
I don't have one either, but it seems the environment is different. The non-glibc version of strlcpy is in user-prim.c. I am not quite sure how RHEL5 worked without including libcfs/user-prim.h, but I added it to the test build to see if the pthread issue will show itself in the build system. For reference, adding the include of libcfs/user-prim.h did not change the pthread issue on my local build. I diffed a good build and a bad build and I really don't see anything that stands out. The same flags and the same files are being used. I am going to look into this more tomorrow. |
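For readers unfamiliar with the function, here is a minimal sketch of what a strlcpy fallback looks like when configure reports "checking for strlcpy... no". It assumes the usual autoconf convention that such a fallback is compiled only when HAVE_STRLCPY is undefined. The name sketch_strlcpy is hypothetical, and the code follows the BSD contract, which is not necessarily what the version in user-prim.c returns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * sketch_strlcpy() is a hypothetical name used only for this illustration.
 * It follows the BSD strlcpy() contract: copy at most size - 1 bytes,
 * always NUL-terminate when size > 0, and return strlen(src) so the
 * caller can detect truncation. The fallback in user-prim.c may differ.
 */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst != NULL && src != NULL);
	len = strlen(src);
	if (size > 0) {
		/* Leave one byte for the terminating NUL. */
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

A return value greater than or equal to the buffer size tells the caller that the copy was truncated.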
| Comment by Keith Mannthey (Inactive) [ 25/Jan/13 ] |
|
The real build that fails: http://build.whamcloud.com/job/lustre-reviews/12561/ Adding libcfs/user-prim.h moved our build system along:
...
Making all in mpi
../../../lustre/utils/liblustreapi.so: undefined reference to `strlcpy'
collect2: ld returned 1 exit status
make[4]: *** [cascading_rw] Error 1
make[3]: *** [all-recursive] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.AU4Ljc (%build)
... |
| Comment by Keith Mannthey (Inactive) [ 25/Jan/13 ] |
|
OK, so I know a little more. The non-glibc strlxxx code resides in libcfs_a-user-prim. It seems that as we are putting together liblustreapi we link in libcfs_a-user-prim.o (which requires pthread). It seems we may be able to add -pthread to liblustreapi in this case. I continue to look into this. |
| Comment by Keith Mannthey (Inactive) [ 25/Jan/13 ] |
|
It seems we need to add the pthread libs to the test directory makefile for a few files. http://review.whamcloud.com/5180 I am able to build locally with these changes. I am not sure how the build system will react. |
| Comment by Keith Mannthey (Inactive) [ 25/Jan/13 ] |
|
I resubmitted the build test http://review.whamcloud.com/5142 with a dependency on the makefile change. |
| Comment by Keith Mannthey (Inactive) [ 29/Jan/13 ] |
|
Current Status: http://review.whamcloud.com/5180 allows me to build locally but does not allow the build system to work.
lustre/tests/Makefile.am:1: `:='-style assignments are not portable
lustre/tests/Makefile.am:81: `:='-style assignments are not portable
These warnings are seen in both the build system and my local setup. Investigating further. |
| Comment by Keith Mannthey (Inactive) [ 29/Jan/13 ] |
|
I submitted another set of changes. I converted the ':=' to "=". I don't get warnings on my local system with this but I am not sure it will help the build environment. |
| Comment by Keith Mannthey (Inactive) [ 31/Jan/13 ] |
|
Well, the local build is still working and the build system is still failing on the unknown strlcpy. I spent some more time today looking at this and I don't quite see the next thing to try. |
| Comment by Andreas Dilger [ 01/Feb/13 ] |
|
Keith, the ":=" warnings are spurious. Please start with http://review.whamcloud.com/3714 if you want to clean that up. That patch doesn't build for some reason, and I haven't had time to look into the details. I think the root of the problem is mostly unrelated to strlcpy() itself, but instead the cfs_create_thread() function (also declared in user-prim.h) is the cause of the pthread dependency, which is only defined if HAVE_LIBPTHREAD is true. This is causing the libcfs/libcfs/user-prim.c file to reference pthread functions, but they are completely unnecessary for liblustreapi. Somehow, HAVE_LIBPTHREAD is being enabled, and by virtue of some change having added user-prim.h to get access to strlcpy() it is also dragging in the cfs_create_thread() code. Not sure on the details, but hopefully this helps you. |
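One plausible reading of the analysis above, sketched in C: a linker pulls in archive members as whole object files, so if a string helper and a thread helper are compiled into the same object, any program that needs only the string helper also inherits the unresolved reference to pthread_create. The function names below are hypothetical stand-ins (sketch_create_thread for cfs_create_thread, sketch_strlcat for the strlxxx helpers), and the layout is an assumption about how user-prim.c is organized, not a copy of it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef HAVE_LIBPTHREAD
#include <pthread.h>

/*
 * Hypothetical stand-in for cfs_create_thread(). When HAVE_LIBPTHREAD is
 * defined, this object file references pthread_create(), so every binary
 * that links the object must also link the pthread library.
 */
int sketch_create_thread(void *(*fn)(void *), void *arg)
{
	pthread_t tid;
	int rc = pthread_create(&tid, NULL, fn, arg);

	if (rc == 0)
		rc = pthread_detach(tid);
	return rc;
}
#endif /* HAVE_LIBPTHREAD */

/*
 * Hypothetical stand-in for a strlxxx helper living in the same object.
 * BSD strlcat() contract: append with truncation, always NUL-terminate
 * when there is room, return the length the full string would have had.
 */
size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
	size_t dlen = 0;
	size_t slen, room, n;

	assert(dst != NULL && src != NULL);
	slen = strlen(src);
	/* Find the end of dst without reading past the buffer. */
	while (dlen < size && dst[dlen] != '\0')
		dlen++;
	if (dlen == size)
		return size + slen;	/* dst is not terminated, nothing to append */
	room = size - dlen - 1;
	n = slen < room ? slen : room;
	memcpy(dst + dlen, src, n);
	dst[dlen + n] = '\0';
	return dlen + slen;
}
```

Under this reading, two remedies are possible: link the pthread library wherever the object is used, or keep the string helpers in an object that has no thread code, so that tools needing only strlcpy do not acquire a pthread dependency.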
| Comment by Keith Mannthey (Inactive) [ 06/Feb/13 ] |
|
http://review.whamcloud.com/5180 allows me to build locally. Using the strlxxx functions brings the user-prim.h code that needs pthread support into liblustreapi. I added PTHREAD_LIBS in two spots: "$(LIBLUSTREAPI) $(PTHREAD_LIBS)". This allows me to build locally. The confusing part is that the build system (http://review.whamcloud.com/5142) still fails with name resolution for the strlxxx function.
Making all in mpi
../../../lustre/utils/liblustreapi.so: undefined reference to `strlcpy'
collect2: ld returned 1 exit status
make[4]: *** [cascading_rw] Error 1
The local build and the build system are behaving differently. I am a little stuck with this. |
| Comment by Keith Mannthey (Inactive) [ 27/Feb/13 ] |
|
I am told I need to look into "lbuild". |
| Comment by Keith Mannthey (Inactive) [ 11/Mar/13 ] |
|
Peter mentioned you were the right person to look at this. |
| Comment by Keith Mannthey (Inactive) [ 30/Apr/13 ] |
|
Is there an outlook for this issue? |
| Comment by Minh Diep [ 01/May/13 ] |
|
I don't have the bandwidth to look into this right now. |
| Comment by Minh Diep [ 07/May/13 ] |
|
hmm...I built using lbuild and did not hit the failure on latest kernel. let me rebase the patch and try again |
| Comment by Keith Mannthey (Inactive) [ 07/May/13 ] |
|
Minh, can you submit a job to the build system? That is where the issue has been seen. It was thought to be a general lbuild issue, but perhaps it is more specific to the build system. http://review.whamcloud.com/4154 from a month or so ago still fails to build: |
| Comment by Minh Diep [ 08/May/13 ] |
|
passed the build system: http://build.whamcloud.com/job/lustre-reviews/15362/ |
| Comment by Keith Mannthey (Inactive) [ 08/May/13 ] |
|
Well perhaps the build system has changed enough it now works with http://build.whamcloud.com/job/lustre-reviews/15362/ or without the patch? |
| Comment by Minh Diep [ 08/May/13 ] |
|
Andreas, Should we proceed to review and land this patch http://review.whamcloud.com/5180 ? |
| Comment by Andreas Dilger [ 08/May/13 ] |
|
If that patch isn't strictly needed to build, I would rather abandon it. It adds pthread libraries to tools that shouldn't otherwise need them, so if the problem is gone then I'm happy to abandon this patch. It looks to me like there was some inconsistency between HAVE_LIBPTHREAD being defined in one place and not another, so it isn't surprising that some build changes cleaned it up. |
| Comment by Keith Mannthey (Inactive) [ 10/May/13 ] |
|
I resubmitted the patch for |
| Comment by Keith Mannthey (Inactive) [ 10/May/13 ] |
|
The build from last night failed with the issue; some sort of a fix to master is needed.
./libptlctl.a(debug.o): In function `jt_dbg_debug_kernel':
/var/lib/jenkins/workspace/lustre-reviews/arch/i686/build_type/server/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-2.3.65/lnet/utils/debug.c:605: undefined reference to `strlcpy'
../../libcfs/libcfs/libcfsutil.a(libcfsutil_a-parser.o): In function `Parser_help':
/var/lib/jenkins/workspace/lustre-reviews/arch/i686/build_type/server/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-2.3.65/libcfs/libcfs/util/parser.c:463: undefined reference to `strlcat'
collect2: ld returned 1 exit status
make[3]: *** [debugctl] Error 1
make[3]: *** Waiting for unfinished jobs....
../../libcfs/libcfs/libcfs.a(libcfs_a-user-prim.o): In function `cfs_create_thread':
/var/lib/jenkins/workspace/lustre-reviews/arch/i686/build_type/server/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-2.3.65/libcfs/libcfs/user-prim.c:241: undefined reference to `pthread_create' |
| Comment by Minh Diep [ 10/May/13 ] |
|
Keith, are we working on this bug or |
| Comment by Keith Mannthey (Inactive) [ 13/May/13 ] |
|
This issue tracks the build issue with the Patch from 2074. I think the context is here for dealing with the build system issue and I would rather not close this issue until the patch for |
| Comment by Keith Mannthey (Inactive) [ 31/May/13 ] |
|
http://review.whamcloud.com/5180 is needed to build the patch for |
| Comment by Keith Mannthey (Inactive) [ 29/Jun/13 ] |
|
Well that didn't work...
|
| Comment by Minh Diep [ 24/Jul/13 ] |
|
the patch for this ticket has landed on master and a solution has been provided for |