[LU-12195] sanity/43 and sanityn/14 fail on local setups Created: 18/Apr/19  Updated: 04/Jun/20  Resolved: 30/Apr/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug Priority: Minor
Reporter: Alex Zhuravlev Assignee: James A Simmons
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-12170 sanity 43a may fail as it's not a binary Resolved
Related
is related to LU-11742 ERROR: RPATH is not allowed Resolved
is related to LU-12261 sanity test 43b fails with 'expected ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

this is because the tests rely on binaries run or copied from lustre/tests (like multiop),
but they aren't binaries anymore - they are libtool-based wrapper scripts.
I have to roll back the change to enable local testing:

--- a/lustre/tests/Makefile.am
+++ b/lustre/tests/Makefile.am
@@ -1,6 +1,7 @@
 # Lustre test Makefile
 AM_CFLAGS := -fPIC -D_GNU_SOURCE \
             -D_LARGEFILE64_SOURCE=1 -D_FILE_OFFSET_BITS=64
+AM_LDFLAGS := -no-install
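
A quick way to tell whether a given test program is a real binary or a wrapper script is to look at the first bytes of the file. The sketch below is only an illustration of that check: the helper names is_elf and describe_prog are hypothetical and are not part of the Lustre test framework.

```shell
# Hypothetical helper: succeed if the file starts with the ELF magic bytes.
is_elf() {
	# The ELF magic is 0x7f followed by "ELF"; od prints the first four bytes in hex.
	elf_magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
	[ "$elf_magic" = "7f454c46" ]
}

# Hypothetical helper: classify a program as an ELF binary, a script, or unknown.
describe_prog() {
	if is_elf "$1"; then
		echo "ELF binary"
	# "#!" is 0x23 0x21; libtool wrappers are shell scripts that start this way.
	elif [ "$(od -An -tx1 -N2 "$1" 2>/dev/null | tr -d ' \n')" = "2321" ]; then
		echo "script"
	else
		echo "unknown"
	fi
}
```

For example, `describe_prog lustre/tests/multiop` would be expected to report a script in a tree built with the libtool wrappers and an ELF binary in a tree built with -no-install.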


 Comments   
Comment by Andreas Dilger [ 19/Apr/19 ]

It seems like for sanity/43[ab] and sanityn/14[abcd] it would be possible to use "sleep" instead of "multiop" as the test program, so that we can ignore the libtool issues:

        cp $(which sleep) $DIR/$tdir/sleep
        $DIR/$tdir/sleep 1000 < $TMP/$tfile.junk &
        PID=$!
Comment by James A Simmons [ 19/Apr/19 ]

I thought some things might be missed. The issue is that packaging was broken by rpath issues, which prevented some vendors from creating their own rpms, so we had to remove the no-install flag. I created a workaround for running the tests locally with LU-11742. Sadly, this requirement to run tests in the source tree makes things much more complicated compared to other projects.

Comment by Alex Zhuravlev [ 19/Apr/19 ]

what about making the no-install flag selectable through some configure option?
@Andreas, I'm fine with changing that to sleep, hopefully this is the only remaining issue with the libtool wrapper.

Comment by Alex Zhuravlev [ 19/Apr/19 ]

sanityn/14* seem to be duplicates of sanity/43*. would you mind if I remove the sanityn/14* tests?

Comment by Gerrit Updater [ 19/Apr/19 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34721
Subject: LU-12195 tests: use sleep instead of wrapped multiop
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 82c898be49c73d983e910a2e184d5d0f5c8d6222

Comment by Andreas Dilger [ 19/Apr/19 ]

Sanityn is usually for testing cross-client operations, and as you probably saw, it is checking if the binary can be removed or truncated from a second mountpoint.

Comment by Alex Zhuravlev [ 24/Apr/19 ]

sanity: FAIL: test_118c Multiop failed to block on fsync, pid=9583
sanity: FAIL: test_118d Multiop failed to block on fsync, pid=9876

for the same reason..

and I grepped sanity:


$ grep 'MULTIOP.*[&]' lustre/tests/sanity.sh|wc -l
23

still wondering whether it would make sense to disable wrapping with some configure option?

Comment by James A Simmons [ 24/Apr/19 ]

The point of the libtool wrappers around the applications is to work with both installed and uninstalled libraries. So when the rpm is installed, running libtool with the application will see it's a properly installed binary and look in the normal paths for the application's libraries. If working in the sandbox lustre/test/* then the libtool wrapper will properly select the uninstalled libraries in lustre/test/*/.lib. This is nice in that it ensures that if you do test out of the sandbox, the sandbox version of the library is used instead of the installed one. Yes, I have used this trick to debug. The trade-off is that you're working with the wrapper and not the binary directly, so running things like pkill on the wrapper will not work.

Normally projects that want to do a sandbox mode do a make DESTDIR, which creates a kind of chroot environment to work in. Sadly this is not the case for Lustre, which just wants to work directly in the source tree, so it is much more difficult to make this transparent. The source tree layout doesn't match the DESTDIR tree layout. The other option often used for sandbox mode is running ./configure --prefix=/path/to/sandbox and doing a make install. The issue still ends up being where the binaries and libraries go. Again, the source tree doesn't match the final tree structure. The test framework would have to be reworked to set up the library paths and run ldconfig. It's also a change that the developer has to be aware of. We also have to be careful with library collisions, since a properly installed version could be present as well. Without the libtool wrapper, it is not guaranteed that the proper library would be used.

The last option when wanting to run in sandbox mode is to run ./configure --disable-shared. This avoids the whole problem of where the libraries and applications are. The issue is that the libtool wrapper still exists, so a song and dance would be required to make it work. Requiring --disable-shared is of course not so transparent to the developer, so you have to ask whether you want to go down that road. Perhaps in that case we make --disable-shared the default and then pass --enable-shared to configure in the rpm part. Then of course, what do we do when --enable-shared is used in the sandbox, which would break things? The challenge again is making the test suite work transparently both as a properly installed test suite and in the sandbox.

As you can see, there is no easy solution, so I picked the lesser of all evils: using libtool execute mode. Just throwing things out there to think about what the options entail.
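
For reference, libtool's execute mode can be sketched as follows. This is an illustration under assumptions, not the change that was actually made for LU-11742: the helper name run_uninstalled and the LIBTOOL variable are hypothetical, and ./libtool is assumed to be the script that a libtool-based build generates in the top build directory.

```shell
# Hypothetical helper: run a command through libtool's execute mode when a
# libtool script is available, otherwise run the command directly.
run_uninstalled() {
	# LIBTOOL is an assumed override; the default assumes the top build directory.
	lt=${LIBTOOL:-./libtool}
	if [ -x "$lt" ]; then
		# In execute mode, libtool replaces wrapper scripts among the arguments
		# with the real uninstalled binaries and sets the library path first.
		"$lt" --mode=execute "$@"
	else
		"$@"
	fi
}
```

A typical use would be something like `run_uninstalled gdb lustre/tests/multiop`, so that the debugger sees the real binary instead of the wrapper script.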

Comment by Alex Zhuravlev [ 24/Apr/19 ]

I'm fine with --disable-shared as 1) configure is not a very frequent procedure 2) I almost never type arguments for configure, using command history instead.
for me the issue is that I cannot run sanity anymore. yes, I can disable a few dozen tests, but I don't think this is a good idea.

Comment by Alex Zhuravlev [ 24/Apr/19 ]

--disable-shared (after make clean) did the trick and I got binaries back.
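
The sequence described above can be sketched roughly as below, assuming it is run from the top of an already prepared source tree. The function name rebuild_static and the DRY_RUN variable are hypothetical and only serve the illustration; any other configure arguments a particular setup needs would have to be passed in as well.

```shell
# Hypothetical helper: rebuild the tree with static linking so that the test
# programs are plain binaries again instead of libtool wrapper scripts.
rebuild_static() {
	# Set DRY_RUN=1 to print the commands instead of running them.
	run=${DRY_RUN:+echo}
	$run make clean &&
	$run ./configure --disable-shared "$@" &&
	$run make
}
```

Running `DRY_RUN=1 rebuild_static` only prints the three commands, which is a cheap way to check the sequence before touching the tree.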

Comment by James A Simmons [ 25/Apr/19 ]

Alright. I will look into making this the default when not building rpms.

Comment by Alex Zhuravlev [ 25/Apr/19 ]

simmonsja thanks in advance, but to be clear - --disable-shared worked for me and this is pretty much enough, I have no problem adding this option to configure args. not that I'm against making this default, but I guess you've got more interesting problems

Comment by Gerrit Updater [ 30/Apr/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34721/
Subject: LU-12195 tests: use sleep instead of wrapped multiop
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9a1f327a76f72c7713e53d8b354ff7f0e32be870

Comment by Peter Jones [ 30/Apr/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 25/May/19 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/34955
Subject: LU-12195 tests: use sleep instead of wrapped multiop
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 2a1cdc610fc3146b9cc96f6b8bc1826d36d6d764

Comment by Gerrit Updater [ 28/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34955/
Subject: LU-12195 tests: use sleep instead of wrapped multiop
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 493ce6e49de3a9de82d5fe9fc8d7b5d9a78c68c2

Generated at Sat Feb 10 02:50:28 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.