[LU-4470] replay-dual test_21b: FAIL: lustre-MDT0000: ffff88007f581100 CANNOT BE SET READONLY: rc = -95 Created: 10/Jan/14 Updated: 27/Jun/16 Resolved: 04/Feb/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0, Lustre 2.5.1, Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.1 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jian Yu | Assignee: | Bob Glossman (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ssf |
| Environment: | Lustre build: http://build.whamcloud.com/job/lustre-reviews/20841/ |
| Severity: | 3 |
| Rank (Obsolete): | 12241 |
| Description |
|
replay-dual test 21b failed as follows:
Started clients client-9vm1:
CMD: client-9vm1 mount | grep /mnt/lustre' '
client-9vm3@tcp:/lustre on /mnt/lustre type lustre (rw,user_xattr,acl,flock)
CMD: client-9vm1 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/mpi/gcc/openmpi/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/bin:/bin:/sbin:/usr/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"-1\" \" 0xffb7e3ff\" 32
replay-dual test_21b: @@@@@@ FAIL: The test cannot check whether COS works or not: all renames are replied w/o COS
Maloo report: https://maloo.whamcloud.com/test_sets/e3e6ca96-7989-11e3-a27b-52540035b04c |
| Comments |
| Comment by Andreas Dilger [ 15/Jan/14 ] |
|
So far, this problem is only being seen on the most recent SLES11 test runs. |
| Comment by Bob Glossman (Inactive) [ 29/Jan/14 ] |
|
I very strongly suspect that this problem comes down to a flaw in the SLES build. Here's my reasoning:

Failed tests all show errors like the following in the test log:
CMD: client-9vm3 /usr/sbin/lctl --device lustre-MDT0000 readonly
client-9vm3: error: readonly: Operation not supported

Turning devices readonly is a critical part of the barriers used in the tests. With that feature not working, it's not a surprise that many tests fail. Looking in the MDS dmesg log for an explanation, errors like the following can be seen:
[32702.560584] LustreError: 12374:0:(osd_handler.c:1267:osd_ro()) lustre-MDT0000: ffff88007f7b2100 CANNOT BE SET READONLY: rc = -95

Investigating the code in osd_ro(), where the error comes from, it can be seen that the error "CANNOT BE SET READONLY" can happen only if Lustre is configured with #undef HAVE_DEV_SET_RDONLY. It's not clear why Lustre is getting configured that way. autoconf should be setting #define HAVE_DEV_SET_RDONLY, because dev_set_rdonly() really does exist in the SLES kernels and autoconf should be detecting that. When I do local builds of Lustre on SLES, it does in fact configure correctly and puts "#define HAVE_DEV_SET_RDONLY 1" into Lustre's config.h. I speculate that there is some bad interaction between lbuild and Lustre autoconf on SLES only.

Checking back in any recent Jenkins build logs, the following can be seen:
checking if Linux was built with symbol dev_set_rdonly exported... no
configure: WARNING: kernel missing dev_set_rdonly patch for testing

This confirms that Lustre on SLES is indeed being configured without HAVE_DEV_SET_RDONLY in Jenkins builds. I suspect that the whole set of recently reported SLES-only test bugs has this same underlying cause. |
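The two checks described in this comment can be repeated by hand against any build. What follows is a minimal sketch, assuming the generated config.h and a saved configure log are available locally. The function names `have_dev_set_rdonly` and `symbol_check_result` and the example paths are hypothetical and are not part of Lustre or lbuild. For reference, rc = -95 is -EOPNOTSUPP on Linux, which matches the "Operation not supported" message printed by lctl.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Lustre or lbuild.

# Succeeds if the given config.h actively defines HAVE_DEV_SET_RDONLY.
# A commented-out "/* #undef HAVE_DEV_SET_RDONLY */" line does not match.
have_dev_set_rdonly() {
    grep -q -E '^#define[[:space:]]+HAVE_DEV_SET_RDONLY([[:space:]]|$)' "$1"
}

# Prints the result ("yes" or "no") that configure reported for the
# dev_set_rdonly export check; prints nothing if the log has no such line.
symbol_check_result() {
    sed -n 's/^checking if Linux was built with symbol dev_set_rdonly exported\.\.\. //p' "$1"
}

# Example (paths are assumptions):
#   have_dev_set_rdonly ./config.h || echo "built without HAVE_DEV_SET_RDONLY"
#   symbol_check_result ./configure.log
```

A build that prints "no" from the second helper would be expected to fail the first one as well, which is the combination reported for the Jenkins SLES builds.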
| Comment by Bob Glossman (Inactive) [ 29/Jan/14 ] |
|
I think the build problem may come down to the way the autoconf function LB_CHECK_SYMBOL_EXPORT is defined in config/lustre-build-linux.m4. It checks both the kernel source and the symbol file for the relevant symbol. However, it looks for the symbol file in $LINUX. I'm pretty sure it should be looking in $LINUX_OBJ. This makes no difference in CentOS builds, where $LINUX and $LINUX_OBJ are really the same. It also makes no difference in my local builds on SLES, where they are also the same. On typical SLES builds they are different. I suspect a simple change from $LINUX to $LINUX_OBJ in that one autoconf function may fix things. I will create and push a patch to do that. If it passes builds on all the distros we use, it may be all that is needed. |
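The difference between the two lookups can be sketched as follows. This is a simplified illustration and not the body of LB_CHECK_SYMBOL_EXPORT: the function name `symbol_exported` is hypothetical, and it assumes the symbol file is the kernel's Module.symvers with whitespace-separated fields. In a split layout where Module.symvers is present only under the object directory (the situation suspected here for typical SLES builds), a lookup under $LINUX fails while the same lookup under $LINUX_OBJ succeeds.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not the actual macro body.
# Usage: symbol_exported SYMBOL DIR
# Succeeds if DIR/Module.symvers lists SYMBOL as a whole field.
# A missing symbol file counts as "not exported".
symbol_exported() {
    sym=$1
    dir=$2
    grep -q -E "[[:space:]]${sym}[[:space:]]" "$dir/Module.symvers" 2>/dev/null
}

# Example with LINUX (source tree) and LINUX_OBJ (build tree) set by the caller:
#   symbol_exported dev_set_rdonly "$LINUX"     || echo "not found under LINUX"
#   symbol_exported dev_set_rdonly "$LINUX_OBJ" && echo "found under LINUX_OBJ"
```

When the two directories are the same, as described for the CentOS builds, both calls give the same answer, which would explain why the problem only shows up where they differ.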
| Comment by Bob Glossman (Inactive) [ 29/Jan/14 ] |
|
The build hasn't completed yet, but it has already proceeded far enough that I can see the following in the build logs for SP2 and SP3:
checking if Linux was built with symbol dev_set_rdonly exported... yes |
| Comment by Peter Jones [ 03/Feb/14 ] |
|
Landed for 2.6 |
| Comment by Jian Yu [ 17/Feb/14 ] |
|
Patch for Lustre b2_5 branch: http://review.whamcloud.com/9288 |
| Comment by Saurabh Tandan (Inactive) [ 03/Feb/16 ] |
|
Encountered the same issue for tag 2.7.66 on master, build #3314, FULL - EL7.1 Server/EL6.7 Client:
Started lustre-MDT0000
stat: cannot read file system information for `/mnt/lustre': Input/output error
replay-dual test_0a: @@@@@@ FAIL: test_0a failed with 1 |
| Comment by Andreas Dilger [ 04/Feb/16 ] |
|
Saurabh, please file a new bug and link it to this bug. It may be related, or it may just have the same symptoms, but it is better not to re-open this old bug, which was marked fixed for 2.6.0, as that makes tracking the fixes much more difficult. |