[LU-4656] Sanity MOUNTOPT configuration variable cannot be used for "-n" option Created: 20/Feb/14  Updated: 26/Aug/14  Resolved: 26/Aug/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Minor
Reporter: Ryan Haasken Assignee: Minh Diep
Resolution: Fixed Votes: 0
Labels: patch

Severity: 3
Rank (Obsolete): 12743

 Description   

The MOUNTOPT configuration variable is used in sanity.sh under the assumption that it contains only "-o optlist" where optlist is a list of options like “user_xattr” and “flock”. However, we need the "-n" option because in our configuration, /etc is on a read-only file system.

When we add "-n" to the beginning of MOUNTOPT, like this:

MOUNTOPT=“-n -o user_xattr,acl,flock"}

any place which does an “echo $MOUNTOPT” will treat the “-n” as an option to echo. For example, in the function som_mode_switch in sanity.sh:

MOUNTOPT=`echo $MOUNTOPT | sed 's/som_preview//g'`

When we add “-n” to the end of the MOUNTOPT, som_mode_switch turns MOUNTOPT into an invalid string of options to the mount command.

This line

MOUNTOPT="$MOUNTOPT,som_preview"

sets MOUNTOPT to “-o user_xattr,acl,flock -n,som_preview”, which is invalid, so the subsequent mount fails, and all subsequent tests which rely on Lustre-specific functionality (e.g. lfs setstripe) fail.

Similarly, this line, which attempts to remove the “som_preview” option

MOUNTOPT=`echo $MOUNTOPT | sed 's/som_preview//g'`

sets MOUNTOPT to “-o user_xattr,acl,flock -n,” which is invalid as well, causing the mount to fail.

I can think of a few possible solutions:

1) Don’t ever do “echo $MOUNTOPT” and ensure that “-o optlist” is at the end of MOUNTOPT (document this restriction). Instead of “echo $MOUNTOPT | sed”, we could do “sed <<<$MOUNTOPT”. This is not very robust, but it would work.

2) Add a second configuration variable, MOUNT_EXTRA, which can contain the “-n” option and leave only “-o” options in MOUNTOPT. Document this restriction. Use both MOUNT_EXTRA and MOUNTOPT in the zconf_mount and zconf_mount_clients functions.

3) Implement functions which safely add and delete options to MOUNTOPT using regular expressions with sed and grep. This seems like overkill.



 Comments   
Comment by Ryan Haasken [ 20/Feb/14 ]

In the description, I said that if we put "-n" at the beginning of MOUNTOPT, any place which does an echo of $MOUNTOPT will treat the "-n" as an option to echo. This could be fixed by quoting $MOUNTOPT, as long as MOUNTOPT contains more than just the -n. That is,

MOUNTOPT=`echo "$MOUNTOPT" | sed 's/som_preview//g'`

This gives us option (4) to fix the problem. However, I think option (2) is the best. It makes sense to split up the "-o" options list from the other mount options. It would be error-prone to require that "-o optlist" be at the end of MOUNTOPT and require that other options like "-n" be at the beginning. Option (1) is bash-specific, and option (3) is overkill for this problem.

I will submit a patch implementing option (2).

Comment by Robert Read (Inactive) [ 20/Feb/14 ]

How about redefining MOUNTOPT to be just the list of mount options, and MOUNT_FLAGS as flags for mount? They would be used like this:

  mount $MOUNT_FLAGS -o $MOUNTOPT ....

To allow for case when MOUNTOPT is empty, this could be written as:

  mount $MOUNT_FLAGS ${MOUNTOPT:+-o $MOUNTOPT}
Comment by Ryan Haasken [ 20/Feb/14 ]

Robert, I did consider that option as well.

Here is my current patch without taking into account your recommendation: http://review.whamcloud.com/9332

I'm going to look into changing MOUNTOPT to not contain the "-o".

Comment by Ryan Haasken [ 20/Feb/14 ]

Robert, I have updated the patch at http://review.whamcloud.com/#/c/9332/ with your suggestion.

Comment by Ryan Haasken [ 24/Apr/14 ]

The patch at http://review.whamcloud.com/#/c/9332/ has been verified by Jenkins and Maloo, and it needs one more review. Can somebody please take a look?

Comment by Cliff White (Inactive) [ 28/May/14 ]

The new rebase seems to have failed in Maloo.

Comment by Ryan Haasken [ 29/May/14 ]

Maloo tests started failing after I had to rebase on this change: http://review.whamcloud.com/#/c/9668/

At first the failures were my fault for rebasing incorrectly, but now the patch seems correct to me, and the failures look unrelated.

Comment by Cliff White (Inactive) [ 03/Jun/14 ]

We have added some reviewers, sorry about the delay

Comment by Ryan Haasken [ 11/Jun/14 ]

I've figured out what is causing the tests to fail, and it is potentially related to my patch. It seems that whenever the file system is remounted, it is not mounted with the "-o user_xattr,flock" options, so any test which deals with xattr or flock without checking whether the mount supports it will fail.

These sanity.sh tests check for user_xattr and then attempt to use setfattr: test_102a, test_102h, test_102ha. These sanity tests check for flock and then attempt to use it: test_105d, test_105e. These tests are skipped since they check for the necessary functionality before attempting to use it.

sanity.sh test_235 uses flock without checking for flock support first, so it fails. sanityn.sh test_54, test_72, test_73, and test_74 use setfattr without checking whether the file system supports xattr, so those tests fail as well.

I am not sure whether the initial mount of the file system is done with "-o flock,user_xattr" since I'm not sure where the first mount happens in the test logs. However, I know that the file system is remounted during sanity.sh. The first time this happens is in test_77i, with a call to remount_client, which calls zconf_mount:

test_77i() { # bug 13805
...
    remount_client $MOUNT
}
remount_client()
{
        zconf_umount `hostname` $1 || error "umount failed"
        zconf_mount `hostname` $1 || error "mount failed"
}
zconf_mount() {
    local client=$1
    local mnt=$2
    local opts=${3:-$MOUNT_OPTS}
    opts=${opts:+-o $opts}
    local flags=${4:-$MOUNT_FLAGS}
...
    echo "Starting client: $client: $flags $opts $device $mnt"
    do_node $client mkdir -p $mnt
    do_node $client $MOUNT_CMD $flags $opts $device $mnt || return 1
...
}

Here is what is shown in sanity.suite_log.shadow-12vm10.log for that test:

Starting client: shadow-12vm10.shadow.whamcloud.com:   shadow-12vm12@tcp:/lustre /mnt/lustre
CMD: shadow-12vm10.shadow.whamcloud.com mkdir -p /mnt/lustre
CMD: shadow-12vm10.shadow.whamcloud.com mount -t lustre shadow-12vm12@tcp:/lustre /mnt/lustre

Thus, you can see that the mount options are missing from the mount command. However, the variable MOUNT_OPTS is set in lustre/tests/cfg/local.sh, so it should be used in zconf_mount:

MOUNT_OPTS=${MOUNT_OPTS:-"user_xattr,flock"}

I don't know why the MOUNT_OPTS from local.sh are not being used. Does anybody have any idea why that would be? Is Maloo using a different local.sh than the one I submitted for it to use? Or am I missing something else?

Comment by Ryan Haasken [ 11/Jun/14 ]

I've found that the autotest run uses a different config file than the local.sh which I modified. Here is the auster command from the lustre-initialization-1 suite stdout:

02:39:58:auster_cmd='auster -v -r -k   -D /logdir/test_logs/2014-05-24/lustre-reviews-el6-x86_64--review-dne-part-1--2_8_1__24105__-70192652927680-020106 -f autotest_config -g test-groups/review-dne-part-1 '

That "-f autotest_config" means that the test session will use a file names autotest_config.sh instead of local.sh. There are essential changes in my version of local.sh, namely, the MOUNTOPT variable was renamed to MOUNT_OPTS, and its format has changed. I am going to try adding Test-Parameters to my commit message to see if I can get Maloo to pass. However, once this change is ready to land, the procedure which generates autotest_config.sh will need to be updated to set the new MOUNT_OPTS variable instead of the old MOUNTOPT. It should also set MOUNT_FLAGS to be whatever mount flags are required on the testing system. Who will be able to coordinate that change?

Comment by Cliff White (Inactive) [ 11/Jun/14 ]

I have asked one of the test engineers to comment for you.

Comment by Ryan Haasken [ 11/Jun/14 ]

Thanks Cliff. It looks like this line in my commit message didn't work to get Maloo to pass testing:

Test-Parameters: envdefinitions=MOUNT_OPTS="user_xattr,flock"

I'm assuming that didn't work because there is a comma in the environment variable, and that's the same character which is supposed to delimit the list of envdefinitions. This caused lustre-initialization-1 to fail due to this improperly formed mount command:

14:39:30:Starting client: shadow-49vm2.shadow.whamcloud.com:  -o user_xattr
14:39:30:export flock shadow-49vm7@tcp:/lustre /mnt/lustre
14:39:30:CMD: shadow-49vm2.shadow.whamcloud.com mkdir -p /mnt/lustre
14:39:30:CMD: shadow-49vm2.shadow.whamcloud.com mount -t lustre -o user_xattr export flock shadow-49vm7@tcp:/lustre /mnt/lustre

It looks like it ended up setting MOUNT_OPTS="user_xattr \n export flock" or something like that.

Comment by Minh Diep [ 12/Jun/14 ]

Hi Ryan,

MOUNTOPT=${MOUNTOPT:-"-o user_xattr,flock"}

user_xattr,flock is already default so you don't need to set that.

Comment by Ryan Haasken [ 12/Jun/14 ]

Hi Minh,

Please note that this patch changes MOUNTOPT to MOUNT_OPTS, and removes the "-o" from the variable. I was expecting the new MOUNT_OPTS variable to be set to the default "user_xattr,flock" by the changes I made to lustre/tests/cfg/local.sh, but that doesn't seem to be happening. That is why I tried setting it in the Test-Parameters in the commit message, just to see if I could get Maloo to pass tests when MOUNT_OPTS is set correctly.

Comment by Minh Diep [ 13/Jun/14 ]

Hi Ryan,

I believe there's a bug in our test system. I will investigate and get back to you.

Comment by Ryan Haasken [ 13/Jun/14 ]

Minh, thank you. To be clear, I am concerned that my copy of cfg/local.sh might not b used by autotest_config.sh. I do not care about the Test-Parameters issue which prevents me from using commas inside an environment variable definition.

I did some more investigation on this issue, which may be helpful to you. On this Maloo result:

https://maloo.whamcloud.com/test_sessions/b10e5ee4-e366-11e3-93d9-52540035b04c

I clicked on the test set lustre-initialization-1, and then I clicked on the subtest lustre-initialization_1. Then, under Client 1, I clicked the autotest show link. Here is a link directly to that page:

https://maloo.whamcloud.com/test_logs/9077e934-e36b-11e3-93d9-52540035b04c/show_text

Here is what I found in the log:

02:39:46:# Start of copy of cfg/local.sh
02:39:46:FSNAME=lustre
02:39:46:
02:39:46:VERBOSE=true
...
02:39:46:MOUNTOPT=${MOUNTOPT:-"-o user_xattr,flock"}
...
02:39:46:# End of copy of cfg/local.sh

This is not the same as my copy of local.sh, which no longer contains the MOUNTOPT variable and now contains MOUNT_OPTS and MOUNT_FLAGS variables.

However, later in the log, I see this:

02:39:46:cat /usr/lib64/lustre/tests/cfg/local.sh
...
02:39:46:# Comma-separated option list used as "mount [...] -o $MOUNT_OPTS [...]"
02:39:46:MOUNT_OPTS=${MOUNT_OPTS:-"user_xattr,flock"}
02:39:46:# Mount flags (e.g. "-n") used as "mount [...] $MOUNT_FLAGS [...]"
02:39:46:MOUNT_FLAGS=${MOUNT_FLAGS:-""}
...

This is the correct local.sh file, but apparently the other one is used. Why is that happening?

Comment by Minh Diep [ 13/Jun/14 ]

Hi Ryan,

Thank you for taking the time to investigate. I have found the problem. Your local.sh should be used and you don't need to use Test-Parameters. I am verifying the fix in our lab now and will deploy soon in the next cycle.

Thanks again

Comment by Ryan Haasken [ 13/Jun/14 ]

Thank you, Minh! Please let me know on this ticket when the change will be ready, and then I will rebase the patch and remove Test-Parameters to verify that it works as expected.

Comment by Ryan Haasken [ 17/Jun/14 ]

Hi Minh, have you deployed the fix for the problem?

Comment by Minh Diep [ 21/Jun/14 ]

A patch already landed into our internal test repo. We are waiting for deploy. I will rebuild and restart your test. you might not need to rebase.

Comment by Ryan Haasken [ 23/Jun/14 ]

Great, thank you! Please keep me posted if there's anything I need to do.

Comment by Ryan Haasken [ 26/Jun/14 ]

Thanks for resolving the bug in the test system, Minh! The patch has been verified by Maloo. Can somebody please review the patch now so it can land?

Comment by Cory Spitz [ 29/Jul/14 ]

Now that b2_6 has split, can we get this landed to master?

Comment by Ryan Haasken [ 01/Aug/14 ]

The patch has landed. This ticket can be resolved.

Generated at Sat Feb 10 01:44:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.