[LU-10461] sanity 77c failure due to variables called in cleanup routine are out of scope Created: 05/Jan/18 Updated: 02/Apr/19 Resolved: 09/Apr/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0, Lustre 2.10.2 |
| Fix Version/s: | Lustre 2.12.0, Lustre 2.10.7 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | James Nunez (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | tests | ||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
In sanity test 77c, we call cleanup_77c() at the end of the test to cleanup at the end of the test and a trap is set to call the clean up function when the test is exited. cleanup_77c() uses variables from test_77c which are declared local. From sanity.sh: 6375 cleanup_77c() {
6376 trap 0
6377 set_checksums 0
6378 $LCTL set_param osc.*osc-[^mM]*.checksum_dump=0
6379 $check_ost &&
6380 do_facet ost1 $LCTL set_param obdfilter.*-OST*.checksum_dump=0
6381 [ -n $osc_file_prefix ] && rm -f ${osc_file_prefix}*
6382 $check_ost && [ -n $ost_file_prefix ] &&
6383 do_facet ost1 rm -f ${ost_file_prefix}\*
6384 }
6385
6386 test_77c() {
6387 [ $PARALLEL == "yes" ] && skip "skip parallel run" && return
6388 $GSS && skip "could not run with gss" && return
6389
6390 local bad1
6391 local osc_file_prefix
6392 local osc_file
6393 local check_ost=false
6394 local ost_file_prefix
6395 local ost_file
6396 local orig_cksum
6397 local dump_cksum
6398 local fid
6399
…
6417 osc_file_prefix=$($LCTL get_param -n debug_path)
6418 osc_file_prefix=${osc_file_prefix}-checksum_dump-osc-\\${fid}
6419
6420 trap cleanup_77c EXIT
…
When the test exits normally and cleanup_77c() is called at the end of the test, there is no issue; the variables from the test are still in scope. There is a problem when we exit test_77c when we encounter an error or exit the test before calling cleanup_77c. In these cases, the trap specified in line 6420 is invoked and cleanup_77c() is called. Yet, the variables that are local to test_77c are no longer visible to the cleanup routine. One fix for this is to increase the scope of the variables, remove “local”, and this should fix the problem. What are other solutions? Logs for this test failing are at: I suspect there are more examples of this issue in our test suites. |
| Comments |
| Comment by Bruno Faccini (Inactive) [ 05/Jan/18 ] |
|
Hmm, I am sure to be far to know all bash tricky behaviors but I remember that for particular sanity/test_77c case I made some testing to verify the scope of local vs exit/trap context like following : [root@eagle-29 test_trap_with_local]# cat foo.sh
#!/bin/bash
footrap() {
trap 0
echo "foo=" $foo
}
fooerror() {
exit 1
}
foofunc() {
local foo
trap footrap EXIT
foo=foo
#exit
fooerror
}
foofunc
[root@eagle-29 test_trap_with_local]# bash foo.sh
foo= foo
[root@eagle-29 test_trap_with_local]#
|
| Comment by James Nunez (Inactive) [ 05/Jan/18 ] |
|
Bruno - I agree that your example works and the variable scope in the original test code worked. Patch https://review.whamcloud.com/#/c/30653/ is adding a "return 0" after the call to skip so we exit the test code early at: $check_ost || skip "No need to check cksum dump on OSS"
This changes the behavior of the test to: #!/usr/bin/bash
footrap() {
trap 0
echo "foo=" $foo
}
fooerror() {
exit 1
}
foofunc() {
local foo
trap footrap EXIT
foo=foo
# exit early
[ $foo == 'foo' ] && return 0
fooerror
}
foofunc
Perhaps I should word the description/title of this ticket differently. |
| Comment by Bruno Faccini (Inactive) [ 06/Jan/18 ] |
|
Ok, I better understand now how the local can become out-of-scope! And may be I have also been wrong originally to make the assumption that skip() wad ending with an "exit 0", like is error() doing with an "exit 1". So, may be we should not "return 0" after calling skip(), as done now in https://review.whamcloud.com/#/c/[30402,30653]/ but instead "exit 0" !! What do you think? |
| Comment by Andreas Dilger [ 10/Jan/18 ] |
|
Quentin and I discussed this same issue in patch https://review.whamcloud.com/29653 " I'm not sure if stack_trap itself avoids the local issue, or that by using stack_trap we can just avoid having complex cleanup functions that reference local variables. |
| Comment by Quentin Bouget [ 10/Jan/18 ] |
|
James is right in that on "normal exit" (with return, rather than exit), local variables are wiped before the trap is run (I did not know until now). I can see two ways out of this:
For example in test_77c, that would give something like : test_77c() {
[...]
stack_trap "set_checksums 0" EXIT
set_checksums 1
[...]
stack_trap "$LCTL set_param osc.*osc-[^mM]*.checksum_dump=0" EXIT
$LCTL set_param osc.*osc-[^mM]*.checksum_dump=1
[...]
if $check_ost; then
stack_trap "do_facet ost1 \
$LCTL set_param obdfilter.*-OST*.checkum_dump=0" EXIT
do_facet ost1 $LCTL set_param obdfilter.*-OST*.checksum_dump=1
fi
[...]
# I am not really sure where the last two traps should go so this needs checking
[ -n "$osc_file_prefix" ] && stack_trap "rm -f '$osc_file_prefix'*" EXIT
dd if=$DIR/$tfile of=/dev/null bs=1M || error "dd read error: $?"
[...]
$check_ost || skip "No need to check cksum dump on OSS"
[ -n "$ost_file_prefix" ] &&
stack_trap "do_facet ost1 \"rm -f '$ost_file_prefix'*\"" EXIT
}
Note that until this patch lands, using single quotes inside traps does not work as intended for stack_trap(). |
| Comment by Bruno Faccini (Inactive) [ 10/Jan/18 ] |
|
Andreas, since Quentin is also ok with my previous fix idea to "exit 0" in/after skip(), may be we can just go for it as it looks pretty simple. What do you think? |
| Comment by Quentin Bouget [ 10/Jan/18 ] |
|
For the record, why isn't there a skip_noexit() and why isn't skip() exiting the test itself? |
| Comment by James Nunez (Inactive) [ 10/Jan/18 ] |
|
I agree that we are being inconsistent with error() calling exit and skip() not calling exit. Let me look into adding exit to skip and if we need a skip_noexit(). Thank you all for the comments on this issue. |
| Comment by Gerrit Updater [ 21/Jan/18 ] |
|
James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/30964 |
| Comment by Gerrit Updater [ 09/Apr/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30964/ |
| Comment by Peter Jones [ 09/Apr/18 ] |
|
Landed for 2.12 |
| Comment by Gerrit Updater [ 10/Apr/18 ] |
|
James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/31942 |
| Comment by Gerrit Updater [ 11/Jan/19 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34015 |
| Comment by Gerrit Updater [ 14/Jan/19 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34023 |
| Comment by Gerrit Updater [ 19/Jan/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34023/ |