[LU-15089] sanity-lnet test_230: expected number of tcp connections 18 Created: 12/Oct/21 Updated: 03/Nov/21 Resolved: 03/Nov/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.0 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Serguei Smirnov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
This issue was created by maloo for Elena <elena.gryaznova@hpe.com> This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/9d97b8d9-9da1-4f76-a595-f827627ca5b9 test_230 failed with the following error: expected number of tcp connections 18 VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
| Comments |
| Comment by Andreas Dilger [ 21/Oct/21 ] |
|
Serguei, this looks like it is related to the conns_per_peer patch? |
| Comment by Andreas Dilger [ 21/Oct/21 ] |
|
jamesanunez, something is broken with sanity-lnet. This subtest is continually failing: test_230
Error: 'expected number of tcp connections 18'
Failure Rate: 35.00% of most recent 100 runs, 0 skipped (all branches)
(70/207 in the past 3 days) but the sanity-lnet test run is always being marked as "PASS": I suspect there is something missing at the end of sanity-lnet.sh that is supposed to check for the failure status? |
| Comment by Andreas Dilger [ 21/Oct/21 ] |
|
The test failure was clearly introduced by patch https://review.whamcloud.com/44417 " |
| Comment by Gerrit Updater [ 21/Oct/21 ] |
|
"Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45331 |
| Comment by Andreas Dilger [ 22/Oct/21 ] |
|
It isn't obvious why sanity-lnet.sh is not reporting a failure when test_230() is failing. The script has mostly the same bits at the end as sanity.sh: complete $SECONDS cleanup_testsuite exit_status and the test logs clearly show that complete() is finding the FAIL in $TESTSUITELOG, and cleanup_testsuite() is run as seen by "Cleaning up LNet": == sanity-lnet test complete, duration 824 sec =========== 15:31:11 (1634830271) sanity-lnet: FAIL: test_230 expected number of tcp connections 26 Cleaning up LNet It then proceeds to format and mount a new filesystem (setupall()), and the last line in the log is from init_param_vars(), and then exit_status() returns "0" (PASS) instead of finding any error: CMD: trevis-5vm9 /usr/sbin/lctl set_param lod.*.mdt_hash=crush lod.lustre-MDT0000-mdtlov.mdt_hash=crush CMD: trevis-5vm9 /usr/sbin/lctl set_param -P *.*.lbug_on_grant_miscount=1 The only thing that I can think of is that $TESTSUITELOG is somehow unset/removed/truncated and the FAIL line is missed? However, I only see $TESTSUITELOG being set/removed in init_test_env() and that is only called at the start of the script. It might be worthwhile to print out a line in exit_status() to help debug this:
exit_status () {
local status=0
local logs="$TESTSUITELOG $1"
+ echo "== checking for failures in logs: '$logs' =="
for log in $logs; do
if [ -f "$log" ]; then
grep -qw FAIL $log && status=1
fi
done
+ echo "== exiting with status=$status =="
exit $status
}
|
| Comment by Chris Horn [ 22/Oct/21 ] |
|
I think we are deleting $TESTSUITELOG in sanity-lnet.sh:cleanup_testsuite(): cleanup_testsuite() {
trap "" EXIT
# Cleanup any tmp files created by the sub tests
rm -f $TMP/sanity-lnet*
cleanup_netns
cleanup_lnet
if $restore_mounts; then
setupall || error "Failed to setup Lustre after test execution"
elif $restore_modules; then
load_modules ||
error "Couldn't load modules after test execution"
fi
return 0
}
That rm -f $TMP/sanity-lnet* should probably be rm -f $TMP/sanity-lnet-*.yaml |
| Comment by Chris Horn [ 22/Oct/21 ] |
|
Opened https://jira.whamcloud.com/browse/LU-15150 for ^ |
| Comment by Gerrit Updater [ 03/Nov/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45331/ |
| Comment by Peter Jones [ 03/Nov/21 ] |
|
Landed for 2.15 |