[LU-15571] iotrace debug mask causing interop testing failures Created: 20/Feb/22 Updated: 27/Mar/22 Resolved: 27/Mar/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.0 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | Patrick Farrell |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The addition of iotrace to the default debug mask is causing interop testing failures with new clients against older servers (e.g. 2.14.0) for subtests that restore the debug mask at the end of the test. For example, sanity test_24v:

    onyx-61vm3: error: set_param: setting /sys/kernel/debug/lnet/debug=trace inode super iotrace malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec lfsck hsm snapshot layout: Invalid argument
    pdsh@onyx-61vm1: onyx-61vm3: ssh exited with exit code 22

and the console logs show: cfs_str2mask()) unknown mask 'iotrace'.

This is likely caused by the test-framework applying the client debug mask (which contains iotrace by default) to all of the remote nodes. It is probably enough to filter out the "iotrace" string from the saved debug mask before using it on the remote node, if the server version is older than 2.14.57 (or whatever version the patch was included in). |
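The filtering idea described above could look roughly like the following. This is a minimal sketch, not the patch that landed: ver_code, filter_debug_mask, and mask_for_server are hypothetical helper names that do not come from test-framework.sh, and the 2.14.57 cutoff is only the placeholder version mentioned in the description.

```shell
# Sketch only: all three helpers are hypothetical and are not part of
# test-framework.sh. The 2.14.57 cutoff is the placeholder from the
# description, not a confirmed version.

# Convert "major.minor.patch" into a single comparable integer.
ver_code() {
	local IFS=.

	# Split the version string on dots into $1, $2, $3.
	set -- $1
	echo "$(( ${1:-0} * 65536 + ${2:-0} * 256 + ${3:-0} ))"
}

# Drop one flag name ($2) from a space-separated debug mask ($1).
filter_debug_mask() {
	local drop="$2"
	local flag
	local out=""

	for flag in $1; do
		[ "$flag" = "$drop" ] && continue
		out="${out:+$out }$flag"
	done
	echo "$out"
}

# Print the mask that would be applied on a server running version $2.
mask_for_server() {
	local mask="$1"

	if [ "$(ver_code "$2")" -lt "$(ver_code 2.14.57)" ]; then
		mask=$(filter_debug_mask "$mask" iotrace)
	fi
	echo "$mask"
}

saved_debug="trace inode super iotrace malloc cache info"
mask_for_server "$saved_debug" 2.14.0    # older server: iotrace removed
mask_for_server "$saved_debug" 2.15.0    # newer server: mask unchanged
```

Matching whole words rather than substrings keeps other flags such as "trace" and "rpctrace" intact when "iotrace" is removed.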
| Comments |
| Comment by Andreas Dilger [ 20/Feb/22 ] |
|
Tests that are affected include at least sanity 24v, 24A, 27U, but there may be more. I'm not sure if there is a 2.15.0 release tracker that may show the other interop issues, but if so then this should be linked there. There do appear to be at least some sanity-flr interop failures that appear functional rather than just test issues. |
| Comment by Patrick Farrell [ 25/Feb/22 ] |
|
You said adding it to the default debug mask - I don't think we did that, did we? Just its existence in the debug mask at all, right? |
| Comment by Andreas Dilger [ 26/Feb/22 ] |
|
It may not be in the default debug mask, but iotrace is at least always enabled when these tests are run. The client saves the output only from "lctl get_param debug" and assumes it is the same across the clients and servers, then "restores" it to the older servers, where the error is hit. It looks like this is happening in:

    # wrappers for createmany and unlinkmany
    # to set debug=0 if number of creates is high enough
    # this is to speedup testing
    function createmany() {
            local count=${!#}

            (( count > 100 )) && {
                    local saved_debug=$($LCTL get_param -n debug)
                    local list=$(comma_list $(all_nodes))

                    do_nodes $list $LCTL set_param -n debug=0
            }
            $LUSTRE/tests/createmany $*
            local rc=$?
            (( count > 100 )) &&
                    do_nodes $list "$LCTL set_param -n debug=\\\"$saved_debug\\\""
            return $rc
    }

I'll push a patch shortly. |
| Comment by Peter Jones [ 07/Mar/22 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46636 |
| Comment by Gerrit Updater [ 27/Mar/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46636/ |
| Comment by Peter Jones [ 27/Mar/22 ] |
|
Landed for 2.15 |