[LU-15571] iotrace debug mask causing interop testing failures

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: Lustre 2.15.0
    • Fix Version/s: Lustre 2.15.0
    • Labels: None
    • Severity: 3

    Description

      The addition of iotrace to the default debug mask is causing interop testing failures when new clients run against older servers (e.g. 2.14.0), in subtests that restore the debug mask at the end of the test. For example, sanity test_24v:
      https://testing.whamcloud.com/test_sets/96c360e4-cbca-46b4-8a8e-0371f9a8f4b4

      onyx-61vm3: error: set_param: setting /sys/kernel/debug/lnet/debug=trace inode super iotrace malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec lfsck hsm snapshot layout: Invalid argument
      pdsh@onyx-61vm1: onyx-61vm3: ssh exited with exit code 22
      

      and the console logs show:

      cfs_str2mask()) unknown mask 'iotrace'.
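The exit code 22 is EINVAL: the older server does not know the name iotrace at all, so the whole set_param fails. The following is a simplified POSIX shell model of that behaviour, assuming (as the log suggests) that a single unknown token rejects the entire mask string. It is not the libcfs cfs_str2mask() implementation; parse_mask and OLD_KNOWN_FLAGS are hypothetical, and the flag list is just the tokens from the failing set_param line with iotrace removed.

```sh
#!/bin/sh
# Simplified model, NOT the real libcfs cfs_str2mask(): an "old" node knows
# a fixed set of flag names and rejects the whole mask if any token is
# unknown, returning EINVAL (22) as seen in "ssh exited with exit code 22".
EINVAL=22

# Hypothetical list of flags known to an old server: the tokens from the
# failing set_param line, with iotrace removed.
OLD_KNOWN_FLAGS="trace inode super malloc cache info ioctl neterror net
warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace
vfstrace reada mmap config console quota sec lfsck hsm snapshot layout"

# parse_mask KNOWN MASK: succeed only if every token in MASK is in KNOWN.
parse_mask() {
	known=" $(echo $1) "    # collapse newlines/spaces to single spaces
	for tok in $2; do
		case "$known" in
		*" $tok "*) ;;      # known flag, check the next one
		*)
			echo "unknown mask '$tok'" >&2
			return $EINVAL
			;;
		esac
	done
	return 0
}
```

With this model, parse_mask "$OLD_KNOWN_FLAGS" "trace inode iotrace malloc" prints unknown mask 'iotrace' and returns 22, while the same mask without iotrace is accepted.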
      

      This is likely caused by test-framework applying the client's debug mask (which contains iotrace by default) to all of the remote nodes.

      It is probably enough to filter the "iotrace" string out of the saved debug mask before applying it on a remote node whose server version is older than 2.14.57 (or whichever version the patch was included in).
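A minimal sketch of that filtering in POSIX shell, assuming the remote server's version is already available as an X.Y.Z string. The helper names (ver_code, drop_flag, mask_for_server) are hypothetical rather than test-framework functions, and the 2.14.57 cut-off is the guess from the description, not a confirmed version.

```sh
#!/bin/sh
# Hypothetical helpers sketching the proposed filtering; not test-framework code.

# ver_code X.Y.Z: pack a version string into one comparable integer.
ver_code() {
	printf '%s\n' "$1" | {
		IFS=. read -r maj min pat _rest
		echo $(( (${maj:-0} << 16) | (${min:-0} << 8) | ${pat:-0} ))
	}
}

# drop_flag MASK FLAG: print MASK with every occurrence of FLAG removed.
drop_flag() {
	out=""
	for tok in $1; do
		[ "$tok" = "$2" ] || out="${out:+$out }$tok"
	done
	echo "$out"
}

# mask_for_server MASK SERVER_VERSION: strip iotrace for servers assumed
# to predate it (cut-off 2.14.57, as guessed in the description).
mask_for_server() {
	if [ "$(ver_code "$2")" -lt "$(ver_code 2.14.57)" ]; then
		drop_flag "$1" iotrace
	else
		echo "$1"
	fi
}
```

For a 2.14.0 server, mask_for_server "trace iotrace malloc" 2.14.0 yields "trace malloc"; newer servers get the mask unchanged.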

          Activity

            Peter Jones (pjones) added a comment:

            Landed for 2.15

            Gerrit Updater (gerrit) added a comment:

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46636/
            Subject: LU-15571 tests: save/restore debug mask for interop
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f236119e6e264b00c20533336303f694d9cfe766

            Peter Jones (pjones) added a comment (edited):

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46636
            Subject: LU-15571 tests: save/restore debug mask for interop
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 2
            Commit: 183e00aa0c2892d977f45cda67d6f3352de3429e


            Andreas Dilger (adilger) added a comment:

            It may not be in the default debug mask, but iotrace is at least always enabled when these tests are run. The client saves only the output of "lctl get_param debug", assumes it is the same on all clients and servers, and then "restores" it on the older servers, where the error is hit. It looks like this happens in:

            # wrappers for createmany and unlinkmany
            # to set debug=0 if number of creates is high enough
            # this is to speedup testing
            function createmany() {
                    local count=${!#}

                    (( count > 100 )) && {
                            local saved_debug=$($LCTL get_param -n debug)
                            local list=$(comma_list $(all_nodes))

                            do_nodes $list $LCTL set_param -n debug=0
                    }
                    $LUSTRE/tests/createmany $*
                    local rc=$?
                    (( count > 100 )) &&
                            do_nodes $list "$LCTL set_param -n debug=\\\"$saved_debug\\\""
                    return $rc
            }

            I'll push a patch shortly.
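The problem in the createmany() wrapper is that one client-side value, saved_debug, is written back to every node in $list. One alternative is to save and restore each node's own mask. The sketch below simulates that idea in POSIX shell with one file per node; it is an illustration only, not necessarily what patch 46636 does, and node_get_debug, node_set_debug and with_debug_off are hypothetical names standing in for do_node plus lctl get_param/set_param.

```sh
#!/bin/sh
# Simulation only: each "node" is a file in $SIM_DIR holding that node's
# debug mask. Real code would query/set it remotely via lctl.
SIM_DIR=$(mktemp -d)
trap 'rm -rf "$SIM_DIR"' EXIT

node_get_debug() { cat "$SIM_DIR/$1"; }
node_set_debug() { printf '%s\n' "$2" > "$SIM_DIR/$1"; }

# with_debug_off NODES CMD...: save each node's OWN mask, set debug=0 on
# all nodes, run CMD, then give every node back the value it had before
# (instead of pushing the client's mask to every node).
with_debug_off() {
	nodes=$1
	shift
	for n in $nodes; do
		node_get_debug "$n" > "$SIM_DIR/$n.saved"
		node_set_debug "$n" 0
	done
	rc=0
	"$@" || rc=$?
	for n in $nodes; do
		node_set_debug "$n" "$(cat "$SIM_DIR/$n.saved")"
	done
	return $rc
}
```

Because each node gets back only what it reported itself, a 2.14.0 server never receives a flag name that only the newer client understands.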

            Patrick Farrell (paf0186) added a comment:

            You said it was added to the default debug mask; I don't think we did that, did we? Just its existence in the debug mask at all, right?

            Andreas Dilger (adilger) added a comment:

            Affected tests include at least sanity test_24v, test_24A, and test_27U, but there may be more.

            I'm not sure whether there is a 2.15.0 release tracker that shows the other interop issues, but if there is, this ticket should be linked to it. At least some sanity-flr interop failures appear to be functional problems rather than just test issues.

            People

              Assignee: Patrick Farrell (paf0186)
              Reporter: Andreas Dilger (adilger)
              Votes: 0
              Watchers: 6
