replay-dual test_18: Correct error message search

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.6.0
    • Lustre 2.5.0
    • None
    • 3
    • 11102

    Description

      While reviewing a patch that cleans up calls to mkdir in the Lustre tests (http://review.whamcloud.com/#/c/5022), Andreas requested a separate patch to fix replay-dual test 18. Commenting on the test 18 code:

      	dmesg | grep "entering recovery in server" &&
      		error "client not evicted" || true
      

      the request:

      This error message as written doesn't exist in the Lustre code anywhere. Checking back in b1_8 it comes from ldlm_expired_completion_wait(), and I see that this string does exist in master, but is split across multiple lines... The other problem is that it is LDLM_DEBUG() now instead of LDLM_ERROR(), since it was changed in http://review.whamcloud.com/2201 (commit 57373a29) "Quiet/cleanup various common console message".

      The string "not entering recovery" is visible in all releases, but is still in LDLM_DEBUG() since 2.3.59. If this test enabled D_DLMTRACE at the start, it could consistently find this in the MDS debug log. I also observe that this test is only checking the local console log instead of the MDS console log, so it has probably been broken for multi-node testing for a long time (though I've never seen it in my local node testing either). In a separate patch, could you please fix this to be:

      local DLMTRACE=$(do_facet $SINGLEMDS lctl get_param debug)
      do_facet $SINGLEMDS lctl set_param debug=+dlmtrace
      mkdir $MOUNT1/$tdir ...
      :
      :
      wait $OPENPID
      do_facet $SINGLEMDS lctl debug_kernel |
      grep "not entering recovery" && error "client not evicted"
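
The request above saves the debug mask in DLMTRACE but elides the rest of the test, so the restore step is not shown. The following is a minimal sketch of the whole save / enable / check / restore sequence, not the patch that landed. The `do_facet` and `lctl` definitions at the top are stand-ins so the sketch runs outside the Lustre test framework, and `DEBUG_MASK`, `FAKE_LOG`, `check_client_evicted`, and the sample log text are hypothetical names and data invented for illustration.

```shell
#!/bin/bash
# Stand-ins (assumptions) so the sketch runs outside the Lustre test
# framework. In a real test, do_facet comes from test-framework.sh and lctl
# is the real utility; drop this whole section there.
SINGLEMDS=mds1
DEBUG_MASK="ioctl neterror"     # pretend debug mask on the MDS
FAKE_LOG=""                     # pretend contents of the MDS debug log
do_facet() { shift; "$@"; }     # "remote" execution is just local here
lctl() {
    case "$1" in
    get_param)    echo "$DEBUG_MASK" ;;
    debug_kernel) echo "$FAKE_LOG" ;;
    set_param)
        case "$2" in
        debug=+*) DEBUG_MASK="$DEBUG_MASK ${2#debug=+}" ;;
        debug=*)  DEBUG_MASK="${2#debug=}" ;;
        esac ;;
    esac
}

# Returns 0 if the client was evicted, 1 if the MDS debug log shows the lock
# wait timing out without an eviction.
check_client_evicted() {
    local saved rc=0

    # Save the MDS debug mask, then enable dlmtrace so LDLM_DEBUG() messages
    # are recorded in the debug log.
    saved=$(do_facet $SINGLEMDS lctl get_param -n debug)
    do_facet $SINGLEMDS lctl set_param debug=+dlmtrace

    # ... the body of the test (mkdir, open, wait $OPENPID) would go here ...

    # Search the MDS debug log, not the local dmesg.
    if do_facet $SINGLEMDS lctl debug_kernel |
       grep -q "not entering recovery"; then
        rc=1
    fi

    # Restore the original mask. Quoting may need adjusting when do_facet
    # really runs the command on a remote node.
    do_facet $SINGLEMDS lctl set_param debug="$saved"
    return $rc
}

# Invented log line, for illustration only.
FAKE_LOG="lock timed out; not entering recovery in server code"
check_client_evicted || echo "client not evicted"
```

Recording the grep result in `rc` before restoring the mask keeps the cleanup on both the pass and the fail path, which a bare `grep ... && error ...` at the end of the test would not.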

          Activity

            [LU-4116] replay-dual test_18: Correct error message search
            pjones Peter Jones made changes -
            Link Original: This issue is related to LDEV-14 [ LDEV-14 ]
            jamesanunez James Nunez (Inactive) made changes -
            Link New: This issue is related to LU-6652 [ LU-6652 ]
            yujian Jian Yu made changes -
            Link New: This issue is related to LDEV-14 [ LDEV-14 ]
            jamesanunez James Nunez (Inactive) made changes -
            Fix Version/s New: Lustre 2.6.0 [ 10595 ]
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]

            jamesanunez James Nunez (Inactive) added a comment - Landed to master
            jamesanunez James Nunez (Inactive) added a comment - The proposed patch is at http://review.whamcloud.com/#/c/8129/
            adilger Andreas Dilger made changes -
            Description: markup around the quoted request changed from {noformat} to {quote}; the text is otherwise identical to the description above.

            jamesanunez James Nunez (Inactive) created issue -

            People

              jamesanunez James Nunez (Inactive)
              jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 3
