
Failure on test suite sanity test_170: expected 31 bad lines, but got 34

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.5.0, Lustre 2.6.0, Lustre 2.5.1, Lustre 2.7.0, Lustre 2.5.3, Lustre 2.8.0
    • Environment: server and client: lustre-master build # 1784
      client is running SLES11 SP3
    • Severity: 3
    • Rank (Obsolete): 11880

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/7756e5f2-5bb9-11e3-8d79-52540035b04c.

      The sub-test test_170 failed with the following error:

      expected 31 bad lines, but got 34

      == sanity test 170: test lctl df to handle corrupted log ============================================= 00:50:22 (1385974222)
       sanity test_170: @@@@@@ FAIL: expected 31 bad lines, but got 34 
      

          Activity

            [LU-4341] Failure on test suite sanity test_170: expected 31 bad lines, but got 34
            sarah Sarah Liu added a comment - another instance https://testing.hpdd.intel.com/test_sets/834ace12-d75c-11e4-a678-5254006e85c2
            bogl Bob Glossman (Inactive) added a comment - another seen in master: https://testing.hpdd.intel.com/test_sets/83277a58-d3cd-11e4-8c98-5254006e85c2
            sarah Sarah Liu added a comment - hit this error in tag-2.6.94 test: https://testing.hpdd.intel.com/test_sets/c53f0196-b22a-11e4-af8e-5254006e85c2
            bogl Bob Glossman (Inactive) added a comment - seen in master with sles11sp3 client/server: https://testing.hpdd.intel.com/test_sets/14d6386e-8c7e-11e4-b81b-5254006e85c2
            yujian Jian Yu added a comment - Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/86/ (2.5.3 RC1) The same failure occurred: https://testing.hpdd.intel.com/test_sets/b2a57a6e-30a3-11e4-9f57-5254006e85c2
            yujian Jian Yu added a comment -

            Thanks, Di. I can reproduce the failure every time with debug="rpctrace". I'll look into "$TMP/${tfile}_log_good".

            di.wang Di Wang added a comment -

            Hi, Yujian

            test_170 is supposed to verify that "lctl df can identify and skip corrupted debug records" instead of abandoning the whole debug log file. I guess there is some debug format problem with "rpctrace", though I'm not sure what the real reason is. I think you need to get "$TMP/${tfile}_log_good" and have a look, or follow the test steps to repeat the test locally. Thanks.
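
For anyone repeating the steps locally, the count comparison behind the failure message can be sketched as below. This is only a rough illustration: it assumes that the decoded output ends with a summary line of the form "Debug log: N lines, N kept, N dropped, N bad.", and the helper names bad_count and check_bad_lines, the file names, and the sample numbers are all made up here rather than taken from sanity.sh.

```shell
#!/bin/bash
# Hypothetical helpers; the summary-line format used below is an assumption.

# Print the "bad" record count (9th field) from the last line of a file
# holding decoded debug log output.
bad_count() {
    tail -n 1 "$1" | awk '{print $9}'
}

# Compare two bad-record counts and report a mismatch in the style of the
# failure message quoted in this ticket.
check_bad_lines() {
    local expected=$1 got=$2
    if [ "$expected" -ne "$got" ]; then
        echo "expected $expected bad lines, but got $got"
        return 1
    fi
    echo "bad line count matches: $got"
}

# Made-up summary lines standing in for real decoded output.
tmp=$(mktemp -d)
echo "Debug log: 500 lines, 469 kept, 0 dropped, 31 bad." > "$tmp/first.out"
echo "Debug log: 1500 lines, 1466 kept, 0 dropped, 34 bad." > "$tmp/second.out"

check_bad_lines "$(bad_count "$tmp/first.out")" \
                "$(bad_count "$tmp/second.out")" || true
rm -rf "$tmp"
```

With real output files in place of the made-up ones, the same two helpers show whether the bad-record count changed between two decodes.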

            yujian Jian Yu added a comment -

            Hi Di,

            I saw that sanity test_170() was added by you in commit d9bf86ae95a599bf10bbb05818317b48eb71db1b. Could you please give me some hints about why debug="rpctrace" affects the results of sanity test 170 on the SLES client? Thanks a lot.

            yujian Jian Yu added a comment -

            It turns out that sanity test 170 is affected by the lctl debug value. With debug=-1, it passed, and with debug="rpctrace", the test failed.

            In sanity.sh, debug=-1 is set before running the sub-tests, which is why running test 170 on its own passed. In test 150, "set_default_debug_nodes $client" changed the debug value to debug="vfstrace rpctrace dlmtrace neterror ha config ioctl super", which caused test 170 to fail.
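
A minimal sketch of how the two cases might be compared on a test node: test 170 alone, then tests 150 and 170 together. The sanity.sh path in SANITY and the DRY_RUN switch are assumptions introduced for illustration. By default the script only prints the commands, so nothing touches a live system.

```shell
#!/bin/bash
# Sketch only: SANITY (location of sanity.sh) and DRY_RUN are assumptions.
SANITY=${SANITY:-/usr/lib64/lustre/tests/sanity.sh}
DRY_RUN=${DRY_RUN:-1}

# Run a command, or just print it (shell-quoted) when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+'
        printf ' %q' "$@"
        printf '\n'
    else
        "$@"
    fi
}

# Run only the listed sanity sub-tests, e.g. "170" or "150 170".
run_sanity_only() {
    local only=$1
    run env ONLY="$only" bash "$SANITY"
}

run lctl get_param -n debug   # show the current debug mask
run_sanity_only 170           # reported above to pass on its own
run_sanity_only "150 170"     # 150 first, which is reported to make 170 fail
```

Setting DRY_RUN=0 executes the commands instead of printing them.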

            yujian Jian Yu added a comment - - edited

            Just narrowed it down: the following operation in sanity test 150 caused test 170 to fail:

            remount_client $MOUNT -> zconf_mount `hostname` $1 -> set_default_debug_nodes $client
            

            After commenting out "set_default_debug_nodes $client", the failure disappeared.
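
The side effect can be pictured with a small stub simulation. The functions below only imitate the call chain named above and are not the real test-framework.sh implementations. DEBUG_MASK is a stand-in for the node's debug setting, and the mount point is made up.

```shell
#!/bin/bash
# Stub simulation only: these functions imitate the call chain from the
# comment and are NOT the real test-framework.sh code.
# DEBUG_MASK stands in for the debug value reported on the client.
DEFAULT_MASK="vfstrace rpctrace dlmtrace neterror ha config ioctl super"
DEBUG_MASK=""

set_default_debug_nodes() { DEBUG_MASK=$DEFAULT_MASK; }
zconf_mount()             { set_default_debug_nodes "$1"; }
remount_client()          { zconf_mount "${HOSTNAME:-localhost}" "$1"; }

# Stand-in for test 170: only reports the mask it would run under.
test_170_stub() { echo "test 170 runs with debug=\"$DEBUG_MASK\""; }

DEBUG_MASK="-1"              # sanity.sh sets debug=-1 before the sub-tests
test_170_stub                # 170 on its own sees debug=-1
remount_client /mnt/lustre   # side effect of test 150 (hypothetical mount point)
test_170_stub                # 170 after 150 sees the default mask
```

In this toy model, dropping the set_default_debug_nodes call from zconf_mount leaves the mask at -1, which mirrors the observation that commenting it out made the failure disappear.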

            yujian Jian Yu added a comment -

            Finally, I found that it was sanity test 150 that caused test 170 to fail on the SLES11SP3 client:

            run_test 150 "truncate/append tests"
            

            I've tried several ways to fix the issue but failed. Still digging.


            People

              Assignee: wc-triage WC Triage
              Reporter: maloo Maloo