[LU-4341] Failure on test suite sanity test_170: expected 31 bad lines, but got 34

Details

    • Bug
    • Resolution: Unresolved
    • Blocker
    • None
    • Lustre 2.5.0, Lustre 2.6.0, Lustre 2.5.1, Lustre 2.7.0, Lustre 2.5.3, Lustre 2.8.0
    • server and client: lustre-master build # 1784
      client is running SLES11 SP3
    • 3
    • 11880

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/7756e5f2-5bb9-11e3-8d79-52540035b04c.

      The sub-test test_170 failed with the following error:

      expected 31 bad lines, but got 34

      == sanity test 170: test lctl df to handle corrupted log ============================================= 00:50:22 (1385974222)
       sanity test_170: @@@@@@ FAIL: expected 31 bad lines, but got 34 
      

          Activity

            adilger Andreas Dilger added a comment - This is still marked "always_except" for all SLES testing. Reopen and push a patch to see if this is still failing.

            adilger Andreas Dilger added a comment - This appears to be failing on master on SLES11.3 tests in the past week:
            https://testing.hpdd.intel.com/sub_tests/05b5baa6-73f7-11e5-ada9-5254006e85c2
            https://testing.hpdd.intel.com/sub_tests/07fb912c-73df-11e5-ab44-5254006e85c2
            https://testing.hpdd.intel.com/sub_tests/9c5611ca-73ea-11e5-ab44-5254006e85c2
            https://testing.hpdd.intel.com/sub_tests/bae5eb2c-722f-11e5-b344-5254006e85c2
            https://testing.hpdd.intel.com/sub_tests/49587898-7073-11e5-b705-5254006e85c2
            https://testing.hpdd.intel.com/sub_tests/8bfb6e00-7071-11e5-b705-5254006e85c2
            https://testing.hpdd.intel.com/sub_tests/31313890-6f24-11e5-83a9-5254006e85c2

            The patch http://review.whamcloud.com/16146 has landed, but that doesn't solve the problem itself. An improved patch would skip only the "expected N bad lines, but got M" check in test_170, for SLES11.3+ and RHEL7 as well, rather than the whole of test_170. While in there, it should also remove the "-rf" from rm -rf $DIR/$tfile, since that is a file and not a directory, and removing it shouldn't fail in any case.

            One possible source of the bug is that the first cat $TMP/${tfile}_log_good >> $TMP/${tfile}_logs_corrupt appends to the file (>>) instead of first truncating it (>), so if the ${tfile}_logs_corrupt file is lingering around from a previous test run for some reason, it might cause problems. The number of bad lines always seems to be higher than the expected number, so this looks like a candidate.

            In any case, this bug cannot be closed until the actual test failure is understood and fixed.
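The append-versus-truncate hypothesis above can be illustrated with a small standalone sketch. It is not the code from sanity.sh: the temporary directory, the file contents, and the line counts are made up for illustration, and only the ${tfile}_log_good and ${tfile}_logs_corrupt naming is borrowed from the comment. It shows how a leftover file inflates the line count when the first write uses >>, and why a plain rm is enough for a regular file.

```shell
#!/bin/bash
# Illustrative only: workdir and tfile are hypothetical stand-ins for
# the test's $TMP and $tfile, not the values used by sanity.sh.
workdir=$(mktemp -d)
tfile=f170
good="$workdir/${tfile}_log_good"
corrupt="$workdir/${tfile}_logs_corrupt"

# Three "good" log lines (printf repeats the format once per argument).
printf 'good line %d\n' 1 2 3 > "$good"

# Simulate a file left behind by an earlier, interrupted run.
printf 'stale line %d\n' 1 2 > "$corrupt"

# Appending keeps the stale lines, so the resulting count is too high.
cat "$good" >> "$corrupt"
append_count=$(( $(wc -l < "$corrupt") ))

# Truncating discards whatever was in the file before writing.
cat "$good" > "$corrupt"
truncate_count=$(( $(wc -l < "$corrupt") ))

echo "append: $append_count lines, truncate: $truncate_count lines"

# These are regular files, so plain rm is sufficient; no -r is needed.
rm "$corrupt" "$good"
rmdir "$workdir"
```

In this sketch the appended file ends up longer than the truncated one by exactly the number of stale lines, which would be consistent with the observed count always exceeding the expected one. Whether a stale file is really present on the failing SLES nodes is not established in this ticket.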

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16146/
            Subject: LU-4341 test: skip failing sanity test 170
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ef63c034b437d47cd10fe7ee94ed614ac1359f44

            bogl Bob Glossman (Inactive) added a comment - best I can do for now is push a mod to ALWAYS_EXCEPT on sles11

            gerrit Gerrit Updater added a comment - Bob Glossman (bob.glossman@intel.com) uploaded a new patch: http://review.whamcloud.com/16146
            Subject: LU-4341 test: skip failing sanity test 170
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7f5668027d0b1393640c8185e8084c2957c8bdbe

            adilger Andreas Dilger added a comment - Can someone please determine definitively why this test is failing for SLES, and either fix it or add it to the ALWAYS_EXCEPT list for SLES? It doesn't make sense to exclude this via envdefinitions for SLES patches, when it will still fail whenever someone forgets to exclude it.
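As a rough sketch of what a distro-specific ALWAYS_EXCEPT entry could look like: the helper name except_for_distro and the way the distro string is passed in are hypothetical and are not taken from the Lustre test framework or from patch 16146. The only thing assumed about the framework is that ALWAYS_EXCEPT is a space-separated list of test numbers.

```shell
#!/bin/bash
# Hypothetical sketch: except_for_distro and its arguments are invented
# for illustration and do not exist in the Lustre test framework.
ALWAYS_EXCEPT=${ALWAYS_EXCEPT:-}

# Append the given test numbers to ALWAYS_EXCEPT only on SLES-like distros.
except_for_distro() {
    local distro_id=$1   # e.g. the ID value from /etc/os-release (assumption)
    local tests=$2       # space-separated test numbers to skip
    case "$distro_id" in
        sles*) ALWAYS_EXCEPT="$ALWAYS_EXCEPT $tests" ;;
    esac
}

except_for_distro "sles" "170"   # added to the list
except_for_distro "rhel" "171"   # ignored; 171 is a made-up test number
echo "ALWAYS_EXCEPT=$ALWAYS_EXCEPT"
```

Keeping the exclusion in the test script itself, rather than in per-patch envdefinitions, means nobody has to remember to add it, which is the concern raised in the comment above.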
            sarah Sarah Liu added a comment - another instance: https://testing.hpdd.intel.com/test_sets/02e36236-fe29-11e4-be9d-5254006e85c2
            sarah Sarah Liu added a comment - another instance https://testing.hpdd.intel.com/test_sets/834ace12-d75c-11e4-a678-5254006e85c2
            bogl Bob Glossman (Inactive) added a comment - another seen in master: https://testing.hpdd.intel.com/test_sets/83277a58-d3cd-11e4-8c98-5254006e85c2
            sarah Sarah Liu added a comment - hit this error in tag-2.6.94 test: https://testing.hpdd.intel.com/test_sets/c53f0196-b22a-11e4-af8e-5254006e85c2
            bogl Bob Glossman (Inactive) added a comment - seen in master with sles11sp3 client/server: https://testing.hpdd.intel.com/test_sets/14d6386e-8c7e-11e4-b81b-5254006e85c2

            People

              wc-triage WC Triage
              maloo Maloo