Failure on test suite sanity test_170: expected 31 bad lines, but got 34

Details

    • Bug
    • Resolution: Unresolved
    • Blocker
    • None
    • Lustre 2.5.0, Lustre 2.6.0, Lustre 2.5.1, Lustre 2.7.0, Lustre 2.5.3, Lustre 2.8.0
    • server and client: lustre-master build # 1784
      client is running SLES11 SP3
    • 3
    • 11880

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/7756e5f2-5bb9-11e3-8d79-52540035b04c.

      The sub-test test_170 failed with the following error:

      expected 31 bad lines, but got 34

      == sanity test 170: test lctl df to handle corrupted log ============================================= 00:50:22 (1385974222)
       sanity test_170: @@@@@@ FAIL: expected 31 bad lines, but got 34 
      

          Activity

            [LU-4341] Failure on test suite sanity test_170: expected 31 bad lines, but got 34

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/54777/
            Subject: LU-4341 tests: re-enable SLES sanity test_170 test_243
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b671d0904e4af33e72764891b0170db9e5a2f75c

            adilger Andreas Dilger added a comment - It also looks like Jian's patch https://review.whamcloud.com/10296 "LU-4341 tests: overwrite corrupted log in sanity test 170" from this ticket never landed. It might be a more permanent solution if the test ever starts failing again, but the test appears to be passing reliably.
            adilger Andreas Dilger added a comment - It looks like the SLES version checking in sanity.sh was not working anyway, because test_170 was running on sles15sp5 without issue: https://testing.whamcloud.com/search?client_branch_type_id=24a6947e-04a9-11e1-bb5f-52540025f9af&client_distribution_type_id=11213c4f-3869-4ef2-be79-350206e26db2&horizon=518400&test_set_script_id=f9516376-32bc-11e0-aaee-52540025f9ae&sub_test_script_id=144cdcae-32be-11e0-b685-52540025f9ae&source=sub_tests#redirect

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54777
            Subject: LU-4341 tests: re-enable sanity test_170 on SLES
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e4d1f0c9e35bdda428381d4bc76029d1d3059c59

            adilger Andreas Dilger added a comment - This is still marked "always_except" for all SLES testing. Reopening and pushing a patch to see if this is still failing.

            adilger Andreas Dilger added a comment - This appears to be failing on master in SLES11.3 tests in the past week:
            https://testing.hpdd.intel.com/sub_tests/05b5baa6-73f7-11e5-ada9-5254006e85c2
            https://testing.hpdd.intel.com/sub_tests/07fb912c-73df-11e5-ab44-5254006e85c2
            https://testing.hpdd.intel.com/sub_tests/9c5611ca-73ea-11e5-ab44-5254006e85c2
            https://testing.hpdd.intel.com/sub_tests/bae5eb2c-722f-11e5-b344-5254006e85c2
            https://testing.hpdd.intel.com/sub_tests/49587898-7073-11e5-b705-5254006e85c2
            https://testing.hpdd.intel.com/sub_tests/8bfb6e00-7071-11e5-b705-5254006e85c2
            https://testing.hpdd.intel.com/sub_tests/31313890-6f24-11e5-83a9-5254006e85c2

            The patch http://review.whamcloud.com/16146 has landed, but it only skips the test and does not solve the problem itself. An improved patch would skip only the "expected N bad lines, but got M" check in test_170, on SLES11.3+ and also on RHEL7, rather than skipping the whole test. While in there, it should also drop the "-rf" from rm -rf $DIR/$tfile, since $DIR/$tfile is a file rather than a directory and removing it should not fail in any case.

            One possible source of the bug is that the first cat $TMP/${tfile}_log_good >> $TMP/${tfile}_logs_corrupt appends to the file (>>) instead of truncating it first (>). If a ${tfile}_logs_corrupt file is left over from a previous test run for some reason, the stale lines could cause problems. The number of bad lines always seems to be higher than the expected number, which makes this a plausible candidate.

            In any case, this bug cannot be closed until the actual test failure is understood and fixed.
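
The append-versus-truncate hypothesis in the comment above can be illustrated with a small standalone sketch. It is not the real test_170 code: the temporary directory, the f170-style file names, and the line contents are stand-ins chosen for illustration, and it only shows how a leftover file inflates the line count when ">>" is used in place of ">".

```shell
#!/bin/bash
# Illustrative only: the paths and contents below are assumptions, not the
# actual files written by sanity.sh test_170.
workdir=$(mktemp -d)
good="$workdir/f170_log_good"
corrupt="$workdir/f170_logs_corrupt"

# A "good" log with three lines.
printf 'line %s\n' 1 2 3 > "$good"

# Simulate a file left behind by an earlier, interrupted run.
printf 'stale %s\n' 1 2 > "$corrupt"

# Appending keeps the stale lines, so the count is higher than expected.
cat "$good" >> "$corrupt"
appended=$(( $(wc -l < "$corrupt") ))

# Truncating first discards whatever was left over.
cat "$good" > "$corrupt"
truncated=$(( $(wc -l < "$corrupt") ))

echo "appended=$appended truncated=$truncated"

# Plain files only need rm -f; -r is unnecessary here.
rm -f "$good" "$corrupt"
rmdir "$workdir"
```

With a stale two-line file present, the appended copy holds five lines while the truncated copy holds the expected three, which matches the observation that the reported count is always higher than expected, never lower.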

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16146/
            Subject: LU-4341 test: skip failing sanity test 170
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ef63c034b437d47cd10fe7ee94ed614ac1359f44

            bogl Bob Glossman (Inactive) added a comment - Best I can do for now is push a mod to ALWAYS_EXCEPT on sles11.

            Bob Glossman (bob.glossman@intel.com) uploaded a new patch: http://review.whamcloud.com/16146
            Subject: LU-4341 test: skip failing sanity test 170
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7f5668027d0b1393640c8185e8084c2957c8bdbe

            adilger Andreas Dilger added a comment - Can someone please determine definitively why this test is failing on SLES, and either fix it or add it to the ALWAYS_EXCEPT list for SLES? It doesn't make sense to exclude this via envdefinitions for SLES patches, since it will still fail whenever someone forgets to exclude it.
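
As a rough sketch of what excluding the test in the script itself, rather than per patch, could look like: the helpers is_sles and add_sles_exception below are hypothetical names invented for this example, and detecting the distribution through an os-release style file is an assumption (older SLES releases may not ship that file). The real sanity.sh has its own distro detection and exclusion handling, which this does not reproduce.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the given os-release style file
# identifies the distribution as SLES. Defaults to /etc/os-release.
is_sles() {
	local release_file=${1:-/etc/os-release}

	[ -r "$release_file" ] || return 1
	# Matches both ID=sles and ID="sles".
	grep -q '^ID="\{0,1\}sles' "$release_file"
}

# Hypothetical helper: append a test number to ALWAYS_EXCEPT, but only
# when running on SLES, so other distributions still run the test.
add_sles_exception() {
	local test_num=$1
	local release_file=${2:-/etc/os-release}

	if is_sles "$release_file"; then
		ALWAYS_EXCEPT="${ALWAYS_EXCEPT:+$ALWAYS_EXCEPT }$test_num"
	fi
}

# Usage: exclude test 170 on SLES clients only.
ALWAYS_EXCEPT=""
add_sles_exception 170
echo "ALWAYS_EXCEPT=$ALWAYS_EXCEPT"
```

Keeping the condition inside the test script means the exclusion travels with the code, so it does not depend on each patch author remembering to set it.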

            People

              wc-triage WC Triage
              maloo Maloo