[LU-4341] Failure on test suite sanity test_170: expected 31 bad lines, but got 34 - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Unresolved
Priority: Blocker
Fix Version/s: None
Affects Version/s: Lustre 2.5.0, Lustre 2.6.0, Lustre 2.5.1, Lustre 2.7.0, Lustre 2.5.3, Lustre 2.8.0
Labels:
- always_except
Environment:
server and client: lustre-master build # 1784
client is running SLES11 SP3

Severity: 3
Rank (Obsolete): 11880

Description

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/7756e5f2-5bb9-11e3-8d79-52540035b04c.

The sub-test test_170 failed with the following error:

expected 31 bad lines, but got 34

== sanity test 170: test lctl df to handle corrupted log ============================================= 00:50:22 (1385974222)
 sanity test_170: @@@@@@ FAIL: expected 31 bad lines, but got 34


Activity

Andreas Dilger added a comment - 16/Oct/15 7:37 PM

This appears to be failing on master on SLES11.3 tests in the past week:
https://testing.hpdd.intel.com/sub_tests/05b5baa6-73f7-11e5-ada9-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/07fb912c-73df-11e5-ab44-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/9c5611ca-73ea-11e5-ab44-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/bae5eb2c-722f-11e5-b344-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/49587898-7073-11e5-b705-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/8bfb6e00-7071-11e5-b705-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/31313890-6f24-11e5-83a9-5254006e85c2

The patch http://review.whamcloud.com/16146 has landed, but that doesn't solve the problem itself. An improved patch would skip only the "expected N bad lines, but got M" check in test_170 on SLES11.3+ (and on RHEL7 as well), rather than skipping the whole of test_170. While in there, it should also remove the "-rf" from rm -rf $DIR/$tfile, since that is a file and not a directory and shouldn't fail in any case.

One possible source of the bug is that the first cat $TMP/${tfile}_log_good >> $TMP/${tfile}_logs_corrupt appends to the file (>>) instead of first truncating it (>), so if a ${tfile}_logs_corrupt file is left over from a previous test run for some reason, it might cause problems. The number of bad lines does seem to always be higher than the number of expected lines, so this seems like a candidate.
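
A minimal sketch of the append-versus-truncate behaviour described above. This is not the test_170 code: the file names, the three-line contents, and the stale file are hypothetical stand-ins for $TMP/${tfile}_log_good and $TMP/${tfile}_logs_corrupt, and it only shows that a leftover file inflates the line count when the first write uses >>.

```shell
#!/bin/bash
# Hypothetical stand-ins for $TMP/${tfile}_log_good and
# $TMP/${tfile}_logs_corrupt; not the real test_170 files.
workdir=$(mktemp -d)
good="$workdir/log_good"
corrupt="$workdir/logs_corrupt"

# A "good" log with three lines.
printf 'good1\ngood2\ngood3\n' > "$good"

# Simulate a file left over from a previous run (assumption).
printf 'stale1\nstale2\nstale3\n' > "$corrupt"

# Appending (>>) keeps the stale lines, so the count is inflated.
cat "$good" >> "$corrupt"
append_count=$(( $(wc -l < "$corrupt") ))

# Recreate the stale file, then truncate (>) on the first write instead.
printf 'stale1\nstale2\nstale3\n' > "$corrupt"
cat "$good" > "$corrupt"
truncate_count=$(( $(wc -l < "$corrupt") ))

echo "append: $append_count lines, truncate: $truncate_count lines"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Whether a stale file actually exists on the failing SLES nodes is not established in this ticket; the sketch only illustrates why truncating on the first write would rule that cause out.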

In any case, this bug cannot be closed until the actual test failure is understood and fixed.

Gerrit Updater added a comment - 16/Oct/15 6:05 PM

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16146/
Subject: LU-4341 test: skip failing sanity test 170
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ef63c034b437d47cd10fe7ee94ed614ac1359f44

Bob Glossman (Inactive) added a comment - 31/Aug/15 7:34 PM

best I can do for now is push a mod to ALWAYS_EXCEPT on sles11

Gerrit Updater added a comment - 31/Aug/15 7:28 PM

Bob Glossman (bob.glossman@intel.com) uploaded a new patch: http://review.whamcloud.com/16146
Subject: LU-4341 test: skip failing sanity test 170
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7f5668027d0b1393640c8185e8084c2957c8bdbe

Andreas Dilger added a comment - 31/Aug/15 6:32 PM

Can someone please determine definitively why this test is failing on SLES, and either fix it or add it to the ALWAYS_EXCEPT list for SLES? It doesn't make sense to exclude this via envdefinitions for SLES patches, since it will still fail whenever someone forgets to exclude it.

Sarah Liu added a comment - 20/May/15 12:46 AM

another instance:
https://testing.hpdd.intel.com/test_sets/02e36236-fe29-11e4-be9d-5254006e85c2

Sarah Liu added a comment - 01/Apr/15 7:00 PM

another instance
https://testing.hpdd.intel.com/test_sets/834ace12-d75c-11e4-a678-5254006e85c2

Bob Glossman (Inactive) added a comment - 26/Mar/15 3:54 PM

another seen in master:
https://testing.hpdd.intel.com/test_sets/83277a58-d3cd-11e4-8c98-5254006e85c2

Sarah Liu added a comment - 17/Feb/15 7:53 PM

hit this error in tag-2.6.94 test:

https://testing.hpdd.intel.com/test_sets/c53f0196-b22a-11e4-af8e-5254006e85c2

Bob Glossman (Inactive) added a comment - 26/Dec/14 12:44 AM

seen in master with sles11sp3 client/server:
https://testing.hpdd.intel.com/test_sets/14d6386e-8c7e-11e4-b81b-5254006e85c2

Jian Yu added a comment - 31/Aug/14 7:19 AM

Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/86/ (2.5.3 RC1)

The same failure occurred:
https://testing.hpdd.intel.com/test_sets/b2a57a6e-30a3-11e4-9f57-5254006e85c2

People

Assignee: WC Triage
Reporter: Maloo
Votes: 0
Watchers: 11

Dates

Created: 03/Dec/13 9:23 PM
Updated: 30/Apr/24 6:46 AM