[LU-4341] Failure on test suite sanity test_170: expected 31 bad lines, but got 34 - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Unresolved
Priority: Blocker
Fix Version/s: None
Affects Version/s: Lustre 2.5.0, Lustre 2.6.0, Lustre 2.5.1, Lustre 2.7.0, Lustre 2.5.3, Lustre 2.8.0
Labels:
- always_except
Environment:
server and client: lustre-master build # 1784
client is running SLES11 SP3

Severity:
3
Rank (Obsolete):
11880

Description

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/7756e5f2-5bb9-11e3-8d79-52540035b04c.

The sub-test test_170 failed with the following error:

expected 31 bad lines, but got 34

== sanity test 170: test lctl df to handle corrupted log ============================================= 00:50:22 (1385974222)
 sanity test_170: @@@@@@ FAIL: expected 31 bad lines, but got 34
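
When comparing several runs of this failure, it can help to pull the two counts out of the FAIL line mechanically. The following is a minimal sketch, not part of the Lustre test framework; the function name `bad_line_delta` is hypothetical, and it assumes the failure message keeps the exact wording shown above.

```shell
#!/bin/bash
# Illustrative triage helper (hypothetical name): read test output on stdin
# and print "expected got delta" for every line that contains
# "expected N bad lines, but got M".
bad_line_delta() {
    sed -n 's/.*expected \([0-9][0-9]*\) bad lines, but got \([0-9][0-9]*\).*/\1 \2/p' |
    while read -r expected got; do
        echo "$expected $got $((got - expected))"
    done
}
```

For the line quoted in this ticket, piping it through `bad_line_delta` prints the expected count, the observed count, and their difference on one line.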

Activity

Sarah Liu added a comment - 01/Apr/15 7:00 PM

another instance
https://testing.hpdd.intel.com/test_sets/834ace12-d75c-11e4-a678-5254006e85c2

Bob Glossman (Inactive) added a comment - 26/Mar/15 3:54 PM

another seen in master:
https://testing.hpdd.intel.com/test_sets/83277a58-d3cd-11e4-8c98-5254006e85c2

Sarah Liu added a comment - 17/Feb/15 7:53 PM

hit this error in tag-2.6.94 test:

https://testing.hpdd.intel.com/test_sets/c53f0196-b22a-11e4-af8e-5254006e85c2

Bob Glossman (Inactive) added a comment - 26/Dec/14 12:44 AM

seen in master with sles11sp3 client/server:
https://testing.hpdd.intel.com/test_sets/14d6386e-8c7e-11e4-b81b-5254006e85c2

Jian Yu added a comment - 31/Aug/14 7:19 AM

Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/86/ (2.5.3 RC1)

The same failure occurred:
https://testing.hpdd.intel.com/test_sets/b2a57a6e-30a3-11e4-9f57-5254006e85c2

Jian Yu added a comment - 14/Jul/14 4:38 AM

Thanks, Di. I can reproduce the failure every time with debug="rpctrace". I'll look into "$TMP/${tfile}_log_good".

Di Wang added a comment - 27/May/14 5:10 PM

Hi, Yujian

test_170 is supposed to verify that "lctl df can identify and skip corrupted debug records" instead of abandoning the whole debug log file. I guess there is some debug format problem with "rpctrace", though I am not sure what the real cause is. I think you need to get "$TMP/${tfile}_log_good" and have a look, or follow the test steps to reproduce the failure locally. Thanks.

Jian Yu added a comment - 19/May/14 10:13 AM

Hi Di,

I saw that sanity test_170() was added by you in commit d9bf86ae95a599bf10bbb05818317b48eb71db1b. Could you please give me some hints about why debug="rpctrace" affects the test results of sanity test 170 on SLES client? Thanks a lot.

Jian Yu added a comment - 16/May/14 4:20 PM

It turns out that sanity test 170 is affected by the lctl debug value. With debug=-1, it passed, and with debug="rpctrace", the test failed.

In sanity.sh, debug=-1 is set before running sub-tests, which is why running test 170 alone passed. In test 150, "set_default_debug_nodes $client" changed the debug value to debug="vfstrace rpctrace dlmtrace neterror ha config ioctl super", which caused test 170 to fail.
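
To see which flags a sub-test changed, the debug mask can be captured before and after it and the two values compared. The helper below is a minimal sketch under stated assumptions: `debug_mask_diff` is a hypothetical name, not a test-framework function, and both masks are assumed to be single-line, space-separated lists of flag names, which is how `lctl get_param -n debug` is assumed to print them.

```shell
#!/bin/bash
# Illustrative helper (not part of sanity.sh): compare two debug masks.
# Prints "-flag" for each flag present only in the first mask, then
# "+flag" for each flag present only in the second, one per line.
debug_mask_diff() {
    local before=$1 after=$2 flag
    for flag in $before; do
        case " $after " in
            *" $flag "*) ;;                      # flag kept
            *) printf '%s\n' "-$flag" ;;         # flag dropped
        esac
    done
    for flag in $after; do
        case " $before " in
            *" $flag "*) ;;                      # flag already set
            *) printf '%s\n' "+$flag" ;;         # flag newly set
        esac
    done
}

# Possible use on a live client (assumes lctl is in PATH):
#   before=$(lctl get_param -n debug)
#   ... run sanity test 150 ...
#   after=$(lctl get_param -n debug)
#   debug_mask_diff "$before" "$after"
```

An empty result means the mask did not change. Lines starting with "-" would show the flags lost when the remount in test 150 replaced the debug=-1 setting with the default mask.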

Jian Yu added a comment - 15/May/14 2:02 PM - edited

I just narrowed it down: the following operation in sanity test 150 caused test 170 to fail:

remount_client $MOUNT -> zconf_mount `hostname` $1 -> set_default_debug_nodes $client

After commenting out "set_default_debug_nodes $client", the failure disappeared.
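
A less invasive experiment than commenting out the call would be to save the debug mask before the remount and restore it afterwards. The wrapper below is only a sketch, not the fix adopted by the project: `preserve_debug_mask` is a hypothetical name, and it assumes that `lctl get_param -n debug` prints the current mask in a form that `lctl set_param debug=...` accepts.

```shell
#!/bin/bash
# Hypothetical wrapper, not part of the Lustre test framework: run a command
# and put the debug mask back afterwards, preserving the command's exit code.
# LCTL may be overridden (for example with a stub) to dry-run without Lustre.
LCTL=${LCTL:-lctl}

preserve_debug_mask() {
    local saved rc
    saved=$($LCTL get_param -n debug) || return 1    # remember current mask
    "$@"                                             # run the wrapped command
    rc=$?
    $LCTL set_param debug="$saved" > /dev/null || return 1
    return $rc
}
```

In a local experiment, test 150 could call `preserve_debug_mask remount_client $MOUNT` in place of the bare `remount_client $MOUNT`. If test 170 then passes, that would confirm the mask change as the trigger.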

Jian Yu added a comment - 13/May/14 3:24 PM

Finally, I found that it was sanity test 150 which caused test 170 to fail on the SLES11SP3 client:

run_test 150 "truncate/append tests"

I've tried several ways to fix the issue but failed. Still digging.

People

Assignee: WC Triage

Reporter: Maloo

Votes: 0

Watchers: 11

Dates

Created: 03/Dec/13 9:23 PM

Updated: 30/Apr/24 6:46 AM