[LU-3304] Test failure on sanity-quota test_18: watchdog triggered Created: 09/May/13 Updated: 10/Jul/13 Resolved: 10/Jul/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.5.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | James Nunez (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 8186 |
| Description |
|
sanity-quota test_18 is failing because its watchdog check picks up messages from other sanity-quota tests.

This issue relates to the following test suite run:

Error message for test_18 is:

Lustre: DEBUG MARKER: sanity-quota test_18: @@@@@@ FAIL: Lustre: DEBUG MARKER: sanity-quota test_6: @@@@@@ FAIL: LNet: Service thread pid 21586 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugg |
| Comments |
| Comment by James Nunez (Inactive) [ 09/May/13 ] |
|
sanity-quota test_6 and test_18 check the watchdog in the same way, using very similar code, so the fixes will be similar. |
| Comment by James Nunez (Inactive) [ 10/May/13 ] |
|
The proposed patch is at: |
| Comment by James Nunez (Inactive) [ 10/May/13 ] |
|
The problem here is that test_18 picks up matching lines from dmesg regardless of which test produced them. The awk command

local watchdog=$(awk '/sanity-quota test 18/ {start = 1;}
	/Service thread pid/ && /was inactive/ {
		if (start) {
			print;
		}
	}' $TMP/lustre-log-${TESTNAME}.log)

sets the start flag to 1 when it sees "sanity-quota test 18" in dmesg, but the flag is never reset. So once "sanity-quota test 18" has been seen in dmesg, any line logged after that point that contains both "Service thread pid" and "was inactive" is reported, even if another test generated it, and we get a false positive. I chose to implement Oleg's suggested fix for |
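
The false positive described in this comment can be reproduced on a small hand-written log. The following is a minimal sketch and not the patch that landed: the sample log lines, the helper names scan_original and scan_scoped, and the assumption that every test marker has the form "sanity-quota test N:" are all hypothetical. scan_scoped shows one possible way to confine the match, by clearing the flag whenever any test marker appears and setting it again only for test 18.

```shell
#!/bin/sh
# Hypothetical sample log. The marker format "sanity-quota test N:" is an
# assumption made for this sketch; the messages mimic the ones in this ticket.
LOG=$(mktemp)
trap 'rm -f "$LOG"' EXIT
cat > "$LOG" <<'EOF'
Lustre: DEBUG MARKER: == sanity-quota test 6: hypothetical marker
LNet: Service thread pid 21586 was inactive for 40.00s.
Lustre: DEBUG MARKER: == sanity-quota test 18: hypothetical marker
LNet: Service thread pid 21700 was inactive for 40.00s.
Lustre: DEBUG MARKER: == sanity-quota test 19: hypothetical marker
LNet: Service thread pid 21999 was inactive for 40.00s.
EOF

# Same logic as the awk command quoted in the ticket: the flag is set at the
# test 18 marker and never cleared.
scan_original() {
	awk '/sanity-quota test 18/ {start = 1;}
	/Service thread pid/ && /was inactive/ {
		if (start) {
			print;
		}
	}' "$1"
}

# One possible scoping (an assumption, not the landed fix): any test marker
# clears the flag, and only the test 18 marker sets it again.
scan_scoped() {
	awk '/sanity-quota test [0-9]/ { start = 0 }
	/sanity-quota test 18:/ { start = 1 }
	/Service thread pid/ && /was inactive/ && start { print }' "$1"
}

echo "original scan:"
scan_original "$LOG"   # reports pid 21700 and also pid 21999 from test 19
echo "scoped scan:"
scan_scoped "$LOG"     # reports only pid 21700, logged during test 18
```

With this sample log, the original scan reports the message logged after the test 19 marker as if it belonged to test 18, while the scoped scan reports only the message logged between the test 18 and test 19 markers.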
| Comment by James Nunez (Inactive) [ 10/Jul/13 ] |
|
Patch landed to master. |