Details
-
Improvement
-
Resolution: Unresolved
-
Minor
Description
I'm wondering whether we could add a check in test-framework.sh::cleanup_check() that looks for excess error messages on the console, like we do for LBUG, Busy inode, and memory leaks. We've had several situations like this (LU-13136, LU-12712, LU-11579, LU-8294, LU-1095, ...) that are not detected during normal testing because they do not actually cause any tests to fail, but are annoying to end users.
One option would be to scan the whole dmesg log for Lustre: and LustreError: messages, possibly excluding D_CONSOLE, MARKER, and similar messages, and then look for repeated output from the same source line, like mdt_lvb.c:163:mdt_lvbo_fill() in this case, so that differences in the details of the message are ignored:
LustreError: 2456:0:(mdt_lvb.c:163:mdt_lvbo_fill()) myth-MDT0000: expected 336 actual 240.
We could then sort and count such messages and trigger an error above a certain threshold.
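A rough sketch of the counting step, to make the idea concrete. The function names console_msg_counts and console_msg_check are hypothetical (nothing like them exists in test-framework.sh as far as this ticket says), and the regular expression assumes console lines carry a "(file:line:function())" tag like the example above, optionally preceded by a dmesg timestamp:

```shell
# Hypothetical helper: read console output on stdin and print
# "count file:line:function()" per message source, most frequent first.
# Lines without a (file:line:function()) tag, and DEBUG MARKER lines,
# are skipped.
console_msg_counts() {
    sed -nE '/DEBUG MARKER/d
        s/^(\[[ 0-9.]+\] )?Lustre(Error)?: .*\(([A-Za-z0-9_.-]+:[0-9]+:[A-Za-z0-9_]+\(\))\).*/\3/p' |
    sort | uniq -c | sort -rn
}

# Hypothetical helper: return non-zero if any single source printed
# more than $1 messages, and report each offender.
console_msg_check() {
    local threshold=$1

    console_msg_counts |
    awk -v max="$threshold" '
        $1 > max {
            printf "console noise: %s printed %d times (limit %d)\n", $2, $1, max
            bad = 1
        }
        END { exit bad }'
}

# Possible use from a cleanup hook (the threshold is an arbitrary example):
#   dmesg | console_msg_check 50 || echo "excess console messages"
```

Grouping on the file:line:function() tag rather than on the full text is what lets messages that differ only in their details (such as the "expected 336 actual 240" values) count as the same message.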
We might have to make a 'whitelist' allowing a specific number of errors that are generated during specific tests and are not necessarily a sign of problems (e.g. the llog-test runs in sanity test-60a), but each entry should be confined to a specific test script and an approximate count of failures (e.g. SANITY_CONSOLE_MDS_EXCEPT="mdt_lvbo_fill:100 ...", SANITY_CONSOLE_CLIENT_EXCEPT="ptlrpc_expire_one_request:250 ...", etc.).
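As an illustration of how such a per-script exception list might be consulted, here is a minimal sketch. The helper name console_limit_for and the default limit are assumptions for the example; only the "function:count" format and the variable names come from the proposal above:

```shell
# Hypothetical helper: print the allowed message count for a function.
# $1 = function name, $2 = default limit, remaining args = "function:count"
# entries from an exception list. The first matching entry wins.
console_limit_for() {
    local func=$1 default=$2 entry
    shift 2

    for entry in "$@"; do
        if [[ ${entry%%:*} == "$func" ]]; then
            echo "${entry##*:}"
            return 0
        fi
    done
    echo "$default"
}

# Example: the list is left unquoted on purpose so it splits into entries.
#   SANITY_CONSOLE_MDS_EXCEPT="mdt_lvbo_fill:100"
#   console_limit_for mdt_lvbo_fill 10 $SANITY_CONSOLE_MDS_EXCEPT
```

Keeping the list as a plain space-separated string matches how other per-script settings are usually passed around in shell, at the cost of not supporting function names that contain spaces or colons.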
While this may cause some spurious test failures as new subtests are added to a specific script, that would be an exception rather than the rule, and would at least give us a chance to detect unusual errors being printed to the console during testing.
Issue Links
- is related to
-
LU-1095 Console message cleanup
- Reopened
-
LU-8294 Noisy gss_svc_upcall_handle_init
- Resolved
-
LU-11579 cl_file_inode_init()) ASSERTION(inode->i_state & (1 << 3) ) failed:
- Resolved
-
LU-12712 sanity-pfl tests triggering “not SEL magic on SEL file”
- Resolved
-
LU-13136 (layout.c:2121:__req_capsule_get()) @@@ Wrong buffer for field 'niobuf_inline' (7 of 7) in format 'LDLM_INTENT_OPEN', 0 vs. 0 (server)
- Resolved