Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Description
This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/a76f9770-b57e-11e7-9d39-52540065bddc.
The sub-test test_77c failed with the following error:
Timeout occurred after 271 mins, last suite running was sanityn, restarting cluster to continue tests
The following was seen in the console log from the OST:
[12776.179843] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ll_ost_io00_025:31888]
[12776.180678] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver
[12776.182841] NMI watchdog: BUG: soft lo
This looks like the beginning of a kernel oops report, but the bulk of the report is lost or was thrown away. This has happened often since console log collection was switched from per-test to per-session.
I think it is an artifact of keeping and reusing the same console log on a node both before and after a failure. Incomplete or overwritten failure information in the console logs makes it nearly impossible to see what really happened.
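As a rough triage aid for fused console output like the excerpt above, here is a minimal sketch that splits the text on kernel timestamps and pulls out any complete soft lockup lines. It is not part of the test framework. The function names (split_records, find_soft_lockups) and the regular expressions are assumptions made for illustration, based only on the message format visible in this report.

```python
import re

# Kernel console timestamps look like "[12776.179843]".
TIMESTAMP = re.compile(r"\[\s*\d+\.\d+\]")

# Assumed message shape, taken from the excerpt in this report:
#   soft lockup - CPU#0 stuck for 22s! [ll_ost_io00_025:31888]
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^:\]]+):(?P<pid>\d+)\]"
)


def split_records(text):
    """Split fused console output into one record per kernel timestamp.

    Any text before the first timestamp is discarded.
    """
    starts = [m.start() for m in TIMESTAMP.finditer(text)]
    if not starts:
        stripped = text.strip()
        return [stripped] if stripped else []
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def find_soft_lockups(text):
    """Return one dict per complete soft lockup message found in text."""
    results = []
    for record in split_records(text):
        m = SOFT_LOCKUP.search(record)
        if m:
            results.append({
                "cpu": int(m.group("cpu")),
                "seconds": int(m.group("secs")),
                "task": m.group("task"),
                "pid": int(m.group("pid")),
            })
    return results
```

Truncated records, such as the last one in the excerpt, simply fail to match and are skipped, so the result only ever lists lockups whose first line survived intact.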
This failure in test 77c has only been seen for the past few days. It seems likely that something landed in master very recently that causes it.
Info required for matching: sanityn 77c