Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels: None
Severity: 3
Rank (Obsolete): 9223372036854775807

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/a76f9770-b57e-11e7-9d39-52540065bddc.

The sub-test test_77c failed with the following error:

Timeout occurred after 271 mins, last suite running was sanityn, restarting cluster to continue tests

The following was seen in the console log from the OST:

[12776.179843] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ll_ost_io00_025:31888]
[12776.180678] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver[12776.182841] NMI watchdog: BUG: soft lo

This looks like the beginning of a kernel oops report, but the bulk of the report is lost or was thrown away. This has happened a lot since console log collection was switched from per-test to per-session.
I think it is an artifact of keeping and using the same console log on a node both before and after a failure. Incomplete or overwritten failure information in the console logs makes it nearly impossible to see what really happened.
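
As a triage aid for logs in this state, the following is a minimal sketch, not part of the test framework, of how a fused console excerpt like the one above could be split back into one record per kernel timestamp and scanned for complete soft lockup headers. The helper names `split_records` and `find_soft_lockups` are hypothetical, and the sketch assumes every record starts with a `[seconds.microseconds]` timestamp. A truncated record such as the trailing `soft lo` is kept as a record but is not reported as a lockup.

```python
import re

# Assumption: each console record begins with a kernel timestamp
# such as "[12776.179843]" (possibly padded with spaces).
TIMESTAMP = re.compile(r"\[\s*\d+\.\d{6}\]")

# Matches a complete soft lockup header line only.
LOCKUP = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^:\]]+):(?P<pid>\d+)\]"
)


def split_records(text):
    """Split console text into records, one per kernel timestamp.

    Records fused onto a single line are separated; text before the
    first timestamp is dropped.
    """
    starts = [m.start() for m in TIMESTAMP.finditer(text)]
    records = []
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        record = text[begin:end].strip()
        if record:
            records.append(record)
    return records


def find_soft_lockups(text):
    """Return one dict per complete soft lockup header found in text."""
    hits = []
    for record in split_records(text):
        m = LOCKUP.search(record)
        if m:
            hits.append({
                "cpu": int(m["cpu"]),
                "seconds": int(m["secs"]),
                "task": m["task"],
                "pid": int(m["pid"]),
            })
    return hits
```

Running `find_soft_lockups` over a whole per-session console log would list which threads were reported as stuck, even when the rest of each report is missing.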

This failure in test_77c has only been seen for the past few days, so it seems likely that something recently landed on master is causing it.

Info required for matching: sanityn 77c

Assignee: WC Triage

Reporter: Maloo

Votes: 0

Watchers: 3

Created: 20/Oct/17 1:11 PM

Updated: 13/May/18 3:13 PM