Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.4.2
Labels:
- llnl
Environment:
Sequoia, 2.6.32-431.23.3.1chaos, github.com/chaos/lustre

Severity:
3
Rank (Obsolete):
16049

Description

The node becomes unresponsive to users and the lustre client appears to be hung after being evicted by the MDT. The node remains responsive to SysRq. After crashing the node, it boots and mounts lustre successfully.

The symptoms develop as follows:

First the node starts reporting connection lost/connection restored
notices for an OST (same one repeatedly). Then the node reports it has
been evicted by the MDT. There are then a series of failure messages
that appear to be the normal consequence of the eviction.

We then start seeing "spinning too long" messages from
ptlrpc_check_set() within the ptlrpcd_rcv task, and the kernel starts
reporting soft lockups on tasks ptlrpcd* and ll_ping. The node becomes
unresponsive to everything other than SysRq. The operators then crash
the node, and it comes up and mounts lustre successfully.

Attachments

Activity

People

Assignee:: Niu Yawei (Inactive)

Reporter:: Olaf Faaland

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 08/Oct/14 11:44 PM

Updated:: 21/Nov/16 5:46 PM

Resolved:: 21/Nov/16 5:46 PM