[LU-2331] LOCK UP! Created: 14/Nov/12  Updated: 27/Nov/12  Resolved: 27/Nov/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Christopher Morrone Assignee: Jinshan Xiong (Inactive)
Resolution: Duplicate Votes: 0
Labels: topsequoia
Environment:

Sequoia client, lustre 2.3.54-2chaos, github.com/chaos/lustre. Servers were running lustre 2.3.54-6chaos.


Attachments: seqio685_backtraces.txt, seqio685_console.txt, seqio685_lustre_log.txt
Severity: 3
Rank (Obsolete): 5563

 Description   

During a 98,304-task ior run, one of the Lustre clients (a Sequoia I/O Node) hit this:

2012-11-14 15:26:41.248076 {DefaultControlEventListener} [mmcs]{692}.1.2: Lustre: LOCK UP! the lock c00000039550af80 was acquired by <ptlrpcd_49:3330:brw_interpret:1998> 502 time, I'm ptlrpcd_7:3288
2012-11-14 15:26:41.287858 {DefaultControlEventListener} [mmcs]{692}.1.2: Lustre: LOCK UP! the lock c00000039550af80 was acquired by <ptlrpcd_49:3330:brw_interpret:1998> 502 time, I'm sysiod:3752

I believe tasks were then stuck in read(). sysiod is the I/O Node process that belongs to the I/O forwarding system; it performs I/O on behalf of an ior process running on a Sequoia compute node.

The attached file "seqio685_console.txt" shows more of the console output when the problem hit. "seqio685_lustre_log.txt" contains the "lctl dk" output. "seqio685_backtraces.txt" contains the output of sysrq "l" and sysrq "t".
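When scanning a console log such as seqio685_console.txt, it can help to group the "LOCK UP!" warnings by lock address so that the holder and every waiter on the same lock are listed together. The sketch below is hypothetical and assumes the message layout seen in the two lines quoted above: the holder field is read as <thread:pid:function:line> and the waiter as thread:pid. That reading is inferred from these messages only, not from the Lustre source. It also assumes each message sits on a single line, so console output that wraps mid-word (as the raw console did with "ti"/"me") would need to be rejoined first. The helper name parse_lockups is illustrative.

```python
import re
from collections import defaultdict

# Assumed message layout, inferred only from the messages quoted in this report:
#   LOCK UP! the lock <addr> was acquired by <thread:pid:func:line> <N> time, I'm <thread>:<pid>
LOCKUP_RE = re.compile(
    r"LOCK UP! the lock (?P<addr>[0-9a-fx]+) was acquired by "
    r"<(?P<holder>[^:>]+):(?P<holder_pid>\d+):(?P<func>[^:>]+):(?P<line>\d+)> "
    r"(?P<count>\d+) time, I'm (?P<waiter>[^:\s]+):(?P<waiter_pid>\d+)"
)


def parse_lockups(lines):
    """Group LOCK UP! warnings by lock address.

    Returns {addr: {"holder": (thread, pid, func, line), "waiters": [(thread, pid), ...]}}.
    Lines that do not match the assumed layout are skipped.
    """
    locks = defaultdict(lambda: {"holder": None, "waiters": []})
    for text in lines:
        m = LOCKUP_RE.search(text)
        if m is None:
            continue
        entry = locks[m["addr"]]
        # The holder reported in the latest message for this address wins.
        entry["holder"] = (m["holder"], int(m["holder_pid"]), m["func"], int(m["line"]))
        entry["waiters"].append((m["waiter"], int(m["waiter_pid"])))
    return dict(locks)
```

Applied to the two messages above, this would report lock c00000039550af80 as held by ptlrpcd_49 (pid 3330, brw_interpret) with ptlrpcd_7 and sysiod waiting on it.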



 Comments   
Comment by Peter Jones [ 15/Nov/12 ]

Alex will triage this one

Comment by Jinshan Xiong (Inactive) [ 27/Nov/12 ]

This should be the same as, or similar to, the problem in LU-2332; let's track it there.

Generated at Sat Feb 10 01:24:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.