[LU-2331] LOCK UP! Created: 14/Nov/12 Updated: 27/Nov/12 Resolved: 27/Nov/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Christopher Morrone | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | topsequoia | ||
| Environment: |
Sequoia client, lustre 2.3.54-2chaos, github.com/chaos/lustre. Servers were running lustre 2.3.54-6chaos. |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 5563 |
| Description |
|
When running a 98,304 task ior, one of the lustre clients (Sequoia I/O Node) hit this:
2012-11-14 15:26:41.248076 {DefaultControlEventListener} [mmcs]{692}.1.2: Lustre: LOCK UP! the lock c00000039550af80 was acquired by <ptlrpcd_49:3330:brw_interpret:1998> 502 time, I'm ptlrpcd_7:3288
2012-11-14 15:26:41.287858 {DefaultControlEventListener} [mmcs]{692}.1.2: Lustre: LOCK UP! the lock c00000039550af80 was acquired by <ptlrpcd_49:3330:brw_interpret:1998> 502 time, I'm sysiod:3752
I believe tasks were then stuck in read(). sysiod is the process on the I/O Node that is part of the I/O forwarding system; it performs I/O on behalf of an ior process on a Sequoia compute node. The attached file "seqio685_console.txt" shows more of the console output from when the problem hit, "seqio685_lustre_log.txt" contains the "lctl dk" output, and "seqio685_backtraces.txt" contains the output of sysrq "l" and sysrq "t". |
| Comments |
| Comment by Peter Jones [ 15/Nov/12 ] |
|
Alex will triage this one |
| Comment by Jinshan Xiong (Inactive) [ 27/Nov/12 ] |
|
This should be the same/similar problem as |