Details
Bug
Resolution: Fixed
Blocker
Lustre 2.6.0
Hyperion - 2.5.60 build 2538
3
14694
Description
After a hard failover of devices to server iws19, the server wedged and then hit an LBUG.
Services never complete recovery, and the recovery timer appears to restart:
Jun 27 10:08:53 iws19 kernel: Lustre: lustre-OST000c: Will be in recovery for at least 2:30, or until 316 clients reconnect
Jun 27 10:11:53 iws19 kernel: Lustre: lustre-OST000c: recovery is timed out, evict stale exports
Jun 27 10:18:30 iws19 kernel: Lustre: lustre-OST000c: Client c52d4856-d1df-b87b-911c-f1bfbc23a24d (at 192.168.124.182@o2ib) reconnecting, waiting for 316 clients in recovery for 2:27
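To make the timing in the log excerpt above explicit, here is a minimal sketch that parses the three syslog timestamps and reports the seconds elapsed since the first message. The helper name `elapsed_seconds` and the year 2014 are assumptions for illustration only (syslog lines carry no year); nothing here is part of Lustre itself.

```python
import re
from datetime import datetime

# The three console messages quoted in the description, one per line.
LOG = """\
Jun 27 10:08:53 iws19 kernel: Lustre: lustre-OST000c: Will be in recovery for at least 2:30, or until 316 clients reconnect
Jun 27 10:11:53 iws19 kernel: Lustre: lustre-OST000c: recovery is timed out, evict stale exports
Jun 27 10:18:30 iws19 kernel: Lustre: lustre-OST000c: Client c52d4856-d1df-b87b-911c-f1bfbc23a24d (at 192.168.124.182@o2ib) reconnecting, waiting for 316 clients in recovery for 2:27
"""

# Matches the leading syslog timestamp, e.g. "Jun 27 10:08:53".
STAMP = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) ")


def elapsed_seconds(log_text, year=2014):
    """Return seconds elapsed from the first timestamped line to each line.

    `year` is an assumption: syslog timestamps do not include one.
    """
    times = []
    for line in log_text.splitlines():
        m = STAMP.match(line)
        if m:
            stamp = f"{year} {m.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return [int((t - times[0]).total_seconds()) for t in times]


if __name__ == "__main__":
    for offset in elapsed_seconds(LOG):
        print(f"+{offset // 60}:{offset % 60:02d}")
```

Read this way, the timeout message arrives 3:00 after recovery starts, and 9:37 after the start the target is still waiting for all 316 clients with a recovery time of 2:27 reported, which would be consistent with the timer having been restarted.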
The server reports being CPU-bound prior to the failure.
Console log attached - unfortunately the dump after the LBUG failed.
Activity
Labels | Original: HB mq414 | New: HB
Fix Version/s | New: Lustre 2.5.4 [ 11190 ]
Labels | Original: HB | New: HB mq414
Attachment | New: iws29.messages.txt [ 15388 ]
Attachment | New: iws29.lustre-log.1405540562.8181.txt [ 15389 ]
Attachment | New: iws29.lustre-log.1405540543.8118.txt [ 15390 ]
Resolution | New: Fixed [ 1 ]
Status | Original: Open [ 1 ] | New: Resolved [ 5 ]
Priority | Original: Critical [ 2 ] | New: Blocker [ 1 ]
Attachment | New: iws23.dmesg [ 15352 ]
Attachment | New: lustre-log.1404916388.10827.txt [ 15353 ]
Attachment | New: lustre-log.1404916402.10701.txt [ 15354 ]
Attachment | New: lustre-log.1404916421.10764.txt [ 15355 ]
Assignee | Original: WC Triage [ wc-triage ] | New: Hongchao Zhang [ hongchao.zhang ]