Details
Bug
Resolution: Fixed
Blocker
Lustre 2.6.0
Hyperion - 2.5.60 build 2538
3
14694
Description
After a hard failover of devices to server iws19, the server wedged and then hit an LBUG.
Services never complete recovery, and appear to restart the recovery timer or something similar:
Jun 27 10:08:53 iws19 kernel: Lustre: lustre-OST000c: Will be in recovery for at least 2:30, or until 316 clients reconnect
Jun 27 10:11:53 iws19 kernel: Lustre: lustre-OST000c: recovery is timed out, evict stale exports
Jun 27 10:18:30 iws19 kernel: Lustre: lustre-OST000c: Client c52d4856-d1df-b87b-911c-f1bfbc23a24d (at 192.168.124.182@o2ib) reconnecting, waiting for 316 clients in recovery for 2:27
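To make the timing in the three messages above easier to see, here is a rough Python sketch that computes the seconds elapsed since the first message and extracts the recovery window each message reports. The helper name `summarize`, the regular expressions, and the placeholder year are assumptions made for illustration; none of this is part of Lustre or its tooling.

```python
import re
from datetime import datetime

# The three console messages quoted in this ticket, one per line.
LOG = """\
Jun 27 10:08:53 iws19 kernel: Lustre: lustre-OST000c: Will be in recovery for at least 2:30, or until 316 clients reconnect
Jun 27 10:11:53 iws19 kernel: Lustre: lustre-OST000c: recovery is timed out, evict stale exports
Jun 27 10:18:30 iws19 kernel: Lustre: lustre-OST000c: Client c52d4856-d1df-b87b-911c-f1bfbc23a24d (at 192.168.124.182@o2ib) reconnecting, waiting for 316 clients in recovery for 2:27
"""

# Syslog timestamp at the start of a line, e.g. "Jun 27 10:08:53".
STAMP = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) ")
# Recovery window as printed in the messages, e.g. "recovery for at least 2:30".
WINDOW = re.compile(r"recovery for (?:at least )?(\d+):(\d\d)")


def summarize(log):
    """Return (seconds since first message, reported window in seconds or None)."""
    rows = []
    first = None
    for line in log.splitlines():
        stamp = STAMP.match(line)
        if stamp is None:
            continue
        # Syslog omits the year; 2000 is an arbitrary placeholder (assumption).
        when = datetime.strptime("2000 " + stamp.group(1), "%Y %b %d %H:%M:%S")
        if first is None:
            first = when
        window = WINDOW.search(line)
        seconds = int(window.group(1)) * 60 + int(window.group(2)) if window else None
        rows.append((int((when - first).total_seconds()), seconds))
    return rows


for elapsed, window in summarize(LOG):
    print(elapsed, window)
```

If the 2:27 in the last message is read as time remaining, a window of nearly the original length is being reported more than nine minutes after the initial 2:30 window began, which would match the impression that the timer was restarted.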
The server reports being CPU-bound prior to the failure.
Console log attached. Unfortunately, the dump after the LBUG failed.