Details
- Bug
- Resolution: Unresolved
- Major
- None
- Lustre 2.10.0
- Soak performance cluster
- 3
Description
We now have two routers, soak-14 and soak-15, routing from IB to OPA clients.
After several hours of running, soak-15 crashed hard, with multiple NMIs on multiple CPUs.
The output is somewhat messy.
A crash dump is available on the node. The vmcore-dmesg and console log are attached.
Attachments
Issue Links
- is related to LU-9769 "Exit from function with acquired lock (lost lock)" (Resolved)
After looking at the core, it appears that all the CPUs are stuck on CPT 0.
LU-9769 would have an impact if an older lnetctl were used to delete a net but was given the wrong net_id, so it could potentially be the cause in that scenario. It would be good to run with that patch just in case.
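The following is a minimal userspace sketch of the "lost lock" pattern named in LU-9769. It is a hypothetical illustration, not LNet code. The names `cpt0_lock`, `known_nets`, `del_net_buggy`, `del_net_fixed` and `lock_is_free` are invented for this example, and a pthread mutex stands in for the kernel lock.

In the sketch, an error path for an unknown net_id returns while still holding the lock. After that, every later caller that tries to take the lock waits forever. That would be consistent with, though not proof of, all CPUs being stuck on CPT 0.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CPT lock; NOT the real LNet lock. */
static pthread_mutex_t cpt0_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical table of configured net ids. */
static const unsigned int known_nets[] = { 1, 2, 3 };
#define NUM_NETS (sizeof(known_nets) / sizeof(known_nets[0]))

static bool net_exists(unsigned int net_id)
{
	for (size_t i = 0; i < NUM_NETS; i++)
		if (known_nets[i] == net_id)
			return true;
	return false;
}

/* Buggy pattern: the early return on a bad net_id skips the unlock. */
int del_net_buggy(unsigned int net_id)
{
	pthread_mutex_lock(&cpt0_lock);
	if (!net_exists(net_id))
		return -1;	/* BUG: returns with cpt0_lock still held */
	/* ... tear down the net here ... */
	pthread_mutex_unlock(&cpt0_lock);
	return 0;
}

/* Fixed pattern: every path leaves through a single unlock. */
int del_net_fixed(unsigned int net_id)
{
	int rc = 0;

	pthread_mutex_lock(&cpt0_lock);
	if (!net_exists(net_id)) {
		rc = -1;
		goto out_unlock;
	}
	/* ... tear down the net here ... */
out_unlock:
	pthread_mutex_unlock(&cpt0_lock);
	return rc;
}

/* Returns true if nobody holds cpt0_lock right now; leaves it unlocked. */
bool lock_is_free(void)
{
	if (pthread_mutex_trylock(&cpt0_lock) != 0)
		return false;	/* EBUSY: held, possibly leaked */
	pthread_mutex_unlock(&cpt0_lock);
	return true;
}
```

After `del_net_fixed(42)` the lock is free again. After `del_net_buggy(42)`, `lock_is_free()` reports it as held, and any further `pthread_mutex_lock(&cpt0_lock)` would block forever. The `goto out_unlock` single-exit style is common in kernel C because it keeps the unlock on every return path.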