Details
-
Improvement
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.15.0
-
None
Description
With the idle-disconnect code, the entire cluster can sit idle for some time. If all the servers then restart, recovery on the OSTs does not start because there are no client connections. The MDTs' connections to the OSTs are rejected because they are treated as new connections.
We need to either accept new MDT connections, similar to what we do when the MDT and OST are colocated on the same node, or start the recovery timer on the first such connection and proceed with eviction when the timeout expires, so that the MDTs can rejoin as the new clients they are.
Failing to do this delays the entire cluster: when the idle-disconnected clients become active again, they have to wait for recovery to finish first, even if the server restart happened long ago.