Details
Type: Improvement
Resolution: Unresolved
Priority: Minor
Description
When a server is in recovery, a simple umount forces the recovery to abort, and clients can then get I/O errors. One practical example was seen with corosync/pacemaker, when a failed server returns and HA decides to move the service, which is still in recovery, back to the old node:
Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 1 client reconnects
Lustre: Failing over lustre-MDT0000
LustreError: 6277:0:(ldlm_lib.c:2876:target_stop_recovery_thread()) lustre-MDT0000: Aborting recovery
Lustre: 6186:0:(ldlm_lib.c:2283:target_recovery_overseer()) recovery is aborted, evict exports in recovery
Lustre: lustre-MDT0000: Recovery over after 0:03, of 1 clients 0 recovered and 1 was evicted.