[LU-15961] umount shouldn't cause recovery abort Created: 20/Jun/22  Updated: 13/Jul/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Alex Zhuravlev Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related

 Description   

When a server is in recovery, a simple umount forces recovery to abort, and clients can then get I/O errors. One practical example was seen with corosync/pacemaker: when a failed server returns, HA decides to move the service, while it is still in recovery, back to the old node:

Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 1 client reconnects
Lustre: Failing over lustre-MDT0000
LustreError: 6277:0:(ldlm_lib.c:2876:target_stop_recovery_thread()) lustre-MDT0000: Aborting recovery
Lustre: 6186:0:(ldlm_lib.c:2283:target_recovery_overseer()) recovery is aborted, evict exports in recovery
Lustre: lustre-MDT0000: Recovery over after 0:03, of 1 clients 0 recovered and 1 was evicted.
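The failure sequence in the log above can be illustrated with a toy state machine. This is a hypothetical sketch, not Lustre code. `Target`, `mount`, `umount`, `reconnect`, `failover` and the `abort_recovery` flag are invented names. The model also assumes, as a simplification, that evicting a client removes it from the target's persistent list of clients to recover, so the next mount no longer waits for it. The `abort_recovery=False` path stands for the behaviour this issue asks for, not something Lustre currently does.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Hypothetical toy model of a server target; not Lustre's implementation."""
    name: str
    # Clients the target expects to recover (stands in for persistent client state).
    known_clients: set[str] = field(default_factory=set)
    recovered: set[str] = field(default_factory=set)
    evicted: set[str] = field(default_factory=set)
    in_recovery: bool = False

    def mount(self) -> None:
        # Recovery is needed only if some clients are still recorded as known.
        self.in_recovery = bool(self.known_clients)
        self.recovered.clear()

    def reconnect(self, client: str) -> None:
        # Only clients recorded before the restart may take part in recovery.
        if self.in_recovery and client in self.known_clients:
            self.recovered.add(client)
            if self.recovered == self.known_clients:
                self.in_recovery = False

    def umount(self, abort_recovery: bool) -> None:
        if self.in_recovery and abort_recovery:
            # Behaviour described in the report: abort recovery and evict every
            # client that has not recovered yet, forgetting it for later mounts.
            pending = self.known_clients - self.recovered
            self.evicted |= pending
            self.known_clients -= pending
        self.in_recovery = False


def failover(abort_on_umount: bool) -> bool:
    """Return True if client "c1" can still recover after HA moves the target."""
    mdt = Target("lustre-MDT0000", known_clients={"c1"})
    mdt.mount()                                  # in recovery, waiting for 1 client
    mdt.umount(abort_recovery=abort_on_umount)   # HA stops the service mid-recovery
    mdt.mount()                                  # service starts on the other node
    mdt.reconnect("c1")
    return "c1" in mdt.recovered
```

In this model `failover(True)` returns `False` (the client was evicted, so it would see I/O errors), while `failover(False)` returns `True` because the client is still expected when the target comes up on the other node.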

Generated at Sat Feb 10 03:22:47 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.