[LU-15961] umount shouldn't cause recovery abort Created: 20/Jun/22 Updated: 13/Jul/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Alex Zhuravlev | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Description |
|
When a server is in recovery, a simple umount forces a recovery abort, and clients can then get I/O errors. One practical example was seen with corosync/pacemaker: a failed server returns, and HA decides to move the service, which is still in recovery, back to the old node:
Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 1 client reconnects
Lustre: Failing over lustre-MDT0000
LustreError: 6277:0:(ldlm_lib.c:2876:target_stop_recovery_thread()) lustre-MDT0000: Aborting recovery
Lustre: 6186:0:(ldlm_lib.c:2283:target_recovery_overseer()) recovery is aborted, evict exports in recovery
Lustre: lustre-MDT0000: Recovery over after 0:03, of 1 clients 0 recovered and 1 was evicted.
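Until the server handles this itself, an HA stop script could check whether the target is still recovering before it calls umount. The following is only a minimal operator-side sketch, not the change this ticket asks for. It assumes the target exposes its state through `lctl get_param -n mdt.<target>.recovery_status` with a `status:` line whose value is `RECOVERING` while recovery is in progress; the function names are hypothetical.

```python
import re
import subprocess
from typing import Optional


def read_recovery_status(target: str) -> str:
    """Return the raw recovery_status text for an MDT target.

    Assumption: the parameter path mdt.<target>.recovery_status is
    available on this server (OSTs use a different prefix).
    """
    result = subprocess.run(
        ["lctl", "get_param", "-n", f"mdt.{target}.recovery_status"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def parse_status(status_text: str) -> Optional[str]:
    """Extract the value of the 'status:' line, or None if it is absent."""
    match = re.search(r"^status:\s*(\S+)", status_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def recovery_in_progress(status_text: str) -> bool:
    """True only when the status line explicitly reports RECOVERING."""
    return parse_status(status_text) == "RECOVERING"


def safe_to_umount(status_text: str) -> bool:
    """Hypothetical policy: refuse umount while recovery is running."""
    return not recovery_in_progress(status_text)
```

A stop script could call `safe_to_umount(read_recovery_status("lustre-MDT0000"))` and delay or skip the failback while it returns False. How long to wait, and whether to fall back to a forced umount, is a site policy decision.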
|