Type: New Feature
Resolution: Not a Bug
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.10.0, Lustre 2.11.0
Labels: None

Epic/Theme:
- DNE
- DNE2
- dne
Rank (Obsolete):
9223372036854775807

'lctl abort_recovery' doesn't abort recovery between MDTs. Unlike aborting recovery on a single-MDT system, which only fails unfinished operations, aborting recovery between MDTs may break system consistency, so as a tradeoff Lustre chose consistency over availability. But there are two major reasons why recovery between MDTs may not finish. The first is a network issue; in that case we can wait indefinitely for the network to recover. The second is a software bug, which is difficult for the user to fix manually on the backend filesystem.

Now that lfsck is ready and can fix inconsistencies in the system, we should provide an option that allows the user to abort recovery between MDTs and then fix the resulting inconsistencies.

is related to

LU-11111 crash doing LFSCK: orph_index_insert()) ASSERTION( !(obj->mod_flags & ORPHAN_OBJ)

Resolved

LU-11419 lfsck does not complete phase2

Resolved

LU-12546 add option to abort recovery between MDTs but not between client/MDT

Resolved

Assignee: Hongchao Zhang

Reporter: Lai Siyao

Votes: 0

Watchers: 6

Due: 09/Jul/19

Created: 09/Apr/18 7:28 AM

Updated: 25/Nov/19 8:07 PM

Resolved: 15/Jul/19 12:26 PM
