[LU-5292] LFSCK 5: log failed items during the first-stage scanning, and retry them before the handling orphan objects Created: 03/Jul/14 Updated: 13/Feb/19 |
|
| Status: | In Progress |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major |
| Reporter: | nasf (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Rank (Obsolete): | 14763 |
| Description |
|
For MDT-OST consistency verification (LFSCK 2), during the first-stage scanning, if the LFSCK on the MDT wants to verify some LOV EA slot (OST-object) but the related OST is unavailable at that time (perhaps because of a bad network), then the OST cannot know whether the OST-object is referenced by some MDT-object or not. If the OST-object also stores an invalid PFID EA, it may mislead the LFSCK into making an incorrect repair when handling orphan OST-objects. To prevent that, the LFSCK sets a mark in the LFSCK tracing file (lfsck_layout) to indicate that it failed to verify some LOV EA slots during the first-stage scanning and may not be able to handle orphan OST-objects properly. The LFSCK then skips orphan OST-object handling.
For MDT-MDT consistency verification (LFSCK 3), during the first-stage scanning, if the LFSCK on the MDT wants to verify some name entry for a remote MDT-object but the related MDT is unavailable at that time (perhaps because of a bad network), then the failed MDT cannot know whether the related MDT-object is referenced by some name entry or not. If the MDT-object also stores an invalid linkEA, it may mislead the LFSCK into making an incorrect repair when handling orphan MDT-objects. This case is quite similar to the MDT-OST case above, so it is better to use the same repair logic to unify the LFSCK behavior. That is, the LFSCK sets a mark in the LFSCK tracing file (lfsck_namespace) to indicate that it failed to verify some name entries during the first-stage scanning and may not be able to handle orphan MDT-objects properly. The LFSCK then skips orphan MDT-object handling.
Currently, the above marking mechanism is per-system based: as long as one target (MDT or OST) is detected by the LFSCK as unavailable during the first-stage scanning, the LFSCK skips the related orphan (MDT or OST) object handling on all targets. This mechanism is not efficient.
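The per-system marking described above can be pictured with a minimal C sketch. The struct, field, and function names are hypothetical illustrations chosen for this ticket; they are not the actual on-disk format of lfsck_layout or lfsck_namespace, nor real Lustre functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-system state kept in a tracing file
 * such as lfsck_layout or lfsck_namespace; not the real Lustre layout. */
struct trace_file_sketch {
	bool incomplete;	/* set once any target was unreachable in stage 1 */
};

/* Called when a first-stage verification fails because a target is down. */
void mark_verify_failed(struct trace_file_sketch *tf)
{
	assert(tf != NULL);
	tf->incomplete = true;
}

/* Per-system decision: the target index is ignored, so a single failure
 * suppresses orphan handling on every target. */
bool may_handle_orphans(const struct trace_file_sketch *tf,
			unsigned int target_idx)
{
	assert(tf != NULL);
	(void)target_idx;
	return !tf->incomplete;
}
```

In this sketch, one call to `mark_verify_failed()` makes `may_handle_orphans()` return false for every target index, which is the coarse behavior the ticket sets out to improve.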
In fact, one target's failure should not affect the others. For example, if MDT_A tries to verify an OST-object on OST_B while OST_B is unavailable, then only the orphans on OST_B are affected; the orphan OST-objects on other OSTs can be handled normally. So as an improvement, the LFSCK will record the failed targets in the LFSCK tracing file (lfsck_layout, lfsck_namespace), and only skip orphan object handling for the failed targets recorded there.
Furthermore, if a target becomes unavailable because of a bad network, that may not be a fatal issue for the subsequent orphan object handling. If the LFSCK can record the failed items and re-verify them successfully before the orphan object handling, then it is unnecessary to skip orphan object handling on the failed targets at all.
Currently, the LFSCK uses a two-stage scanning framework for whole-system consistency verification. To support retrying failed items before orphan object handling, we will introduce a new scanning stage between the original first-stage scanning and the original second-stage scanning. The orphan object handling phase will move to the new third-stage scanning, and the new second-stage scanning will focus on re-verifying the failed items.
As for how to record the failed items, there are two possible solutions:
1) Each LFSCK instance maintains per-target log files, meaning each target has its own log file. The advantage is that it is easy to know whether some target has failed items, and what those failed items are. The shortcoming is scalability: if there are N targets in the system, the total count of log files will be (N - 1) * N in the worst case, so we need some efficient way to maintain so many log files.
2) Each LFSCK instance maintains a per-system log file, meaning all targets share a single log file. The advantages and shortcomings are the reverse of solution 1.
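The proposed three-stage flow with a single shared log (solution 2) could look roughly like the C sketch below. Everything in it is an assumption made for illustration: the names, the fixed array sizes, the overflow fallback, and the simplification that an item counts as re-verified once its target is reachable again (a real second stage would re-run the actual verification).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_TARGETS 8	/* hypothetical limits for the sketch */
#define SKETCH_MAX_ITEMS 32

/* One item that could not be verified during the first stage. */
struct failed_item {
	unsigned int target_idx;	/* target that was unreachable */
	unsigned long item_id;		/* opaque id of the LOV EA slot or name entry */
};

/* Solution 2: one log shared by all targets (hypothetical layout). */
struct failed_log {
	struct failed_item items[SKETCH_MAX_ITEMS];
	size_t count;
	bool overflowed[SKETCH_MAX_TARGETS];	/* log was full for this target */
};

/* Stage 1: record an item whose target was unavailable. If the log is
 * full, fall back to marking the whole target as failed. */
void log_failed_item(struct failed_log *fl, unsigned int target_idx,
		     unsigned long item_id)
{
	assert(target_idx < SKETCH_MAX_TARGETS);
	if (fl->count == SKETCH_MAX_ITEMS) {
		fl->overflowed[target_idx] = true;
		return;
	}
	fl->items[fl->count].target_idx = target_idx;
	fl->items[fl->count].item_id = item_id;
	fl->count++;
}

/* Stage 2: retry logged items and keep only those that still fail.
 * Assumption: an item re-verifies if its target is reachable now. */
void retry_failed_items(struct failed_log *fl, const bool *target_up)
{
	size_t kept = 0;

	for (size_t i = 0; i < fl->count; i++) {
		if (!target_up[fl->items[i].target_idx])
			fl->items[kept++] = fl->items[i];
	}
	fl->count = kept;
}

/* Stage 3: skip orphan handling only on targets with remaining failures. */
bool may_handle_orphans_on(const struct failed_log *fl,
			   unsigned int target_idx)
{
	assert(target_idx < SKETCH_MAX_TARGETS);
	if (fl->overflowed[target_idx])
		return false;
	for (size_t i = 0; i < fl->count; i++) {
		if (fl->items[i].target_idx == target_idx)
			return false;
	}
	return true;
}

/* Worst-case number of log files under solution 1: (N - 1) * N. */
unsigned long per_target_log_files_worst_case(unsigned long n)
{
	return n == 0 ? 0 : (n - 1) * n;
}
```

The shared log makes the per-target question ("does this target still have failed items?") a linear scan, which is the trade-off against solution 1: with 4 targets, solution 1 could need up to 12 log files, while this sketch needs one log but must search it.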
From the implementation point of view, solution 2 may be relatively easy. Either way, it will be a common solution for both the layout LFSCK and the namespace LFSCK within the same framework. The basic improvement of recording failed targets in the LFSCK tracing file can be done in Lustre-2.7. The further improvement of recording failed items and re-verifying them will be done in LFSCK phase IV, but if we have enough time for Lustre-2.7, we can take some LFSCK phase IV tasks into Lustre-2.7, and this will be one candidate. |