LFSCK phase II technical debts
(LU-4701)
|
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Technical task | Priority: | Blocker |
| Reporter: | Andreas Dilger | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
| Rank (Obsolete): | 13531 | ||||||||
| Description |
|
Per discussion today, LFSCK phase 2 should not create empty OST objects by default for MDT LOV layouts that reference missing OST objects. By default LFSCK should log an error (see The administrator can specify an option to delete files with dangling links, or create empty objects to fix the dangling reference. Otherwise, it should leave the dangling reference unfixed. There should be a generic mechanism for specifying different repair options, including a way of specifying defaults for all of the repair options in a file. |
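The "generic mechanism" requested above could be sketched as follows. This is only an illustrative model, not Lustre code: the option key `dangling`, the action names, and the `key = value` file syntax are all assumptions made up for the example. It shows a default of reporting only, with the other two actions selectable by the administrator.

```python
from enum import Enum


class DanglingAction(Enum):
    """Hypothetical actions for a layout that references a missing OST object."""
    REPORT = "report"  # log an error and leave the reference unfixed (default)
    DELETE = "delete"  # delete the file with the dangling reference
    CREATE = "create"  # create an empty OST object to fix the reference


# Hypothetical defaults for all repair options; only one option is modelled here.
DEFAULTS = {"dangling": DanglingAction.REPORT}


def parse_repair_options(text, defaults=DEFAULTS):
    """Parse 'key = value' lines (assumed syntax), starting from the defaults."""
    options = dict(defaults)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or key not in options:
            raise ValueError(f"unknown or malformed option: {raw!r}")
        # Convert using the type of the default; an invalid value raises ValueError.
        options[key] = type(defaults[key])(value)
    return options


if __name__ == "__main__":
    print(parse_repair_options(""))                   # defaults only
    print(parse_repair_options("dangling = create"))  # administrator override
```

Keeping the defaults in one table means an empty file, or no file at all, yields the conservative behaviour of reporting without repairing.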
| Comments |
| Comment by nasf (Inactive) [ 14/Apr/14 ] |
|
Sorry, I still do not quite understand the shortcoming (or bad effect) of creating the lost OST-object. On the other hand, as I understand it, deleting a file just because it lost some OST-object may not be a good choice. In a distributed filesystem, losing one stripe does not mean losing everything. We should try to keep as much data as possible instead of destroying it, as the solution architecture document requires.

Another reason for NOT deleting a file that lost some of its stripes: some LOV EA slots may be wrong, meaning that the LOV EA is invalid and references a nonexistent OST-object, but as LFSCK proceeds we can find the lost OST-object when handling orphans. If LFSCK deletes the file during the first-stage scanning, we lose the chance to repair the bad LOV EA.

The third reason is that the layout LFSCK does not understand the namespace, so it is something of a hack for the layout LFSCK to remove the name entry from its parent directory (in the worst case the file may have no [valid] linkEA), especially for files with multiple links.

So my suggestion is: if we really want to give the administrator an option to delete files that lost some stripe(s), we can link the file into .lustre/lost+found/MDTxxxx/ under a special name. If LFSCK eventually repairs the file, it unlinks it from .lustre/lost+found/MDTxxxx/, and the original name entry is still in the normal namespace; otherwise, the administrator can easily see which files have dangling references and can destroy them manually if desired. So there will be two options for the dangling reference case:
1) Link the file into .lustre/lost+found/MDTxxxx/ under a special name without creating the lost OST-object (the default).
2) Keep it in the namespace and re-create the lost OST-object (as the current LFSCK does).

The same applies to MDT-MDT consistency. The MDT-MDT LFSCK can recover as many linkEA entries as possible (unless the linkEA entries exceed the limit).

Then, after the whole LFSCK run, the administrator can easily check what is under .lustre/lost+found/MDTxxxx. Honestly, I do NOT want LFSCK to leave dangling inconsistencies in place with only an error reported, unless "dryrun" is specified; otherwise, adding an entry under .lustre/lost+found/MDTxxxx/ is more convenient for the administrator, because the error log can be removed or overwritten, but the record under .lustre/lost+found/ will remain. |
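The two-stage flow proposed in this comment can be modelled roughly as below. It is a toy simulation, not LFSCK code: files, OST objects, and lost+found are plain Python dicts and sets, and the function names `first_stage_scan` and `handle_orphans` are hypothetical.

```python
def first_stage_scan(layouts, existing_objects, recreate=False):
    """Scan layouts for references to missing OST objects.

    layouts: file name -> list of OST object ids referenced by its LOV EA.
    existing_objects: set of OST object ids that actually exist.
    Returns a dict standing in for .lustre/lost+found/MDTxxxx/:
    file name -> set of missing object ids. Files are never deleted here.
    """
    lost_found = {}
    for name, objects in layouts.items():
        missing = {obj for obj in objects if obj not in existing_objects}
        if not missing:
            continue
        if recreate:
            # Option 2: keep the file in place and re-create the lost objects.
            existing_objects.update(missing)
        else:
            # Option 1: link the file into lost+found; create nothing.
            lost_found[name] = missing
    return lost_found


def handle_orphans(lost_found, repaired_files):
    """Unlink from lost+found every file whose layout was repaired later."""
    for name in repaired_files:
        lost_found.pop(name, None)
    return lost_found


if __name__ == "__main__":
    layouts = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
    lost_found = first_stage_scan(layouts, existing_objects={1, 2, 3, 5})
    handle_orphans(lost_found, repaired_files={"b"})
    print(sorted(lost_found))  # files still dangling after the whole run
```

Whatever remains in the returned dict after orphan handling corresponds to the entries the administrator would inspect, and possibly destroy by hand, once the run finishes.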
| Comment by Andreas Dilger [ 17/Apr/14 ] |
|
The bad effect of automatically creating objects for dangling layout references is that this hides filesystem corruption from users, and means that users may read bad (zero) data from the repaired file. That may cause the application to compute the wrong result instead of causing an error and alerting the user that the file data was lost. |
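The concern can be illustrated with a toy model, assuming a file made of two stripes held as Python lists. Nothing here is a Lustre API. A re-created empty object is modelled as a stripe that reads back as zeros, and a dangling reference as `None`.

```python
def read_stripes(stripes):
    """Concatenate stripe contents; a missing stripe is an I/O error."""
    values = []
    for stripe in stripes:
        if stripe is None:
            raise OSError("stripe object is missing")
        values.extend(stripe)
    return values


def mean(values):
    return sum(values) / len(values)


original = [[4, 4], [4, 4]]   # intact file
recreated = [[4, 4], [0, 0]]  # lost stripe replaced by an empty object (zeros)
dangling = [[4, 4], None]     # lost stripe left as a dangling reference

if __name__ == "__main__":
    print(mean(read_stripes(original)))   # correct result
    print(mean(read_stripes(recreated)))  # silently wrong result
    try:
        mean(read_stripes(dangling))
    except OSError as err:
        print("error:", err)              # the user is alerted to the data loss
```

In this model the re-created file yields a plausible but wrong mean, while the dangling file fails loudly, which is the behaviour argued for above.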
| Comment by nasf (Inactive) [ 17/Apr/14 ] |
|
LFSCK will then offer two options for the dangling reference case: 1) Report an error via the LFSCK log, currently CDEBUG(D_LFSCK) (the default). In general, there should be another option to delete the file with the dangling reference, but because the layout LFSCK does not understand the namespace, we can consider adding that in LFSCK phase 3. |
| Comment by nasf (Inactive) [ 17/Apr/14 ] |
|
Here is the patch: |
| Comment by nasf (Inactive) [ 30/Apr/14 ] |
|
The patch has landed on master. |