[LU-4553] LFSCK 5: LFSCK behaviour if an OST is permanently unavailable Created: 28/Jan/14 Updated: 13/Feb/19 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Rank (Obsolete): | 12441 |
| Description |
|
We had a discussion during the 2.6 testing concall about what the behaviour of LFSCK is, or should be, if there is an OST that is permanently unavailable. If the administrator has permanently marked the OST inactive, will LFSCK ever be able to complete? Will it be able to detect and report objects that are on the failed OST? Is there an option to delete such files? Will it create new objects on a different OST? If there are no files referencing objects on the inactive OST (e.g. if the administrator has previously run "lfs getstripe -O {OST}" or LFSCK to find and delete such files), will it be possible for LFSCK to complete? |
| Comments |
| Comment by nasf (Inactive) [ 29/Jan/14 ] |
|
In the current implementation, lfsck_layout can run on a subset of the OSTs: as long as at least one OST joins lfsck_layout, the scan can start. If LFSCK finds that an OST-object resides on an OST that has not joined lfsck_layout (because the OST is permanently unavailable, or crashed during the LFSCK scan), it sets some flags and continues scanning, but the final status will be "incomplete" because some OST-objects have not been verified. So LFSCK can currently detect that some OST(s) are missing, but it will delete neither the related file(s) nor the related OST-object(s). It also cannot create a new OST-object on a different OST, because it does not know the data of the original OST-object. |
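The behaviour described in this comment can be modelled roughly as follows. This is a toy sketch in Python, not LFSCK code: `ScanResult`, `scan_layout`, their arguments, and the "completed" status string are hypothetical names chosen for illustration; only the "incomplete" status comes from the comment above.

```python
from dataclasses import dataclass, field


@dataclass
class ScanResult:
    status: str                                     # "completed" or "incomplete"
    verified: list = field(default_factory=list)    # (path, ost_index) pairs that were checked
    unverified: list = field(default_factory=list)  # pairs on OSTs that did not join the scan


def scan_layout(files, joined_osts):
    """Toy model of a layout scan that runs over a subset of the OSTs.

    files: dict mapping a file path to the list of OST indices holding its stripes.
    joined_osts: set of OST indices that joined the scan.
    """
    if not joined_osts:
        # Per the comment, at least one OST must join before the scan can start.
        raise ValueError("at least one OST must join the scan")

    result = ScanResult(status="completed")
    for path, stripe_osts in sorted(files.items()):
        for ost in stripe_osts:
            if ost in joined_osts:
                result.verified.append((path, ost))
            else:
                # Flag the object and keep scanning; nothing is deleted or re-created.
                result.unverified.append((path, ost))
    if result.unverified:
        result.status = "incomplete"
    return result


if __name__ == "__main__":
    files = {"/mnt/lustre/a": [0, 1], "/mnt/lustre/b": [2]}
    result = scan_layout(files, joined_osts={0, 2})
    print(result.status, result.unverified)
```

In this model a scan can only finish as "completed" once no file references an object on an OST that did not join, which is the situation the last question in the description asks about.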
| Comment by nasf (Inactive) [ 14/Feb/14 ] |
|
> We had a discussion during the 2.6 testing concall about what the behaviour of LFSCK is, or should be, if there is an OST that is permanently unavailable.
> If the administrator has permanently marked the OST inactive, will LFSCK ever be able to complete?
> Will it be able to detect and report objects that are on the failed OST? Is there an option to delete such files?
> Will it create new objects on a different OST?
> If there are no files referencing objects on the inactive OST (e.g. if the administrator has previously run "lfs getstripe -O {OST}" or LFSCK to find and delete such files), will it be possible for LFSCK to complete? |
| Comment by nasf (Inactive) [ 07/Apr/14 ] |
|
Andreas, what would you like me to do as the next step for this ticket? |
| Comment by Andreas Dilger [ 13/Apr/14 ] |
|
Make sure that the process in the Lustre manual is correct for deleting files with stripes on the removed OST using "lfs find". This partly relates to |
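As a rough illustration of the kind of "lfs find" based procedure mentioned here, the sketch below lists the files that have a stripe on a given OST and optionally unlinks them. It is not the text of the Lustre manual. It assumes that the installed `lfs find` accepts `--ost` with an OST index or UUID (older releases spell this `--obd` or `-O`) and `--type f`. The helper names `build_find_command`, `files_on_ost` and `delete_files` are hypothetical.

```python
import os
import subprocess


def build_find_command(mount_point, ost):
    """Return the argv that lists regular files with a stripe on the given OST.

    Assumption: `lfs find` accepts `--ost <index|uuid>` and `--type f`.
    """
    return ["lfs", "find", mount_point, "--ost", str(ost), "--type", "f"]


def files_on_ost(mount_point, ost, run=subprocess.run):
    """Run `lfs find` and return the matching paths, one per output line."""
    proc = run(build_find_command(mount_point, ost),
               capture_output=True, text=True, check=True)
    # Assumes that file names do not contain newline characters.
    return [line for line in proc.stdout.splitlines() if line]


def delete_files(paths, dry_run=True, unlink=os.unlink):
    """Unlink each path; with dry_run set, only report what would be removed."""
    handled = []
    for path in paths:
        if not dry_run:
            unlink(path)
        handled.append(path)
    return handled


if __name__ == "__main__":
    # Hypothetical mount point and OST index; dry run only.
    for path in delete_files(files_on_ost("/mnt/lustre", 4), dry_run=True):
        print("would unlink", path)
```

Keeping the default at `dry_run=True` means the list can be reviewed, or saved for a later restore from backup, before anything is removed.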
| Comment by Andreas Dilger [ 13/Apr/14 ] |
|
Preferred solution is to add an option to LFSCK to delete all files on the specified OST index, which it will verify is currently marked inactive and/or not present in the filesystem configuration. |
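The safety check proposed in this comment could look something like the sketch below. No such LFSCK option exists in this ticket, so everything here is hypothetical: the function names, and the representation of the filesystem configuration as plain sets of OST indices, are assumptions made only to show the guard.

```python
def ost_is_gone(ost_index, configured_osts, active_osts):
    """True when the OST is absent from the configuration or marked inactive."""
    return ost_index not in configured_osts or ost_index not in active_osts


def select_files_to_delete(files, ost_index, configured_osts, active_osts):
    """Return the files with a stripe on ost_index, refusing if the OST is still live.

    files: dict mapping a file path to the list of OST indices holding its stripes.
    """
    if not ost_is_gone(ost_index, configured_osts, active_osts):
        raise RuntimeError(
            f"OST {ost_index} is configured and active; refusing to delete its files")
    return sorted(path for path, osts in files.items() if ost_index in osts)


if __name__ == "__main__":
    files = {"/mnt/lustre/a": [0, 1], "/mnt/lustre/b": [2]}
    # OST 1 is still configured but has been marked inactive.
    print(select_files_to_delete(files, 1, configured_osts={0, 1, 2}, active_osts={0, 2}))
```

The point of the guard is that the destructive step is only reachable for an OST the administrator has already declared gone, never for one that is merely slow to mount.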
| Comment by nasf (Inactive) [ 17/Apr/14 ] |
|
To make LFSCK robust enough, we allow it to run while some OSTs are unavailable (or have not joined the LFSCK). It is normal for some OSTs to be mounted after LFSCK has started. So from LFSCK's point of view, even if a stripe resides on an OST that is not up yet, LFSCK cannot know whether the target OST is really permanently unavailable. To be safe, I would prefer to create a separate "lfs destroy" tool that deletes the files on a specified OST, regardless of whether that OST is permanently unavailable. That seems cleaner to me. |