|
We've seen that the rm_entry workaround to "hide" the bad entry is only temporary: running LFSCK on the filesystem restores the broken entry to .lustre/lost+found/<fsname>-MDT0000, where it is again undeletable.
We either need to be able to delete such a directory with missing stripes using "rmdir" when we are sure the MDT is available but the stripe is missing, or have LFSCK fix the missing stripe in the directory so that it can be removed normally.
Is it possible that patch https://review.whamcloud.com/47385 "LU-14470 dne: striped mkdir replay by request" will avoid such recovery failures by allowing the client to recover the broken directory even when the MDT recovery is aborted?
|
|
Yes, LU-14470 can help with the create failure, but beyond that we also need to consider replay of other distributed transactions, e.g. migration and restripe. Besides, if client replay is aborted as well, dangling name entries may still be left behind.
I haven't tested this yet, but IMHO LFSCK won't simply move dangling name entries to lost+found.
|