Details
- Bug
- Resolution: Unresolved
- Major
- None
- Lustre 2.16.0
- None
- 3
Description
Since LU-16159, the update llogs are canceled when recovery is aborted, which will cause inconsistencies in the filesystem. LFSCK should be able to fix these inconsistencies.
This is visible in tests like replay-single test_70b that sometimes leave an undeletable directory behind after test completion (LU-10616). There are various workarounds (e.g. LU-16335 to use "lfs rm_entry" to unlink the directory from the namespace, or EX-6692 to reformat the filesystem), but it would be much better to have LFSCK fix these directories and/or allow them to actually be unlinked from the filesystem.
Issue Links
- is related to
  - LU-16159 remove update llog files after recovery abort (Resolved)
  - LU-16335 "lfs rm_entry" failed to remove broken directories (Resolved)
- is related to
  - LU-10616 replay-single test_70b fails with 'rundbench load on <hostname(s)> failed!' (Open)
  - LU-15624 replay-single and ost-pools failed: rm: cannot remove 'd70b.replay-single': Directory not empty (Open)
  - LU-16065 replay-single test_81a: rm remote dir failed (Open)
  - LU-14470 striped directory layout mismatch after failover (Resolved)
Yes, LU-14470 can help create the failure. Beyond that, we also need to consider replay of other distributed transactions, e.g. migration and restripe. Besides, if client replay is aborted as well, it may still leave dangling name entries. I haven't tested this yet, but IMHO LFSCK won't simply move dangling name entries to lost+found.