Details
Bug
Resolution: Won't Fix
Major
Description
When e2fsck detects multiply-claimed blocks, its default repair behavior is to clone the duplicate blocks. This is guaranteed to result in data corruption and is also a security hole. Typically, one of the inodes with multiply-claimed blocks is valid; the others have corrupt extent data that references some of the same disk blocks as the valid inode. e2fsck has no way to determine which inode is the rightful owner of the blocks. When e2fsck is run with the -y option and the duplicate blocks are cloned, the data blocks of the valid inode or object are copied into the other objects.
In some cases it has been possible to identify which of the inodes has valid extent data (based on the parent FID/file name, or on examination of the data blocks). In that case, the problem inodes with conflicting disk block references can be cleared. This avoids the security problem, but it requires extensive manual intervention and isn't always possible.
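As an illustration only, the condition e2fsck reports can be pictured with a toy model. This is not e2fsck code: the function name, the inode numbers, and the block lists are all made up for the example. It shows why the tool can see that a block has several claimants but cannot tell which claim is the legitimate one.

```python
from collections import defaultdict

def find_multiply_claimed(inode_blocks):
    """Return {block: [inodes]} for every block claimed by more than one inode.

    inode_blocks maps an inode number to the list of disk blocks its
    extent data references (a hypothetical, simplified representation).
    """
    owners = defaultdict(set)
    for ino, blocks in inode_blocks.items():
        for blk in blocks:
            owners[blk].add(ino)
    return {blk: sorted(inos) for blk, inos in owners.items() if len(inos) > 1}

# Hypothetical case: inode 12 is valid, inode 57 has corrupt extent data
# that points at two of inode 12's blocks.
inode_blocks = {12: [100, 101, 102], 57: [101, 102, 500]}
print(find_multiply_claimed(inode_blocks))
# prints {101: [12, 57], 102: [12, 57]}
```

Nothing in the result distinguishes inode 12 from inode 57; both simply appear as claimants of blocks 101 and 102.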
e2fsck has some extended options that provide different ways of handling duplicate blocks. From the e2fsck man page:
clone=dup|zero
    Resolve files with shared blocks in pass 1D by giving each file a private copy of the blocks (dup); or replacing the shared blocks with private, zero-filled blocks (zero). The default is dup.
shared=preserve|lost+found|delete
    Files with shared blocks discovered in pass 1D are cloned and then left in place (preserve); cloned and then disconnected from their parent directory, then reconnected to /lost+found in pass 3 (lost+found); or simply deleted (delete). The default is preserve.
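The difference between the two clone modes quoted above can be sketched with a small simulation. This is a toy model built from the man page wording, not e2fsck's implementation: `resolve_shared`, `read_file`, the block contents, and the choice of which claimant keeps the original block in dup mode are all assumptions made for the example.

```python
def resolve_shared(disk, inode_blocks, shared, clone="dup"):
    """Give claimants of shared blocks private blocks, in place.

    disk:         {block number: bytes}
    inode_blocks: {inode: [block numbers]}
    shared:       {block number: [claiming inodes]}
    clone="dup":  later claimants get a private copy of the data.
    clone="zero": every claimant gets a private zero-filled block.
    """
    if clone not in ("dup", "zero"):
        raise ValueError("clone must be 'dup' or 'zero'")
    next_free = max(disk) + 1  # toy allocator: next unused block number
    for blk, claimants in sorted(shared.items()):
        # Assumption: in dup mode the first claimant keeps the original block.
        targets = claimants[1:] if clone == "dup" else claimants
        for ino in targets:
            if clone == "dup":
                disk[next_free] = disk[blk]
            else:
                disk[next_free] = bytes(len(disk[blk]))
            blocks = inode_blocks[ino]
            blocks[blocks.index(blk)] = next_free
            next_free += 1

def read_file(disk, inode_blocks, ino):
    """Concatenate the data of all blocks an inode references."""
    return b"".join(disk[b] for b in inode_blocks[ino])

# Hypothetical layout: inode 12 rightfully owns block 101, inode 57 is corrupt.
disk = {100: b"AAAA", 101: b"SECR", 500: b"BBBB"}
inode_blocks = {12: [100, 101], 57: [101, 500]}
resolve_shared(disk, inode_blocks, {101: [12, 57]}, clone="dup")
print(read_file(disk, inode_blocks, 57))  # inode 57 now holds a copy of inode 12's data
```

With `clone="dup"` the corrupt inode ends up with a readable copy of the valid inode's block; with `clone="zero"` neither inode leaks data, but the valid inode loses its contents as well.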
The default behavior can be changed by modifying the e2fsck.conf file. The default behavior for our CS systems should be changed, but it is not clear which option is best. At first glance, clone=dup with shared=lost+found looks like the best choice, since it should preserve the valid objects, which could potentially be recovered manually later, and it would not leave the invalid objects around, accessible to users. But for OSTs, the automatic restore of lost+found OST objects would interfere, putting those objects back into the OST namespace and making the bad data available to users.
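A minimal sketch of what such a configuration change might look like, generated from Python so the values can be validated first. It assumes the patched e2fsprogs reads `clone` and `shared` relations from the `[options]` stanza of e2fsck.conf; that stanza placement and the helper name `render_e2fsck_conf` are assumptions and should be checked against the e2fsck.conf man page for the installed version.

```python
# Allowed values, taken from the e2fsck man page text quoted above.
VALID = {
    "clone": {"dup", "zero"},
    "shared": {"preserve", "lost+found", "delete"},
}

def render_e2fsck_conf(clone="dup", shared="lost+found"):
    """Return an e2fsck.conf fragment selecting the shared-block policy.

    Assumption: the options live in the [options] stanza under the
    names 'clone' and 'shared'.
    """
    for key, value in (("clone", clone), ("shared", shared)):
        if value not in VALID[key]:
            raise ValueError("invalid %s value: %r" % (key, value))
    return "[options]\n\tclone = %s\n\tshared = %s\n" % (clone, shared)

print(render_e2fsck_conf("dup", "lost+found"))
```

The same choice can presumably be made per run through the extended options described in the man page excerpt, for example `-E clone=dup,shared=lost+found`, which avoids changing the system-wide default while the best setting is still being decided.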
The 'clone=zero' option is probably safest in terms of avoiding sharing user data, but that would trash the good objects, the rightful "owners" of the duplicate disk blocks.
It would be better if there were some way to identify the inode to which the multiply-claimed blocks actually belong; e2fsck could then clear the other inodes, or allocate new blocks to them and zero out the data.
This issue affects all releases and all versions of e2fsprogs. It can be a problem on any ext or ldiskfs file system. The security angle is more of an issue for OSTs today, since that's where the actual user data resides. Of course, that changes with Data-on-MDT (DoM), where user data also resides on MDTs.