[LU-13446] Security hole in default e2fsck behavior for duplicate blocks Created: 10/Apr/20 Updated: 19/Sep/22 Resolved: 19/Sep/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Artem Blagodarenko (Inactive) | Assignee: | WC Triage |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
When e2fsck detects multiply-claimed blocks, the default repair behavior is to clone the duplicate blocks. This is guaranteed to result in data corruption and is also a security hole. Typically, one of the inodes with multiply-claimed blocks is valid, and the others have corrupt extent data referencing some of the same disk blocks as the valid inode. e2fsck has no way to determine which inode is the rightful owner of the blocks. When e2fsck is run with the -y option and duplicate blocks are cloned, the duplicate data blocks from the valid inode or object are replicated to the other objects, which exposes that data to anyone who can read those objects.

In some cases it has been possible to identify which of the inodes has valid extent data (based on the parent fid/file name and examination of the data blocks). In that case, the problem inodes with conflicting disk block references can be cleared. This avoids the security problem, but it requires extensive manual intervention and isn't always possible.

e2fsck has some extended options that provide different ways of handling duplicate blocks. From the e2fsck man page:
The default behavior can be changed by editing the e2fsck.conf file. The default for our CS systems should be changed, but I am not sure which option is best. At first glance, clone=dup with shared=lost+found looks like the best choice, since that should preserve the valid objects, which could potentially be manually recovered later, and it would not leave the invalid objects around, accessible to users. But for OSTs, the automatic restore of lost+found OST objects would interfere, putting those objects back into the OST namespace and making the bad data available to users. The 'clone=zero' option is probably safest in terms of avoiding sharing user data, but it would trash the good objects, the rightful "owners" of the duplicate disk blocks.

It would be better if there were some way to identify the inode to which the multiply-claimed blocks actually belong; then e2fsck could clear the other inodes, or allocate new blocks to them and zero out the data.

This issue affects all releases and all versions of e2fsprogs. It can be a problem on any ext or ldiskfs file system. The security angle is more of an issue for OSTs today, since that's where actual user data resides. Of course, that changes with DoM. |
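The trade-offs described above can be laid out as a small model. This is only a sketch of the reasoning in this ticket, not of e2fsck internals: the `Outcome` type, the `outcome` function, and the `is_ost` flag are hypothetical names introduced for illustration, and the option values are the ones mentioned in this ticket.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    good_data_preserved: bool  # the rightful owner's data survives somewhere
    data_leaked: bool          # that data stays reachable through a wrong object


def outcome(clone: str, shared: str, is_ost: bool) -> Outcome:
    """Model the expected result of a clone/shared policy, per this ticket."""
    if clone not in ("dup", "zero"):
        raise ValueError(f"unknown clone policy: {clone}")
    if shared not in ("preserve", "lost+found"):
        raise ValueError(f"unknown shared policy: {shared}")

    if clone == "zero":
        # Nothing is shared, but the good object is zeroed along with the bad ones.
        return Outcome(good_data_preserved=False, data_leaked=False)

    # clone == "dup": every claimant receives a copy of the data.
    if shared == "lost+found" and not is_ost:
        # Objects are moved out of the namespace and can be recovered manually.
        return Outcome(good_data_preserved=True, data_leaked=False)

    # Either the objects stay in place, or (on an OST) the automatic restore
    # of lost+found objects puts them back into the namespace.
    return Outcome(good_data_preserved=True, data_leaked=True)
```

Under this model no combination both preserves the good data and avoids the leak on an OST, which is why identifying the rightful owner automatically would be the better fix.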
| Comments |
| Comment by Andreas Dilger [ 14/Apr/20 ] |
|
Getting the patch included upstream for the duplicate blocks handling would be good. AFAIR, the obstacle to this in the past was figuring out how to set the dup|zero|preserve|unlink options in e2fsck.conf while still being able to override them from the command line. Maybe that has since been fixed in the patch?

There are definitely some options for automatically detecting which inode is the correct parent for the duplicate blocks. The badblocks check can identify if one of the inodes has a bunch of other errors and clear that inode first, avoiding the clone completely, which also helps because pass1b/1c can be very slow. It may be that the bad blocks code has broken over time and no longer detects this correctly, or checks it too late to prevent the duplicate blocks handling?

It may also be that we could add some extra Lustre-specific checks to increase inode badness, like validating the content of the "fid" xattr against the object name in O/<seq>/dN in case an inode block was written to the wrong location. If the metadata_csum feature is enabled, then it would be possible to increase the inode badness for inodes that reference the wrong indirect/index blocks for regular files and directory leaf blocks, since the checksums contain the inode number as part of the seed. We haven't really tested metadata_csum with Lustre due to past conflicts with dir_data on the MDT, but it should be OK on the OST, and getting it working on the MDT would also be good.

I think whether the global behavior is changed by default depends on the site, and whether the file data is more precious or security is more important. Obviously, making an automatic (but correct) decision using the above information would be best. Since this can be set via e2fsck.conf, it can be decided on a per-site or per-build basis? |
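The "inode badness" idea above could look roughly like the following. This is a minimal sketch under stated assumptions, not the e2fsck implementation: the `score` and `pick_owner` functions, the weights, and the `margin` threshold are all hypothetical, and the inputs stand in for checks (other errors found, "fid" xattr versus object name, metadata checksum) that a real checker would have to perform itself.

```python
from typing import Dict, Optional


def score(other_errors: int, fid_matches_name: bool, csum_ok: bool) -> int:
    """Combine per-inode findings into a badness score (weights are made up)."""
    badness = other_errors
    if not fid_matches_name:
        badness += 5  # "fid" xattr disagrees with the object name
    if not csum_ok:
        badness += 5  # checksum seeded with a different inode number
    return badness


def pick_owner(badness: Dict[int, int], margin: int = 1) -> Optional[int]:
    """Return the inode judged to own the shared blocks, or None if unclear.

    `badness` maps inode number to its score. The lowest score wins only if
    it beats the runner-up by at least `margin`; otherwise the decision is
    left to the configured clone/shared policy.
    """
    if not badness:
        return None
    ranked = sorted(badness.items(), key=lambda kv: kv[1])
    if len(ranked) == 1:
        return ranked[0][0]
    (best_ino, best), (_, runner_up) = ranked[0], ranked[1]
    return best_ino if runner_up - best >= margin else None


# Example with made-up inode numbers: 101 is clean, 202 has several problems.
claimants = {
    101: score(other_errors=0, fid_matches_name=True, csum_ok=True),
    202: score(other_errors=3, fid_matches_name=False, csum_ok=True),
}
owner = pick_owner(claimants)
```

Returning `None` on a tie keeps the automatic path conservative: the other inodes are cleared only when one claimant is clearly healthier than the rest.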
| Comment by Gerrit Updater [ 19/Apr/21 ] |
|
|
| Comment by Andreas Dilger [ 19/Sep/22 ] |
|
Patch was abandoned. |