[LU-13446] Security hole in default e2fsck behavior for duplicate blocks Created: 10/Apr/20  Updated: 19/Sep/22  Resolved: 19/Sep/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Artem Blagodarenko (Inactive) Assignee: WC Triage
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
Related
is related to LU-13650 ldiskfs: enable metadata_csum on MDT/OST Open
is related to LU-16171 e2fsck should handle multiply-claimed... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When e2fsck detects multiply-claimed blocks, the default repair behavior is to clone the duplicate blocks. This is guaranteed to result in data corruption and is also a security hole. Typically, one of the inodes with multiply-claimed blocks is valid, while the others have corrupt extent data referencing some of the same disk blocks as the valid inode. e2fsck has no way to determine which inode is the rightful owner of the blocks. When e2fsck is run with the -y option and the duplicate blocks are cloned, data blocks from the valid inode or object are replicated into the other objects, exposing that data through them.

In some cases it has been possible to identify which of the inodes has valid extent data (based on the parent FID/file name, or examination of the data blocks). In that case, the problem inodes with conflicting disk block references can be cleared. This avoids the security problem, but it requires extensive manual intervention and isn't always possible.

e2fsck has some extended options that provide different ways of handling duplicate blocks. From the e2fsck man page:

clone=dup|zero
	Resolve files with shared blocks in pass 1D by giving each file a private copy of the blocks (dup); or replacing the shared blocks with private, zero-filled blocks (zero). The default is dup.

shared=preserve|lost+found|delete
	Files with shared blocks discovered in pass 1D are cloned and then left in place (preserve); cloned and then disconnected from their parent directory, then reconnected to /lost+found in pass 3 (lost+found); or simply deleted (delete). The default is preserve.

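For illustration, these options are passed on the command line with -E. A minimal sketch that creates a scratch ext4 image (the path and size here are arbitrary, chosen only for this example) and runs a forced check with the stricter zero-fill behavior:

```shell
# Create a small scratch filesystem image (no root privileges needed;
# the path and size are arbitrary for this example).
dd if=/dev/zero of=/tmp/lu13446.img bs=1M count=8 status=none
mke2fs -q -F -t ext4 /tmp/lu13446.img

# Force a full check (-f), answer yes to all prompts (-y), and use the
# stricter pass-1D handling: zero-fill shared blocks and delete the
# files that claimed them.
e2fsck -fy -E clone=zero,shared=delete /tmp/lu13446.img
```

On a filesystem with no multiply-claimed blocks (such as this fresh image) the -E options have no effect; they only change what pass 1D does when duplicates are actually found.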
The default behavior can be changed via the e2fsck.conf file. The default behavior for our CS systems should be changed, but it is not clear which option is best. Initially, clone=dup with shared=lost+found looks like the best choice, since it should preserve the valid objects, which could potentially be recovered manually later, and it would not leave the invalid objects around, accessible to users. But for OSTs, the automatic restore of lost+found OST objects would interfere, putting those objects back into the OST namespace and making the bad data available to users.
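As a sketch of what such a site-wide default might look like, the relation names below are assumed by analogy with the -E option names; upstream e2fsprogs may not recognize them in e2fsck.conf without the patch discussed in this ticket, so this fragment is illustrative only:

```
# Illustrative e2fsck.conf fragment. The 'clone' and 'shared' relations
# are assumptions modeled on the -E option names; they are not confirmed
# upstream syntax.
[options]
	clone = dup
	shared = lost+found
```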

The 'clone=zero' option is probably safest in terms of avoiding sharing user data, but that would trash the good objects, the rightful "owners" of the duplicate disk blocks.

It would be better if there were some way to identify the inode to which the multiply-claimed blocks actually belong; e2fsck could then clear the other inodes, or allocate new blocks for them and zero out the data.

This issue affects all releases and all versions of e2fsprogs, and can be a problem on any ext* or ldiskfs file system. The security angle is more of an issue for OSTs today, since that is where actual user data resides. Of course, that changes with DoM (Data-on-MDT).



 Comments   
Comment by Andreas Dilger [ 14/Apr/20 ]

Getting the patch for duplicate-block handling included upstream would be good. AFAIR, the obstacle to this in the past was figuring out how to set the dup|zero|preserve|unlink options in e2fsck.conf while still being able to override them from the command line. Maybe that has since been fixed in the patch?

There are definitely some options for automatically detecting which inode is the correct owner of the duplicate blocks. The inode badness check can identify whether one of the inodes has a bunch of other errors and clear that inode first, avoiding the clone completely, since pass 1B/1C can be very slow. It may be that the badness code has broken over time and no longer detects this correctly, or checks it too late to prevent the duplicate-block handling. It may also be that we could add some extra Lustre-specific checks to increase inode badness, like validating the content of the "fid" xattr against the object name in O/<seq>/dN, in case an inode block was written to the wrong location.

If the metadata_csum feature is enabled, then it would be possible to increase the inode badness for inodes that reference the wrong indirect/index blocks for regular files and directory leaf blocks, since the checksum contains the inode number as part of the seed. We haven't really tested metadata_csum with Lustre due to past conflicts with dir_data on the MDT, but it should be OK on the OST, and getting it working on the MDT would also be good.

I think whether the global behavior is changed by default depends on the site: whether the file data is more precious, or security is more important. Obviously, making an automatic (but correct) decision using the above information would be best. Since this can be set via e2fsck.conf, it can be decided on a per-site or per-build basis.

Comment by Gerrit Updater [ 19/Apr/21 ]

Artem Blagodarenko (artem.blagodarenko@hpe.com) uploaded a new patch: https://review.whamcloud.com/43370
Subject: LU-13446 e2fsck: zero-fill shared blocks by default
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: a28451e5ebc4628678d39c8a57ff241268fad32a

Comment by Andreas Dilger [ 19/Sep/22 ]

Patch was abandoned.

Generated at Sat Feb 10 03:01:21 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.