[LU-13717] Client-side encryption - support file name encryption Created: 25/Jun/20  Updated: 20/Sep/22  Resolved: 18/Jan/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: New Feature Priority: Minor
Reporter: Sebastien Buisson Assignee: Sebastien Buisson
Resolution: Fixed Votes: 0
Labels: encryption, sec

Issue Links:
Related
is related to LU-15928 pass key length as part of index API Open
is related to LU-15787 clients built without encryption supp... Resolved
is related to LU-15855 Return -ENOKEY to encrypt-unaware cli... Resolved
is related to LU-15911 Make enable_filename_encryption tunab... Resolved
is related to LU-12275 Client-side file data encryption Resolved
is related to LU-14677 lfs migrate/mirror of encrypted files Resolved
is related to LU-15027 sanity-sec: crash in __list_add_valid... Resolved
is related to LU-15406 Using the native fscrypt for Ubuntu20... Resolved
is related to LU-15407 fscrypt does not work on Ubuntu 5.8 k... Resolved
is related to LU-15810 Migrated encrypted directory lacks on... Resolved
is related to LU-15922 Interop: sanity-sec test_46: File nam... Resolved
is related to LU-15848 ldiskfs: escape encrypted file names Resolved
is related to LU-15827 BUG: KASAN: slab-out-of-bounds in osd... Resolved
is related to LU-16085 Ubuntu 22.04 sanityn test_106c: suppo... Resolved
Sub-Tasks:
Key
Summary
Type
Status
Assignee
LU-15654 Create Test Plan for File Name Encryp... Technical task Resolved Sebastien Buisson  
LU-15790 Fix mirror read/write in Test Plan fo... Technical task Open WC Triage  
Rank (Obsolete): 9223372036854775807

 Description   

This ticket is a place-holder to describe work to be done for client-side file name encryption. This is a follow-up of LU-12275, so all principles and concepts detailed there are still valid.

In particular, filename encryption principles are explained in this document:
https://jira.whamcloud.com/secure/attachment/32657/lustre_encryption_modes_usage.txt

Further details will be added as the feature design makes progress.



 Comments   
Comment by Sebastien Buisson [ 25/Jun/20 ]

One of the consequences of file name encryption is that ciphertext names are no longer valid file names, and not even well-formed strings, because they are binary data. These ciphertext names are sent between clients and servers to ensure data privacy.

Because the underlying fscrypt API supports file names of up to NAME_MAX length, the ciphertext names cannot be base64 encoded (or similar) before being sent to server side. Otherwise the NAME_MAX limit would be exceeded, as encoding binary into text does not have a 100% space efficiency (e.g. base64 has 75%).

  • this implies that RMF_NAME cannot be considered any longer as a string in ptlrpc requests. Similarly, when packing file names, they cannot be checked with lu_name_is_valid*().
  • another implication is in the OSD layer on server side. All file names are currently treated as char *, which cannot work with ciphertext names.

For the OSD layer, it could be possible to change how struct dt_key * parameters are passed: instead of just a char *, it could be turned into a struct lu_name *, so that the actual size of the parameter is correctly handled, and carefully passed to the backend fs layer. Even on CentOS 7 ldiskfs seems capable of handling 'binary names', and ZFS should as well.
However, this change draws many code modifications under lustre/lfsck, lustre/md* (server side), lustre/obdclass, lustre/osd-* and lustre/target, so it has to be thought thoroughly.

Comment by Andreas Dilger [ 25/Jun/20 ]

I think it is important to check how this is handled with upstream ext4? My understanding is that they use some form of encoding (base64 or similar), which allows encrypted filenames of up to NAME_MAX-16 = 240 characters. Having semi-normal filenames (e.g. avoiding non-printable characters) avoids significant problems with the code and tools.

For ext4 htree directory hashing, these encoded names are used to determine the htree leaf/bucket that they are inserted into, and the names are only decrypted for display if the user has the key in their keyring, and incoming names are encrypted before being passed to the lower layers. All filesystem operations are done on the cyphertext name so that it still works properly if the user does not have the key (e.g. the admin can still list and delete such files).

Comment by Sebastien Buisson [ 26/Jun/20 ]

ext4 relies entirely on fscrypt for encryption support. As fscrypt allows the full 255 bytes (NAME_MAX), so does ext4. Clear text file names are NUL-padded to a 32-byte boundary, in order to reduce leakage of filename lengths via their ciphertexts, and then encrypted. The resulting ciphertext has the same length as the NUL-padded clear text.
This means ext4 cannot make use of any form of binary-to-text encoding internally, otherwise file names would exceed the NAME_MAX limit.

fscrypt does make use of base64 encoding however, in order to present files in case the encryption key has not been provided. This is what they call the conversion from disk space to user space, and it still allows for lookup and unlink for instance when the key is not available. When the base64 encoding of the ciphertext name to be presented to 'user space' would exceed NAME_MAX, fscrypt uses an abbreviated form, that is built from base64-encoded partial ciphertext concatenated to hash and minor hash provided by the file system (see fscrypt_digest_name in include/linux/fscrypt.h).

Comment by Sebastien Buisson [ 08/Sep/20 ]

Another consequence of file name encryption comes from the need to allow a number of syscalls when accessing Lustre without the key:

  • read file metadata (stat);
  • list directories;
  • remove files and directories.

The standard way for fscrypt to address this use case is to provide users with an encoded version of the cipher text names. More specifically, cipher text names are base64 encoded and presented to users. These 'scrambled' names can be used to stat, list, and remove directory entries without the encryption key.
Problem is base64 encoding has a 75% space efficiency. So it would not be possible to present to users the encoded version of a NAME_MAX long cipher text name. To work around this, fscrypt instead computes a digested, abbreviated form of long cipher text names (longer than 32 bytes currently). This means lookup operations have to compared this digested name with the digested form of directory entries in order to find a match (this is what fscrypt_match_name() function does).

With Lustre, we cannot leverage this mechanism, as server side is not (entirely) aware of encryption. So we differ a little bit from fscrypt internal implementation, and benefit from Lustre specificities to provide users with our own digested form of cipher text names: this is the base64 encoding of this structure:

struct ll_digest_filename {
	struct lu_fid ldf_fid;
	char ldf_hash[0];
};

ldf_fid is the file's FID, and is 16 bytes long.
ldf_hash is the computed hash of the cipher text name.
Because we do not want fscrypt to try to make a digest of this Lustre-specific digested name, its whole length must fit into 32 bytes. So we use md5sum hash algorithm to generate a 16 bytes long hash of the cipher text name. Moreover, base64 encoding is in this case guaranteed to be (far) shorter than NAME_MAX.

When users do lookup, getattr or unlink operations on such digested names, Lustre client side extracts the FID and the hash. Those are put in the request sent to server side, the hash being substituted to the usual name. So we leverage Lustre's ability to perform operations on FIDs, but the hash is still necessary to make the difference between hard links. Indeed, server side needs to read LinkEA on the object corresponding to FID, and find the name whose md5sum matches the provided hash.

Comment by Patrick Farrell [ 09/Sep/20 ]

I like this choice to use the FID here - It seems sound.

One thought - It's probably unnecessary to do any md5 hashing of the cipher text name, we could simply truncate the on-disk cipher text name to create the name derived (rather than FID derived) portion of the file name.  There should be no issue with revealing a particular number of bytes of the ciphertext name (given that it is bytes of the ciphertext name, without connection to the plain text name).  And there's no reason to expect an md5 hash of the full name in to the smaller byte range to be less (or more) likely to collide vs a truncated cipher text name - both should be effectively random from an outside perspective.

Comment by Andreas Dilger [ 09/Sep/20 ]

I agree with Patrick, I don't think there is a requirement to re-hash the cyphertext name to use for ldf_hash[], it could just be truncated to fit the 16-byte name. The tradeoff is lack of uniqueness vs. a longer name, but with relatively uniform hash distribution a 16-byte base64 encoded name would have 2^96 different filename combinations, so the chance of a birthday paradox collision would be about 1-in-2^48, which is still very large (1-in-256T) and we are not likely to have directories that large any time soon. We would still need to base64 encode ll_digest_filename because the FID will very likely have NUL bytes in it.

One potential problem I see is that standard base64 is using '/' as one of the characters, in addition to [a-zA-Z+], so presumably fscrypt is using a modified base64 encoding that uses some other character (e.g. '-' or '_')?

I was also temporarily concerned that the inclusion of the FID in the digest filename would expose some information, but in fact, the FID is still exposed with readdir() and any number of other interfaces, and the FID (inode number) is in fact not considered a "secret" for fscrypt at all.

A larger problem is that the linkEA is not guaranteed to contain all of the links of a filename. Also, by that point, if the server needs to use the hashed name to disambiguate hard links, it will have some awareness of fscrypt, and we could just implement full support for looking up the digest name directly as fscrypt does? One main issue would be that since the digest name is truncated, it may hash to a different MDT in a striped directory than the full name, so either we still need to include the FID, or fix the hash function to only use the first N bytes of the name for the hash. Since there was a new CRUSH directory hash function added in 2.14, it could still be possible to modify the hash function to limit the name length for the hash to the first 16 bytes so that the full and truncated hash both map to the same MDT? That would need to be done quickly I think...

Comment by Andreas Dilger [ 09/Sep/20 ]

Another option would be for fscrypt files to limit the hard link count and then require that all the hard links are stored in linkEA? That would mean a hard link limit somewhere between 128-160 names (in theory up to 238 links could fit into 64KiB). Currently the linkEA only stores 4KiB of links, so 14 NAME_MAX=256 links or 119 16-byte names, but this could be increased for fscrypt directories/inodes if necessary, but at some reduction in performance. The main performance concern for large link xattrs is that these are read sequentially and written atomically, so any modification would require the whole xattr (up to 64KiB in with this change) to be rewritten entirely.

How does fscrypt handle the lookup internal to ext4 + htree? Since the digest name is non-reversible, does it keep a lookup table in memory for mapping the digest->filename, or does it do linear scan of every directory entry and compute the digest of each to find the matching filename, or (as proposed above) does htree+fscrypt result in a modified directory hash function that uses the (truncated?) hash of the digest to find the htree leaf block to store/lookup the cyphertext name? Using the (truncated) digest of the filename to locate the htree block would definitely minimize the number of directory entries that needed to be encoded/scanned to find the matching name.

Comment by Andreas Dilger [ 09/Sep/20 ]

Finally, one other thing to consider is that eventually we are going to be asked to handle case-insensitive name lookups. It makes sense to prepare for this eventuality and do e.g. "tolower()" on the filename before generating the name digest/hash. That reduces the randomess of the input slightly, but only affects the digest name and not the actual filename. We would still have the FID to compare against for disambiguation if needed.

Comment by Sebastien Buisson [ 10/Sep/20 ]

Hi,

Thanks all for your comments!

Indeed, using the second group of 16 bytes of the cipher text name could do the trick, instead of a hash. So the structure would become:

struct ll_digest_filename {
	struct lu_fid ldf_fid;
	char ldf_excerpt[0];
};

Taking the second group instead of the first is interesting because it depends on the full plaintext, if I understand correctly how CBS-CTS encryption mode works. I have to think about what to do with shorter names. If the cipher text name can be shorter than 16 bytes, then I would need to treat differently short and long names, by just base64 encoding the cipher text name when it is small for instance. BTW, fscrypt uses an adapted version of base64 encoding, to not have '/' from the possible output characters.

By using this struct made of a FID and an excerpt of the cipher text name, I think the risk of collision is reduced because only hard links can have the same FID. And hard links can have the same clear text names only if they are in different directories, but in this case the names are encoded with different key.

Apart from that, the idea to read the linkEA came from the study of how 'rm by fid' is implemented on server side. So I did not consider it could not contain all links
Looking again at functions mdt_rmfid_one() and mdt_links_read(), I cannot see how the case of links stored outside of the linkEA is handled. Could you point me to somewhere in the code where this is done?

Thanks,
Sebastien.

Comment by Andreas Dilger [ 10/Sep/20 ]

The assumption for "rm-by-FID" is that this is only used for "purge" operations, and if a file with many links does not have all filenames stored in the linkEA, then on the next pass the file will be found under the other filename(s) and removed that way, so 100% accuracy is not required. Also "rm-by-FID" is meant to unlink all names of a file for purge, while "unlink()" in an fscrypt directory is definitely not supposed to do that.

I think if there was some flag like "lookup by hash" or "lookup by prefix" in the RPC then we could pass the encoded name to the MDS and it could do the searching for the few operations that could not be done directly by FID (unlink() of the directory entry being one of those cases). In the most common case where the name is in the link xattr it could find the cyphertext name quickly by only searching for links on the inode (normally only 1), but in the very uncommon case where there are more links than fit into linkea the MDS may need to scan the whole parent directory, compare FIDs for a match, then verify the encoded cyphertext against the encoded name from the client. That would minimize the overhead to a tiny fraction of files unless someone is storing encrypted hard link trees for a backup.

Comment by Sebastien Buisson [ 11/Sep/20 ]

The assumption for "rm-by-FID" is that this is only used for "purge" operations, and if a file with many links does not have all filenames stored in the linkEA, then on the next pass the file will be found under the other filename(s) and removed that way, so 100% accuracy is not required. Also "rm-by-FID" is meant to unlink all names of a file for purge, while "unlink()" in an fscrypt directory is definitely not supposed to do that.

Absolutely, no question about that. This is just that in case of access without the key, lookup and unlink operations are like 'by FID' operations, so I was looking for inspiration in this direction.

I think if there was some flag like "lookup by hash" or "lookup by prefix" in the RPC then we could pass the encoded name to the MDS and it could do the searching for the few operations that could not be done directly by FID (unlink() of the directory entry being one of those cases). In the most common case where the name is in the link xattr it could find the cyphertext name quickly by only searching for links on the inode (normally only 1), but in the very uncommon case where there are more links than fit into linkea the MDS may need to scan the whole parent directory, compare FIDs for a match, then verify the encoded cyphertext against the encoded name from the client. That would minimize the overhead to a tiny fraction of files unless someone is storing encrypted hard link trees for a backup.

I think scanning the whole parent directory is what ext4 does? By calling ext4_match() from ext4_search_dir()? And fscrypt_match_name() (from ext4_match() ) does the job of comparing digests. But what is not clear to me is in which cases ext4_search_dir() is called. Maybe not for all lookups?

How is it possible to know if there are more links outside of linkEA? If it is not, then that would imply scanning the whole parent directory every time. In which case I would tend to prefer to limit the hard link count for encrypted files, so that all the hard links are stored in linkEA.

Comment by Andreas Dilger [ 13/Sep/20 ]

How is it possible to know if there are more links outside of linkEA?

There are a couple of ways to check this:

  • compare the inode i_nlink count with the number of entries in the link xattr
  • the link xattr has the leh_overflow_time field, which is set if a link is created that cannot fit into the xattr
Comment by Sebastien Buisson [ 04/Jan/21 ]

In order to cope with ciphertext names, it was previously considered to switch from char * to struct lu_name * for struct dt_key * parameters everywhere in the OSD layer. The challenge is that ciphertext names are no longer valid file names, and not even well-formed strings, because they are binary data, so their actual size needs to be correctly handled, and carefully passed to the backend fs layer.

An alternative approach could be to base64 encode ciphertext names on client side, before they are put into requests. But because the underlying fscrypt API supports file names of up to NAME_MAX length, base64 encoded ciphertext names can exceed the NAME_MAX limit, as encoding binary into text does not have a 100% space efficiency (e.g. base64 has 75%). So these base64 encoded ciphertext names will need to be decoded before being passed to the backend file system layer. That should be possible to determine if the names we handle in the OSD layer belong to encrypted files or not thanks to the LMAI_ENCRYPT flag.

Comment by Gerrit Updater [ 21/Apr/21 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/43386
Subject: LU-13717 sec: rework includes for client encryption
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5f5383f2861858d1c6160f5b621e20ff0fc581bb

Comment by Gerrit Updater [ 21/Apr/21 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/43394
Subject: LU-13717 sec: filename encryption - client symlink support
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7fe5c1b87e041d22ebebb1652065f49a6ce52511

Comment by Gerrit Updater [ 21/Apr/21 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/43387
Subject: LU-13717 sec: limit hard links to linkEA size for enc files
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a114b2d21485483b53e28f4c68fe161add1dbf94

Comment by Gerrit Updater [ 21/Apr/21 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/43388
Subject: LU-13717 sec: handle null algo for filename encryption
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8c89713ee351fc1bd069be83d98ae23ae88188d9

Comment by Gerrit Updater [ 21/Apr/21 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/43389
Subject: LU-13717 sec: filename encryption - server side
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d82d3a70963872cace32579f8910118323779e9a

Comment by Gerrit Updater [ 21/Apr/21 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/43390
Subject: LU-13717 sec: filename encryption - client side
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4b482a91cd709cb06b00d84fd83f1edf8871ce00

Comment by Gerrit Updater [ 21/Apr/21 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/43391
Subject: LU-13717 sec: filename encryption - server digest support
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bc63304dbdc1e4b4c4ff5a85a59144968ada5d98

Comment by Gerrit Updater [ 21/Apr/21 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/43392
Subject: LU-13717 sec: filename encryption - client digest support
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b59bc34484229bce3dd8baca64cab73f57dde7e3

Comment by Gerrit Updater [ 21/Apr/21 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/43393
Subject: LU-13717 sec: filename encryption - server symlink support
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 24773451e2e3f344edc63b941c645ba7417cb6d1

Comment by Gerrit Updater [ 21/Apr/21 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/43395
Subject: LU-13717 sec: filename encryption - tests
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9aa45963eaf8e70de07c1304e10aa170e9a3a6da

Comment by Sebastien Buisson [ 21/Apr/21 ]

A series of patches to implement filename encryption has been pushed to Gerrit.

One difficulty is to support "null" algo for filename encryption, inherited from 2.14. This will not be possible with the in-kernel fscrypt library, so the patches modify the behaviour of configure to build with embedded llcrypt by default, and only build against in-kernel fscrypt if explicitly specified via --enable-crypto=in-kernel configure option.

The objective is to urge users to convert their encrypted directories made with 2.14, to the new fashion that encrypts filenames. We have identified some limitations on the old encryption fashion, due to the new code, for instance the inability to perform migrate. So the idea would be to cp the content of an old-fashion encrypted directory to a new-fashion encrypted directory (which would automatically encrypt names), and then remove the old one.

Also note the submitted patches only implement filename encryption for ldiskfs backend for now.

Comment by Sebastien Buisson [ 28/Apr/21 ]

For ZFS backend, I have not managed to find a way to store cipher text names yet. So currently, OSD-ZFS passes base64 encoded names (as received from the client) to/from the ZFS backend. This means we are limited to 160 characters in clear text names of encrypted files/directories. Indeed, if a Lustre file name is longer than 160 chars, its cipher text representation will use at least 192 bytes (name encryption is done in blocks of 32 bytes), and once base64 encoded, we will get a string longer than NAME_MAX because base64 encoding has a 33% overhead.

The problem with ZFS is that the primitives used by the OSD-ZFS layer to lookup, add and delete all assume a char *, and internally call strlen on that parameter:

int zap_lookup(objset_t *os, uint64_t zapobj, const char *name,
    uint64_t integer_size, uint64_t num_integers, void *buf);

int zap_add(objset_t *os, uint64_t zapobj, const char *key,
    int integer_size, uint64_t num_integers,
    const void *val, dmu_tx_t *tx);

int zap_remove(objset_t *os, uint64_t zapobj, const char *name,
    dmu_tx_t *tx);

So cipher text names cannot be passed to these functions.

I realized these functions had a uint64 alternate form, that could be used to pass binary data in, so I tried to substitue one with the other. Unfortunately, it does not seem to work. To illustrate my attempts, and abstract that from the encryption aspects, I have uploaded patch https://review.whamcloud.com/43477. With this patch, osd_dir_lookup calls zap_lookup_uint64 instead of osd_zap_lookup if Project ID on the object is set to 999, and osd_dir_insert calls zap_add_uint64 instead of osd_zap_add if Project ID on the parent is set to 999. So the workflow to test the use of ZAP uint64 alternatives is something along those lines:

# mount -t lustre -o user_xattr,flock lnode-vm2@tcp:/testfs /mnt/testfs
# mkdir /mnt/testfs/dir1
# lfs project -p 999 /mnt/testfs/dir1
# touch /mnt/testfs/dir1/fileA  <-- this adds the dir entry with zap_add_uint64
# umount /mnt/testfs
# mount -t lustre -o user_xattr,flock lnode-vm2@tcp:/testfs /mnt/testfs
# stat /mnt/testfs/dir1/fileA     <-- this looks up with zap_lookup_uint64
stat: cannot stat '/mnt/testfs/dir1/fileA': No such file or directory

I am not sure if it does not work because the uint64 functions are called inappropriately in the patch, or just because those functions are not meant to be used for regular directory entries.

Any help from a ZFS expert would be welcome here.

Comment by Andreas Dilger [ 28/Apr/21 ]

What about an encoding that is more efficient than base64 that could just avoid '/' and '\0' in the filenames? ZFS must be able to handle almost everything else for ISO-8859-1, UTF-8 and other encodings.

There are 16x4-2=126 printable characters in ASCII (excluding '/' and DEL) and 16x12-2=190 printable characters in ISO-8859-1, so making a "base128" from characters 48-111 and 192-255 would be possible (and shouldn't cause too much headache if files need to be accessed unencrypted), which would mean only 8/7 = 14% expansion instead of 6/4 = 33%.

Comment by Sebastien Buisson [ 29/Apr/21 ]

Absolutely, one possibility to mitigate the problem would be to make use of a more efficient encoding algorithm in the osd-zfs layer. On writes, we would have to base64 decode the names received from the client, and re-encode them with the more effective algorithm and write to the ZFS backend. Conversely, we would decode with the more effective algorithm what is read from the ZFS backend, then base64 encode and send that to the client.

Or maybe we could use a more efficient algorithm directly from the client. That would save some unnecessary re-encoding.

Comment by Sebastien Buisson [ 30/Apr/21 ]

I have reworked the way client and server parts are exchanging encrypted file names. Instead of using a base64 encoding that has a 33% overhead, I am using a custom encoding inspired from the yEnc principles (http://www.yenc.org/yenc-draft.1.3.txt).
Basically, it just takes care of a few critical characters, namely NULL, LF, CR, /, DEL and =. Those are replaced with '=' followed by the char value + 64. And all other chars are left untouched. Efficiency of this encoding depends on the occurrences of the critical chars, but statistically on binary data it can be much higher than base64.
For instance, with this encoding, a clear text filename of length <= 224 is almost guaranteed to be accepted with a ZFS backend where we store encrypted filenames in their encoded form. Indeed, a clear text filename of length <= 224 results in a cipher text filename of length 224 maximum (name encryption is done in blocks of 32 bytes). Among those 224 bytes, even if there are up to 31 critical characters, the encoded form will still fit into NAME_MAX.
However, a clear text filename of length > 224 is almost guaranteed to be refused by the ZFS backend for ENAMETOOLONG reason. This is because such filename results in a cipher text filename of length 256. And the probability to have no critical characters among those 256 bytes is very low. But this cannot be avoided with any encoding algorithm, the only way to support clear text filenames of length > 224 with a ZFS backend would be to have binary names handled directly at the ZAP layer.

Comment by Andreas Dilger [ 30/Apr/21 ]

Olaf or Brian, could you please comment on the ZFS filename issue here. Is there any way to store binary filenames in ZFS?

Comment by Brian Behlendorf [ 30/Apr/21 ]

Thanks for asking, I think I can point you in the right direction.

It is possible to create a ZAP object which uses a binary key but it takes a little special handling. You're going to need to use the `zap_lookup_uint64()`, `zap_add_uint64()`, `zap_update_uint64()`, `zap_remove_uint64()` family of functions and create a new ZAP object with these additional flags, `zap_create_flags(os, 0, ZAP_FLAG_HASH64 | ZAP_FLAG_UINT64, ...);`.  For your use case it sounds like you may also want to set `ZAP_FLAG_PRE_HASHED_KEY`.  This will prevent the ZAP from hashing the value again and it'll just using the leading high bits for the hash value.

The deduplication functionality in ZFS is implemented using this functionality and 256-bit binary keys.  I'd suggest taking a look at the `ddt_zap.c` sources file and using it a reference.  It should give you a pretty good idea how to use these interfaces.

However, I'm not sure how the ZPL layer would handle it if you created a binary ZAP like this, added it to the directory structure, then mounted the filesystem through the POSIX layer.  If we're lucky it might just try and output the binary keys as filenames.  If we need to we can always add some code to ZFS to more gracefully handle this unlikely debug case.

Comment by Andreas Dilger [ 01/May/21 ]

Brian, thanks for your response.

I was wondering exactly the same question about ZPL mounting of such directories. It would definitely be prudent to have the ZPL base64 encode such dirents for display to userspace, as Lustre and ext4 do when accessing encrypted directories without a key. We've definitely had to poke into the directory tree on occasion, so keeping this ability will prove useful at some point again in the future.

Is there a special object type that we should use for this other than a normal directory type? That would make it more clear to the ZPL that this is a directory, but that it has a special encoding. I was wondering if there was anything that was done in ZPL to handle foreign character sets (Unicode/UTF-8/etc), but UTF-8 doesn't ever contain a NUL byte in it, while random encrypted data will.

Comment by Andreas Dilger [ 01/May/21 ]

Sebastien, it should be configured to have statfs() return f_namelen=224 when run on an encrypted directory, so that this is at least conveyed to applications that check what the maximum filename length is.

Comment by Sebastien Buisson [ 03/May/21 ]

Thanks Brian for the hints! I will look in this direction. Compared to what I did as an example in patch https://review.whamcloud.com/4347, I think what I was missing are the various flags to have the object properly handle the binary key. I will let you know how it goes.

Andreas, I gave a look at your suggestion to return a Namelen adapted to the encrypted directory case, but it seems the osd_statfs function only has the OSD device as parameter, so it seems we cannot make a difference depending on the directory on which statfs is called. What I did from userspace is to call statfs -f /mnt/testfs/myencrypteddir, but maybe you had something different in mind?

Comment by Gerrit Updater [ 05/May/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43386/
Subject: LU-13717 sec: rework includes for client encryption
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 028281ae195927e97518c573e7be3e326d9e6bd3

Comment by Sebastien Buisson [ 14/May/21 ]

behlendorf it indeed helps to set the ZAP_FLAG_HASH64, ZAP_FLAG_UINT64 and ZAP_FLAG_PRE_HASHED_KEY flags on the ZAP object: with these, I am able to insert a new dir entry with zap_add_uint64 and lookup with zap_lookup_uint64. Thanks for the tip!

The problem I am facing now is that it seems the flags mentioned above can only be set at ZAP object creation time (via zap_create_flags_dnsize/zap_create_flags). Unfortunately, with Lustre client encryption, the workflow is to create an empty directory, and then set it as encrypted.
For instance, if I create a subdirectory inside this encrypted directory, the corresponding ZAP object will have the desired flags set to be able to handle a binary key, because we know we are inside an encrypted directory, and the files in the subdirectory work ok with a binary key. So this case is fine.
Now, if I create files directly under the encrypted directory, their parent ZAP object does not have the desired flags set to be able to handle a binary key (as this directory was created before being defined as encrypted), so those files cannot work with a binary key.
Back to the initial requirement of being able to support encrypted file names up to NAME_MAX, that would mean we are able to support these, but only in subdirectories of encrypted directories, and not right under the encrypted directory.

Is there a way to modify ZAP flags after the ZAP object has been created, or re-create a ZAP object from a previous one with additional flags? Or maybe ZAP objects can be files, and not only directories?

Comment by Andreas Dilger [ 20/May/21 ]

Sebastien, one question that came up during discussion today is whether the user.* xattrs are encrypted by fscrypt?

Comment by Sebastien Buisson [ 21/May/21 ]

Good question. In fact, fscrypt does not protect the confidentiality of non-filename metadata, e.g.

  • file sizes
  • file permissions
  • file timestamps
  • all extended attributes.
Comment by Gerrit Updater [ 02/Jun/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43387/
Subject: LU-13717 sec: limit hard links to linkEA size for enc files
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2ffb8f5726d27e7c2324a3e833491231fdaa3306

Comment by Sebastien Buisson [ 17/Jun/21 ]

Hi behlendorf, do you have any suggestions regarding the ZAP object flags, per my comment above?

In particular, I am wondering if there is a way to modify ZAP flags after the ZAP object has been created, or re-create a ZAP object from a previous one with additional flags. Or maybe ZAP objects can be files, and not only directories?

Thanks,
Sebastien.

Comment by Andreas Dilger [ 17/Jun/21 ]

Sebastien, definitely regular files can be ZAP objects. We do this for the ZFS OI files, for example, as well as some other index files in osd-zfs. That said, we do want to maintain the ability to mount/view/modify ZFS OSD filesystems (OST or MDT) via ZPL, so if you need to add a new ZAP type for client-encrypted directory names, then a similar change should be pushed to ZPL to allow it to identify those files as directories, and be able to encode/print the filenames appropriately for userspace when the filesystem is mounted directly.

Comment by Gerrit Updater [ 27/Jul/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43388/
Subject: LU-13717 sec: handle null algo for filename encryption
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f18c87cb5362496a4baadaa14265471c992ca06a

Comment by Gerrit Updater [ 11/Sep/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43390/
Subject: LU-13717 sec: filename encryption
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4d38566a004f6a636c37ec0c86f053be9b905bd7

Comment by Gerrit Updater [ 13/Sep/21 ]

"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/44898
Subject: LU-13717 sec: no name check for Lustre encrypted inodes
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: 1d808adc6fe4321a732ca763babc58d52f7f0123

Comment by Gerrit Updater [ 17/Sep/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43392/
Subject: LU-13717 sec: filename encryption - digest support
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ed4a625d88567a2498c3fe32fd340ae7985e6ad0

Comment by Gerrit Updater [ 22/Sep/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43394/
Subject: LU-13717 sec: filename encryption - symlink support
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e735298935b64541fc561bd9e978cd7af48c503e

Comment by Peter Jones [ 22/Sep/21 ]

Landed for 2.15

Comment by Andreas Dilger [ 28/Sep/21 ]

Sebastien, I was just wondering what the interoperability is for filesystems that have encrypted filenames, but are accessed by an older client? Ideally, the older client could access the unencrypted parts of the filesystem normally, and would not "explode" if trying to access the encrypted directory tree (i.e. not LBUG/Oops when accessing a filename with a NUL byte in it). If there is such a problem accessing an encrypted directory by an old client, then it makes sense for the 2.15 MDS to return -ENOKEY to clients without the OBD_CONNECT2_ENCRYPT feature when accessing encrypted directories and files.

Separately, there is a question of what happens in the unusual case of an encrypted directory being accessed by an old MDS after a downgrade, whether by a new or old client? It looks like there is already an LMAI_ENCRYPT flag being stored on each file, so this should prevent major problems.

Comment by Sebastien Buisson [ 29/Sep/21 ]

Regarding access to 2.15 encrypted directories from older clients, I tested with a 2.12 client and a 2.14 client. I tried to list content of a directory for which I know for sure some names have illegal characters (as told so by e2fsck), and the result is the same for both versions:

# ls -li /testfs/vault
total 0
144115238860527809 -rw-r--r-- 1 root root 0 Sep 28 15:32 ??????U??e?<?g?????n?G???YK$8???
144115238860527803 -rw-r--r-- 1 root root 0 Sep 28 15:32 =o?"????v????R??XM?????????n?=J?,?
144115238860527801 -rw-r--r-- 1 root root 0 Sep 28 15:32 J=M??%????9???Q{?q??,??rp.?hV;8l?
144115238860527804 -rw-r--r-- 1 root root 0 Sep 28 15:32 O?9????=?@??>O???????v??76<V???0&
144115238860527819 -rw-r--r-- 1 root root 0 Sep 29 15:37 \99D????5?N??Y??g_(?F%3(@x=M?1c???r???u??olJ????g?????{dNP??Wh??#?,F5C?(?3J???S;v??5??????"b???zZ????????n?|??I#??1f_9???BY????O!?3?B?4c??1?=o????a?????JV?Iv],???`z%??6?-k%??=M?=}?4????%??Um???????<6??T?~???#????B??b??????:.??A)
144115238860527806 -rw-r--r-- 1 root root 0 Sep 28 15:32 g??_??[??Z??p???;?_=}?.??aM????c0
144115238860527802 -rw-r--r-- 1 root root 0 Sep 28 15:32 ???T?"g??>??{??J???8?;??e???%]?_
144115238860527807 -rw-r--r-- 1 root root 0 Sep 28 15:32 ???????sg??=J?zu????q????????7?#G
144115238860527805 -rw-r--r-- 1 root root 0 Sep 28 15:32 ????????Q???????0=@as?y6?J?[?p???
144115238860527810 -rw-r--r-- 1 root root 0 Sep 28 15:32 ???*1??`????p????PI9?=?"?????????
144115238860527808 -rw-r--r-- 1 root root 0 Sep 28 15:32 ?:????d????z?????a?p???Jh???????

So it does not blow up the client, but of course it is not possible to browse subdirectories for instance. This is fine, as the purpose of encryption is to limit access to those who do not have the key.
All encrypted names go through critical_encode() in 2.15 servers before being sent to the clients. As this routine escapes critical characters such as NULL, LF, CR, /, DEL and =, the names read by encryption-unaware clients are not that badly formed.

Comment by Gerrit Updater [ 08/Oct/21 ]

"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45163
Subject: LU-13717 llite: encoded ciphertext name longer than NAME_MAX
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1fa416289e3ae7a9d3bef15a194c2b5fd54dded2

Comment by Gerrit Updater [ 12/Oct/21 ]

"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45211
Subject: LU-13717 ldiskfs: set LDISKFS_ENCRYPT_FL on encrypted files
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 946e64754846d5672387323ff482e2379df977e0

Comment by Gerrit Updater [ 12/Oct/21 ]

"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45212
Subject: LU-13717 sec: process Lustre enc inodes as normal enc inodes
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: 2f2ab907b1f0d3a77b69c754aa03c71d580f7fd4

Comment by Gerrit Updater [ 13/Oct/21 ]

"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45221
Subject: LU-13717 sec: missing defs in includes for client encryption
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 089ce9b3939c9f6aad257c61987e7b8d18469b22

Comment by Gerrit Updater [ 19/Oct/21 ]

"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45289
Subject: LU-13717 sec: support encrypted files handling in pfsck mode
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: ef01ec0a4ffc6c0de9dfc1377fa3880611b5e664

Comment by Gerrit Updater [ 03/Nov/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45221/
Subject: LU-13717 sec: missing defs in includes for client encryption
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 79ac7c144539e0d964db329c341ebf30a8472f5c

Comment by Gerrit Updater [ 30/Nov/21 ]

"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/45289/
Subject: LU-13717 sec: support encrypted files handling in pfsck mode
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: d35469d6c41a90ebeca121d3e5ec709cfc967cb8

Comment by Gerrit Updater [ 30/Nov/21 ]

"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/45212/
Subject: LU-13717 sec: process Lustre enc inodes as normal enc inodes
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: 35ce354fa5688625961a62fae8d26f826b015593

Comment by Andreas Dilger [ 17/Dec/21 ]

e2fsprogs be included in e2fsprogs-1.46.2.wc4

Comment by Gerrit Updater [ 21/Dec/21 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/45907
Subject: LU-13717 sec: test native fscrypt for Ubunutu 5.4 kernel
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ef8ff9e8a4c70d8d40aa53219613f38a7d2d351b

Comment by Gerrit Updater [ 23/Dec/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45163/
Subject: LU-13717 sec: fix handling of encrypted file with long name
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 75414af6bf310244d38284958ecf037d61936726

Comment by Gerrit Updater [ 18/Jan/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45211/
Subject: LU-13717 sec: make client encryption compatible with ext4
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4231fab66eab3e984499bf0c6bd4514692a409fa

Comment by Peter Jones [ 18/Jan/22 ]

Landed for 2.15

Generated at Sat Feb 10 03:03:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.