kernel-fxtec-pro1x

Author	SHA1	Message	Date
Eric Biggers	62810e0d59	fs-verity: use mempool for hash requests When initializing an fs-verity hash algorithm, also initialize a mempool that contains a single preallocated hash request object. Then replace the direct calls to ahash_request_alloc() and ahash_request_free() with allocating and freeing from this mempool. This eliminates the possibility of the allocation failing, which is desirable for the I/O path. This doesn't cause deadlocks because there's no case where multiple hash requests are needed at a time to make forward progress. Link: https://lore.kernel.org/r/20191231175545.20709-1-ebiggers@kernel.org Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:14:20 -08:00
Eric Biggers	dcd848f102	fs-verity: implement readahead of Merkle tree pages When fs-verity verifies data pages, currently it reads each Merkle tree page synchronously using read_mapping_page(). Therefore, when the Merkle tree pages aren't already cached, fs-verity causes an extra 4 KiB I/O request for every 512 KiB of data (assuming that the Merkle tree uses SHA-256 and 4 KiB blocks). This results in more I/O requests and performance loss than is strictly necessary. Therefore, implement readahead of the Merkle tree pages. For simplicity, we take advantage of the fact that the kernel already does readahead of the file's data, just like it does for any other file. Due to this, we don't really need a separate readahead state (struct file_ra_state) just for the Merkle tree, but rather we just need to piggy-back on the existing data readahead requests. We also only really need to bother with the first level of the Merkle tree, since the usual fan-out factor is 128, so normally over 99% of Merkle tree I/O requests are for the first level. Therefore, make fsverity_verify_bio() enable readahead of the first Merkle tree level, for up to 1/4 the number of pages in the bio, when it sees that the REQ_RAHEAD flag is set on the bio. The readahead size is then passed down to ->read_merkle_tree_page() for the filesystem to (optionally) implement if it sees that the requested page is uncached. While we're at it, also make build_merkle_tree_level() set the Merkle tree readahead size, since it's easy to do there. However, for now don't set the readahead size in fsverity_verify_page(), since currently it's only used to verify holes on ext4 and f2fs, and it would need parameters added to know how much to read ahead. This patch significantly improves fs-verity sequential read performance. Some quick benchmarks with 'cat'-ing a 250MB file after dropping caches: On an ARM64 phone (using sha256-ce): Before: 217 MB/s After: 263 MB/s (compare to sha256sum of non-verity file: 357 MB/s) In an x86_64 VM (using sha256-avx2): Before: 173 MB/s After: 215 MB/s (compare to sha256sum of non-verity file: 223 MB/s) Link: https://lore.kernel.org/r/20200106205533.137005-1-ebiggers@kernel.org Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:14:14 -08:00
Eric Biggers	9630eacb1e	fs-verity: implement readahead for FS_IOC_ENABLE_VERITY When it builds the first level of the Merkle tree, FS_IOC_ENABLE_VERITY sequentially reads each page of the file using read_mapping_page(). This works fine if the file's data is already in pagecache, which should normally be the case, since this ioctl is normally used immediately after writing out the file. But in any other case this implementation performs very poorly, since only one page is read at a time. Fix this by implementing readahead using the functions from mm/readahead.c. This improves performance in the uncached case by about 20x, as seen in the following benchmarks done on a 250MB file (on x86_64 with SHA-NI): FS_IOC_ENABLE_VERITY uncached (before) 3.299s FS_IOC_ENABLE_VERITY uncached (after) 0.160s FS_IOC_ENABLE_VERITY cached 0.147s sha256sum uncached 0.191s sha256sum cached 0.145s Note: we could instead switch to kernel_read(). But that would mean we'd no longer be hashing the data directly from the pagecache, which is a nice optimization of its own. And using kernel_read() would require allocating another temporary buffer, hashing the data and tree pages separately, and explicitly zero-padding the last page -- so it wouldn't really be any simpler than direct pagecache access, at least for now. Link: https://lore.kernel.org/r/20200106205410.136707-1-ebiggers@kernel.org Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:14:14 -08:00
Daniel Rosenberg	86eb43f574	fscrypt: improve format of no-key names When an encrypted directory is listed without the key, the filesystem must show "no-key names" that uniquely identify directory entries, are at most 255 (NAME_MAX) bytes long, and don't contain '/' or '\0'. Currently, for short names the no-key name is the base64 encoding of the ciphertext filename, while for long names it's the base64 encoding of the ciphertext filename's dirhash and second-to-last 16-byte block. This format has the following problems: - Since it doesn't always include the dirhash, it's incompatible with directories that will use a secret-keyed dirhash over the plaintext filenames. In this case, the dirhash won't be computable from the ciphertext name without the key, so it instead must be retrieved from the directory entry and always included in the no-key name. Casefolded encrypted directories will use this type of dirhash. - It's ambiguous: it's possible to craft two filenames that map to the same no-key name, since the method used to abbreviate long filenames doesn't use a proper cryptographic hash function. Solve both these problems by switching to a new no-key name format that is the base64 encoding of a variable-length structure that contains the dirhash, up to 149 bytes of the ciphertext filename, and (if any bytes remain) the SHA-256 of the remaining bytes of the ciphertext filename. This ensures that each no-key name contains everything needed to find the directory entry again, contains only legal characters, doesn't exceed NAME_MAX, is unambiguous unless there's a SHA-256 collision, and that we only take the performance hit of SHA-256 on very long filenames. Note: this change does not address the existing issue where users can modify the 'dirhash' part of a no-key name and the filesystem may still accept the name. Signed-off-by: Daniel Rosenberg <drosen@google.com> [EB: improved comments and commit message, fixed checking return value of base64_decode(), check for SHA-256 error, continue to set disk_name for short names to keep matching simpler, and many other cleanups] Link: https://lore.kernel.org/r/20200120223201.241390-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:14:09 -08:00
Eric Biggers	eeb955a8ad	ubifs: allow both hash and disk name to be provided in no-key names In order to support a new dirhash method that is a secret-keyed hash over the plaintext filenames (which will be used by encrypted+casefolded directories on ext4 and f2fs), fscrypt will be switching to a new no-key name format that always encodes the dirhash in the name. UBIFS isn't happy with this because it has assertions that verify that either the hash or the disk name is provided, not both. Change it to use the disk name if one is provided, even if a hash is available too; else use the hash. Link: https://lore.kernel.org/r/20200120223201.241390-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:10:34 -08:00
Eric Biggers	869dc687a8	ubifs: don't trigger assertion on invalid no-key filename If userspace provides an invalid fscrypt no-key filename which encodes a hash value with any of the UBIFS node type bits set (i.e. the high 3 bits), gracefully report ENOENT rather than triggering ubifs_assert(). Test case with kvm-xfstests shell: . fs/ubifs/config . ~/xfstests/common/encrypt dev=$(__blkdev_to_ubi_volume /dev/vdc) ubiupdatevol $dev -t mount $dev /mnt -t ubifs mkdir /mnt/edir xfs_io -c set_encpolicy /mnt/edir rm /mnt/edir/_,,,,,DAAAAAAAAAAAAAAAAAAAAAAAAAA With the bug, the following assertion fails on the 'rm' command: [ 19.066048] UBIFS error (ubi0:0 pid 379): ubifs_assert_failed: UBIFS assert failed: !(hash & ~UBIFS_S_KEY_HASH_MASK), in fs/ubifs/key.h:170 Fixes: `f4f61d2cc6` ("ubifs: Implement encrypted filenames") Cc: <stable@vger.kernel.org> # v4.10+ Link: https://lore.kernel.org/r/20200120223201.241390-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:10:34 -08:00
Eric Biggers	338a1f52ae	fscrypt: clarify what is meant by a per-file key Now that there's sometimes a second type of per-file key (the dirhash key), clarify some function names, macros, and documentation that specifically deal with per-file encryption keys. Link: https://lore.kernel.org/r/20200120223201.241390-4-ebiggers@kernel.org Reviewed-by: Daniel Rosenberg <drosen@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:10:33 -08:00
Daniel Rosenberg	7495f91bb5	fscrypt: derive dirhash key for casefolded directories When we allow indexed directories to use both encryption and casefolding, for the dirhash we can't just hash the ciphertext filenames that are stored on-disk (as is done currently) because the dirhash must be case insensitive, but the stored names are case-preserving. Nor can we hash the plaintext names with an unkeyed hash (or a hash keyed with a value stored on-disk like ext4's s_hash_seed), since that would leak information about the names that encryption is meant to protect. Instead, if we can accept a dirhash that's only computable when the fscrypt key is available, we can hash the plaintext names with a keyed hash using a secret key derived from the directory's fscrypt master key. We'll use SipHash-2-4 for this purpose. Prepare for this by deriving a SipHash key for each casefolded encrypted directory. Make sure to handle deriving the key not only when setting up the directory's fscrypt_info, but also in the case where the casefold flag is enabled after the fscrypt_info was already set up. (We could just always derive the key regardless of casefolding, but that would introduce unnecessary overhead for people not using casefolding.) Signed-off-by: Daniel Rosenberg <drosen@google.com> [EB: improved commit message, updated fscrypt.rst, squashed with change that avoids unnecessarily deriving the key, and many other cleanups] Link: https://lore.kernel.org/r/20200120223201.241390-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:10:33 -08:00
Daniel Rosenberg	f4951340a1	fscrypt: don't allow v1 policies with casefolding Casefolded encrypted directories will use a new dirhash method that requires a secret key. If the directory uses a v2 encryption policy, it's easy to derive this key from the master key using HKDF. However, v1 encryption policies don't provide a way to derive additional keys. Therefore, don't allow casefolding on directories that use a v1 policy. Specifically, make it so that trying to enable casefolding on a directory that has a v1 policy fails, trying to set a v1 policy on a casefolded directory fails, and trying to open a casefolded directory that has a v1 policy (if one somehow exists on-disk) fails. Signed-off-by: Daniel Rosenberg <drosen@google.com> [EB: improved commit message, updated fscrypt.rst, and other cleanups] Link: https://lore.kernel.org/r/20200120223201.241390-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:10:33 -08:00
Eric Biggers	2ad325daa7	fscrypt: add "fscrypt_" prefix to fname_encrypt() fname_encrypt() is a global function, due to being used in both fname.c and hooks.c. So it should be prefixed with "fscrypt_", like all the other global functions in fs/crypto/. Link: https://lore.kernel.org/r/20200120071736.45915-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:10:33 -08:00
Eric Biggers	23ad6ce441	fscrypt: don't print name of busy file when removing key When an encryption key can't be fully removed due to file(s) protected by it still being in-use, we shouldn't really print the path to one of these files to the kernel log, since parts of this path are likely to be encrypted on-disk, and (depending on how the system is set up) the confidentiality of this path might be lost by printing it to the log. This is a trade-off: a single file path often doesn't matter at all, especially if it's a directory; the kernel log might still be protected in some way; and I had originally hoped that any "inode(s) still busy" bugs (which are security weaknesses in their own right) would be quickly fixed and that to do so it would be super helpful to always know the file path and not have to run 'find dir -inum $inum' after the fact. But in practice, these bugs can be hard to fix (e.g. due to asynchronous process killing that is difficult to eliminate, for performance reasons), and also not tied to specific files, so knowing a file path doesn't necessarily help. So to be safe, for now let's just show the inode number, not the path. If someone really wants to know a path they can use 'find -inum'. Fixes: b1c0ec3599f4 ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl") Cc: <stable@vger.kernel.org> # v5.4+ Link: https://lore.kernel.org/r/20200120060732.390362-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:10:33 -08:00
Eric Biggers	6fe9354e00	fscrypt: document gfp_flags for bounce page allocation Document that fscrypt_encrypt_pagecache_blocks() allocates the bounce page from a mempool, and document what this means for the @gfp_flags argument. Link: https://lore.kernel.org/r/20191231181026.47400-1-ebiggers@kernel.org Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:04:00 -08:00
Eric Biggers	7df05e52cb	fscrypt: optimize fscrypt_zeroout_range() Currently fscrypt_zeroout_range() issues and waits on a bio for each block it writes, which makes it very slow. Optimize it to write up to 16 pages at a time instead. Also add a function comment, and improve reliability by allowing the allocations of the bio and the first ciphertext page to wait on the corresponding mempools. Link: https://lore.kernel.org/r/20191226160813.53182-1-ebiggers@kernel.org Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:04:00 -08:00
Eric Biggers	a9ae9e66a0	fscrypt: remove redundant bi_status check submit_bio_wait() already returns bi_status translated to an errno. So the additional check of bi_status is redundant and can be removed. Link: https://lore.kernel.org/r/20191209204509.228942-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:04:00 -08:00
Herbert Xu	b504c7cead	fscrypt: Allow modular crypto algorithms The commit 643fa9612bf1 ("fscrypt: remove filesystem specific build config option") removed modular support for fs/crypto. This causes the Crypto API to be built-in whenever fscrypt is enabled. This makes it very difficult for me to test modular builds of the Crypto API without disabling fscrypt which is a pain. As fscrypt is still evolving and it's developing new ties with the fs layer, it's hard to build it as a module for now. However, the actual algorithms are not required until a filesystem is mounted. Therefore we can allow them to be built as modules. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/20191227024700.7vrzuux32uyfdgum@gondor.apana.org.au Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-13 15:03:58 -08:00
Eric Biggers	6f39a6b4de	fscrypt: include <linux/ioctl.h> in UAPI header <linux/fscrypt.h> defines ioctl numbers using the macros like _IOWR() which are defined in <linux/ioctl.h>, so <linux/ioctl.h> should be included as a prerequisite, like it is in many other kernel headers. In practice this doesn't really matter since anyone referencing these ioctl numbers will almost certainly include <sys/ioctl.h> too in order to actually call ioctl(). But we might as well fix this. Link: https://lore.kernel.org/r/20191219185624.21251-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-12 21:26:20 -08:00
Eric Biggers	11dd760288	fscrypt: don't check for ENOKEY from fscrypt_get_encryption_info() fscrypt_get_encryption_info() returns 0 if the encryption key is unavailable; it never returns ENOKEY. So remove checks for ENOKEY. Link: https://lore.kernel.org/r/20191209212348.243331-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-12 21:26:20 -08:00
Eric Biggers	bfc935af5b	fscrypt: remove fscrypt_is_direct_key_policy() fscrypt_is_direct_key_policy() is no longer used, so remove it. Link: https://lore.kernel.org/r/20191209211829.239800-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-12 21:26:20 -08:00
Eric Biggers	51a6bbc53f	fscrypt: move fscrypt_valid_enc_modes() to policy.c fscrypt_valid_enc_modes() is only used by policy.c, so move it to there. Also adjust the order of the checks to be more natural, matching the numerical order of the constants and also keeping AES-256 (the recommended default) first in the list. No change in behavior. Link: https://lore.kernel.org/r/20191209211829.239800-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-12 21:26:20 -08:00
Eric Biggers	6dad35d9e8	fscrypt: check for appropriate use of DIRECT_KEY flag earlier FSCRYPT_POLICY_FLAG_DIRECT_KEY is currently only allowed with Adiantum encryption. But FS_IOC_SET_ENCRYPTION_POLICY allowed it in combination with other encryption modes, and an error wasn't reported until later when the encrypted directory was actually used. Fix it to report the error earlier by validating the correct use of the DIRECT_KEY flag in fscrypt_supported_policy(), similar to how we validate the IV_INO_LBLK_64 flag. Link: https://lore.kernel.org/r/20191209211829.239800-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-12 21:26:19 -08:00
Eric Biggers	cf17f4020b	fscrypt: split up fscrypt_supported_policy() by policy version Make fscrypt_supported_policy() call new functions fscrypt_supported_v1_policy() and fscrypt_supported_v2_policy(), to reduce the indentation level and make the code easier to read. Also adjust the function comment to mention that whether the encryption policy is supported can also depend on the inode. No change in behavior. Link: https://lore.kernel.org/r/20191209211829.239800-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-12 21:26:19 -08:00
Eric Biggers	3597e506e5	fscrypt: introduce fscrypt_needs_contents_encryption() Add a function fscrypt_needs_contents_encryption() which takes an inode and returns true if it's an encrypted regular file and the kernel was built with fscrypt support. This will allow replacing duplicated checks of IS_ENCRYPTED() && S_ISREG() on the I/O paths in ext4 and f2fs, while also optimizing out unneeded code when !CONFIG_FS_ENCRYPTION. Link: https://lore.kernel.org/r/20191209205021.231767-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-12 21:26:19 -08:00
Eric Biggers	b168e58523	fscrypt: move fscrypt_d_revalidate() to fname.c fscrypt_d_revalidate() and fscrypt_d_ops really belong in fname.c, since they're specific to filenames encryption. crypto.c is for contents encryption and general fs/crypto/ initialization and utilities. Link: https://lore.kernel.org/r/20191209204359.228544-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-12 21:26:19 -08:00
Eric Biggers	bac335ab74	fscrypt: constify inode parameter to filename encryption functions Constify the struct inode parameter to fscrypt_fname_disk_to_usr() and the other filename encryption functions so that users don't have to pass in a non-const inode when they are dealing with a const one, as in [1]. [1] https://lkml.kernel.org/linux-ext4/20191203051049.44573-6-drosen@google.com/ Cc: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20191215213947.9521-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-12 21:26:19 -08:00
Eric Biggers	38c2723e47	fscrypt: constify struct fscrypt_hkdf parameter to fscrypt_hkdf_expand() Constify the struct fscrypt_hkdf parameter to fscrypt_hkdf_expand(). This makes it clearer that struct fscrypt_hkdf contains the key only, not any per-request state. Link: https://lore.kernel.org/r/20191209204054.227736-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-12 21:26:19 -08:00
Eric Biggers	7eabda806e	fscrypt: verify that the crypto_skcipher has the correct ivsize As a sanity check, verify that the allocated crypto_skcipher actually has the ivsize that fscrypt is assuming it has. This will always be the case unless there's a bug. But if there ever is such a bug (e.g. like there was in earlier versions of the ESSIV conversion patch [1]) it's preferable for it to be immediately obvious, and not rely on the ciphertext verification tests failing due to uninitialized IV bytes. [1] https://lkml.kernel.org/linux-crypto/20190702215517.GA69157@gmail.com/ Link: https://lore.kernel.org/r/20191209203918.225691-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-12 21:26:19 -08:00
Eric Biggers	17b10a9cf6	fscrypt: use crypto_skcipher_driver_name() Crypto API users shouldn't really be accessing struct skcipher_alg directly. <crypto/skcipher.h> already has a function crypto_skcipher_driver_name(), so use that instead. No change in behavior. Link: https://lore.kernel.org/r/20191209203810.225302-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-12 21:26:18 -08:00
Eric Biggers	36500bffb9	fscrypt: support passing a keyring key to FS_IOC_ADD_ENCRYPTION_KEY Extend the FS_IOC_ADD_ENCRYPTION_KEY ioctl to allow the raw key to be specified by a Linux keyring key, rather than specified directly. This is useful because fscrypt keys belong to a particular filesystem instance, so they are destroyed when that filesystem is unmounted. Usually this is desired. But in some cases, userspace may need to unmount and re-mount the filesystem while keeping the keys, e.g. during a system update. This requires keeping the keys somewhere else too. The keys could be kept in memory in a userspace daemon. But depending on the security architecture and assumptions, it can be preferable to keep them only in kernel memory, where they are unreadable by userspace. We also can't solve this by going back to the original fscrypt API (where for each file, the master key was looked up in the process's keyring hierarchy) because that caused lots of problems of its own. Therefore, add the ability for FS_IOC_ADD_ENCRYPTION_KEY to accept a Linux keyring key. This solves the problem by allowing userspace to (if needed) save the keys securely in a Linux keyring for re-provisioning, while still using the new fscrypt key management ioctls. This is analogous to how dm-crypt accepts a Linux keyring key, but the key is then stored internally in the dm-crypt data structures rather than being looked up again each time the dm-crypt device is accessed. Use a custom key type "fscrypt-provisioning" rather than one of the existing key types such as "logon". This is strongly desired because it enforces that these keys are only usable for a particular purpose: for fscrypt as input to a particular KDF. Otherwise, the keys could also be passed to any kernel API that accepts a "logon" key with any service prefix, e.g. dm-crypt, UBIFS, or (recently proposed) AF_ALG. This would risk leaking information about the raw key despite it ostensibly being unreadable. Of course, this mistake has already been made for multiple kernel APIs; but since this is a new API, let's do it right. This patch has been tested using an xfstest which I wrote to test it. Link: https://lore.kernel.org/r/20191119222447.226853-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-02-12 21:26:18 -08:00
Dave Jiang	9f75e365f3	keys: Export lookup_user_key to external users Export lookup_user_key() symbol in order to allow nvdimm passphrase update to retrieve user injected keys. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2020-02-12 21:26:18 -08:00
Eric Biggers	a7cbfc8a9c	f2fs: fix race conditions in ->d_compare() and ->d_hash() Since ->d_compare() and ->d_hash() can be called in RCU-walk mode, ->d_parent and ->d_inode can be concurrently modified, and in particular, ->d_inode may be changed to NULL. For f2fs_d_hash() this resulted in a reproducible NULL dereference if a lookup is done in a directory being deleted, e.g. with: int main() { if (fork()) { for (;;) { mkdir("subdir", 0700); rmdir("subdir"); } } else { for (;;) access("subdir/file", 0); } } ... or by running the 't_encrypted_d_revalidate' program from xfstests. Both repros work in any directory on a filesystem with the encoding feature, even if the directory doesn't actually have the casefold flag. I couldn't reproduce a crash in f2fs_d_compare(), but it appears that a similar crash is possible there. Fix these bugs by reading ->d_parent and ->d_inode using READ_ONCE() and falling back to the case sensitive behavior if the inode is NULL. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: 2c2eb7a300cd ("f2fs: Support case-insensitive file name lookups") Cc: <stable@vger.kernel.org> # v5.4+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:50 -08:00
Eric Biggers	5c34b827f8	f2fs: fix dcache lookup of !casefolded directories Do the name comparison for non-casefolded directories correctly. This is analogous to ext4's commit 66883da1eee8 ("ext4: fix dcache lookup of !casefolded directories"). Fixes: 2c2eb7a300cd ("f2fs: Support case-insensitive file name lookups") Cc: <stable@vger.kernel.org> # v5.4+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:50 -08:00
Hridya Valsaraju	c728890938	f2fs: Add f2fs stats to sysfs Currently f2fs stats are only available from /d/f2fs/status. This patch adds some of the f2fs stats to sysfs so that they are accessible even when debugfs is not mounted. The following sysfs nodes are added: -/sys/fs/f2fs/<disk>/free_segments -/sys/fs/f2fs/<disk>/cp_foreground_calls -/sys/fs/f2fs/<disk>/cp_background_calls -/sys/fs/f2fs/<disk>/gc_foreground_calls -/sys/fs/f2fs/<disk>/gc_background_calls -/sys/fs/f2fs/<disk>/moved_blocks_foreground -/sys/fs/f2fs/<disk>/moved_blocks_background -/sys/fs/f2fs/<disk>/avg_vblocks Signed-off-by: Hridya Valsaraju <hridya@google.com> [Jaegeuk Kim: allow STAT_FS without DEBUG_FS] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:50 -08:00
Hridya Valsaraju	864d624db0	f2fs: delete duplicate information on sysfs nodes This patch merges the sysfs node documentation present in Documentation/filesystems/f2fs.txt and Documentation/ABI/testing/sysfs-fs-f2fs and deletes the duplicate information from Documentation/filesystems/f2fs.txt. This is to prevent having to update both files when a new sysfs node is added for f2fs. The patch also makes minor formatting changes to Documentation/ABI/testing/sysfs-fs-f2fs. Signed-off-by: Hridya Valsaraju <hridya@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:50 -08:00
Chao Yu	4e284f6a85	f2fs: change to use rwsem for gc_mutex Mutex lock won't serialize callers, in order to avoid starving of unlucky caller, let's use rwsem lock instead. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:50 -08:00
Jaegeuk Kim	61abb38611	f2fs: update f2fs document regarding to fsync_mode This patch adds missing fsync_mode entry in f2fs document. Fixes: `04485987f0` ("f2fs: introduce async IPU policy") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:50 -08:00
Jaegeuk Kim	3319f930d4	f2fs: add a way to turn off ipu bio cache Setting 0x40 in /sys/fs/f2fs/dev/ipu_policy gives a way to turn off bio cache, which is useufl to check whether block layer using hardware encryption engine merges IOs correctly. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:49 -08:00
Chengguang Xu	057b9c3288	f2fs: code cleanup for f2fs_statfs_project() Calling min_not_zero() to simplify complicated prjquota limit comparison in f2fs_statfs_project(). Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:49 -08:00
Chengguang Xu	664c48849a	f2fs: fix miscounted block limit in f2fs_statfs_project() statfs calculates Total/Used/Avail disk space in block unit, so we should translate soft/hard prjquota limit to block unit as well. Below testing result shows the block/inode numbers of Total/Used/Avail from df command are all correct afer applying this patch. [root@localhost quota-tools]\# ./repquota -P /dev/sdb1 *** Report for project quotas on device /dev/sdb1 Block grace time: 7days; Inode grace time: 7days Block limits File limits Project used soft hard grace used soft hard grace ----------------------------------------------------------- \#0 -- 4 0 0 1 0 0 \#101 -- 0 0 0 2 0 0 \#102 -- 0 10240 0 2 10 0 \#103 -- 0 0 20480 2 0 20 \#104 -- 0 10240 20480 2 10 20 \#105 -- 0 20480 10240 2 20 10 [root@localhost sdb1]\# lsattr -p t{1,2,3,4,5} 101 ----------------N-- t1/a1 102 ----------------N-- t2/a2 103 ----------------N-- t3/a3 104 ----------------N-- t4/a4 105 ----------------N-- t5/a5 [root@localhost sdb1]\# df -hi t{1,2,3,4,5} Filesystem Inodes IUsed IFree IUse% Mounted on /dev/sdb1 2.4M 21 2.4M 1% /mnt/sdb1 /dev/sdb1 10 2 8 20% /mnt/sdb1 /dev/sdb1 20 2 18 10% /mnt/sdb1 /dev/sdb1 10 2 8 20% /mnt/sdb1 /dev/sdb1 10 2 8 20% /mnt/sdb1 [root@localhost sdb1]\# df -h t{1,2,3,4,5} Filesystem Size Used Avail Use% Mounted on /dev/sdb1 10G 489M 9.6G 5% /mnt/sdb1 /dev/sdb1 10M 0 10M 0% /mnt/sdb1 /dev/sdb1 20M 0 20M 0% /mnt/sdb1 /dev/sdb1 10M 0 10M 0% /mnt/sdb1 /dev/sdb1 10M 0 10M 0% /mnt/sdb1 Fixes: 909110c060f2 ("f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()") Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:49 -08:00
Sahitya Tummala	39cb3c8cc9	f2fs: show the CP_PAUSE reason in checkpoint traces Remove the duplicate CP_UMOUNT enum and add the new CP_PAUSE enum to show the checkpoint reason in the trace prints. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:49 -08:00
Eric Biggers	2598e16422	f2fs: fix deadlock allocating bio_post_read_ctx from mempool Without any form of coordination, any case where multiple allocations from the same mempool are needed at a time to make forward progress can deadlock under memory pressure. This is the case for struct bio_post_read_ctx, as one can be allocated to decrypt a Merkle tree page during fsverity_verify_bio(), which itself is running from a post-read callback for a data bio which has its own struct bio_post_read_ctx. Fix this by freeing first bio_post_read_ctx before calling fsverity_verify_bio(). This works because verity (if enabled) is always the last post-read step. This deadlock can be reproduced by trying to read from an encrypted verity file after reducing NUM_PREALLOC_POST_READ_CTXS to 1 and patching mempool_alloc() to pretend that pool->alloc() always fails. Note that since NUM_PREALLOC_POST_READ_CTXS is actually 128, to actually hit this bug in practice would require reading from lots of encrypted verity files at the same time. But it's theoretically possible, as N available objects doesn't guarantee forward progress when > N/2 threads each need 2 objects at a time. Fixes: 95ae251fe828 ("f2fs: add fs-verity support") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:49 -08:00
Eric Biggers	a757df406d	f2fs: remove unneeded check for error allocating bio_post_read_ctx Since allocating an object from a mempool never fails when __GFP_DIRECT_RECLAIM (which is included in GFP_NOFS) is set, the check for failure to allocate a bio_post_read_ctx is unnecessary. Remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:49 -08:00
Jaegeuk Kim	89da42b87e	f2fs: convert inline_dir early before starting rename If we hit an error during rename, we'll get two dentries in different directories. Chao adds to check the room in inline_dir which can avoid needless inversion. This should be done by inode_lock(&old_dir). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:49 -08:00
Chao Yu	8fcf59d420	f2fs: fix memleak of kobject If kobject_init_and_add() failed, caller needs to invoke kobject_put() to release kobject explicitly. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:48 -08:00
Chao Yu	c9c21ca2a0	f2fs: fix to add swap extent correctly As Youling reported in mailing list: https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs-is-broken-4175666043/ https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/ There is a test case can corrupt f2fs image: - dd if=/dev/zero of=/swapfile bs=1M count=4096 - chmod 600 /swapfile - mkswap /swapfile - swapon --discard /swapfile The root cause is f2fs_swap_activate() intends to return zero value to setup_swap_extents() to enable SWP_FS mode (swap file goes through fs), in this flow, setup_swap_extents() setups swap extent with wrong block address range, result in discard_swap() erasing incorrect address. Because f2fs_swap_activate() has pinned swapfile, its data block address will not change, it's safe to let swap to handle IO through raw device, so we can get rid of SWAP_FS mode and initial swap extents inside f2fs_swap_activate(), by this way, later discard_swap() can trim in right address range. Fixes: 4969c06a0d83 ("f2fs: support swap file w/ DIO") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:48 -08:00
Omar Sandoval	e51cb8f5c1	mm: export add_swap_extent() Btrfs currently does not support swap files because swap's use of bmap does not work with copy-on-write and multiple devices. See `35054394c4` ("Btrfs: stop providing a bmap operation to avoid swapfile corruptions"). However, the swap code has a mechanism for the filesystem to manually add swap extents using add_swap_extent() from the ->swap_activate() aop. iomap has done this since `67482129cd` ("iomap: add a swapfile activation function"). Btrfs will do the same in a later patch, so export add_swap_extent(). Link: http://lkml.kernel.org/r/bb1208575e02829aae51b538709476964f97b1ea.1536704650.git.osandov@fb.com Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: David Sterba <dsterba@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-28 10:36:48 -08:00
Jaegeuk Kim	fb185f8398	f2fs: run fsck when getting bad inode during GC This is to avoid inifinite GC when trying to disable checkpoint. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:48 -08:00
Chao Yu	f1fb133573	f2fs: support data compression This patch tries to support compression in f2fs. - New term named cluster is defined as basic unit of compression, file can be divided into multiple clusters logically. One cluster includes 4 << n (n >= 0) logical pages, compression size is also cluster size, each of cluster can be compressed or not. - In cluster metadata layout, one special flag is used to indicate cluster is compressed one or normal one, for compressed cluster, following metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores data including compress header and compressed data. - In order to eliminate write amplification during overwrite, F2FS only support compression on write-once file, data can be compressed only when all logical blocks in file are valid and cluster compress ratio is lower than specified threshold. - To enable compression on regular inode, there are three ways: * chattr +c file * chattr +c dir; touch dir/file * mount w/ -o compress_extension=ext; touch file.ext Compress metadata layout: [Dnode Structure] +-----------------------------------------------+ \| cluster 1 \| cluster 2 \| ......... \| cluster N \| +-----------------------------------------------+ . . . . . . . . . Compressed Cluster . . Normal Cluster . +----------+---------+---------+---------+ +---------+---------+---------+---------+ \|compr flag\| block 1 \| block 2 \| block 3 \| \| block 1 \| block 2 \| block 3 \| block 4 \| +----------+---------+---------+---------+ +---------+---------+---------+---------+ . . . . . . +-------------+-------------+----------+----------------------------+ \| data length \| data chksum \| reserved \| compressed data \| +-------------+-------------+----------+----------------------------+ Changelog: 20190326: - fix error handling of read_end_io(). - remove unneeded comments in f2fs_encrypt_one_page(). 20190327: - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages(). - don't jump into loop directly to avoid uninitialized variables. - add TODO tag in error path of f2fs_write_cache_pages(). 20190328: - fix wrong merge condition in f2fs_read_multi_pages(). - check compressed file in f2fs_post_read_required(). 20190401 - allow overwrite on non-compressed cluster. - check cluster meta before writing compressed data. 20190402 - don't preallocate blocks for compressed file. - add lz4 compress algorithm - process multiple post read works in one workqueue Now f2fs supports processing post read work in multiple workqueue, it shows low performance due to schedule overhead of multiple workqueue executing orderly. 20190921 - compress: support buffered overwrite C: compress cluster flag V: valid block address N: NEW_ADDR One cluster contain 4 blocks before overwrite after overwrite - VVVV -> CVNN - CVNN -> VVVV - CVNN -> CVNN - CVNN -> CVVV - CVVV -> CVNN - CVVV -> CVVV 20191029 - add kconfig F2FS_FS_COMPRESSION to isolate compression related codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm. note that: will remove lzo backend if Jaegeuk agreed that too. - update codes according to Eric's comments. 20191101 - apply fixes from Jaegeuk 20191113 - apply fixes from Jaegeuk - split workqueue for fsverity 20191216 - apply fixes from Jaegeuk [Jaegeuk Kim] - add tracepoint for f2fs_{,de}compress_pages() - fix many bugs and add some compression stats - fix overwrite/mmap bugs - address 32bit build error, reported by Geert. - bug fixes when handling errors and i_compressed_blocks Reported-by: <noreply@ellerman.id.au> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:48 -08:00
Jaegeuk Kim	949740f204	f2fs: free sysfs kobject Detected kmemleak. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:48 -08:00
Jaegeuk Kim	e8764241d3	f2fs: declare nested quota_sem and remove unnecessary sems 1. f2fs_quota_sync -> down_read(&sbi->quota_sem) -> dquot_writeback_dquots -> f2fs_dquot_commit -> down_read(&sbi->quota_sem) 2. f2fs_quota_sync -> down_read(&sbi->quota_sem) -> f2fs_write_data_pages -> f2fs_write_single_data_page -> down_write(&F2FS_I(inode)->i_sem) f2fs_mkdir -> f2fs_do_add_link -> down_write(&F2FS_I(inode)->i_sem) -> f2fs_init_inode_metadata -> f2fs_new_node_page -> dquot_alloc_inode -> f2fs_dquot_mark_dquot_dirty -> down_read(&sbi->quota_sem) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:48 -08:00
Jaegeuk Kim	e9bbe1b939	f2fs: don't put new_page twice in f2fs_rename In f2fs_rename(), new_page is gone after f2fs_set_link(), but it tries to put again when whiteout is failed and jumped to put_out_dir. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-01-28 10:36:48 -08:00

1 2 3 4 5 ...

784224 commits