linux

mirror of https://github.com/torvalds/linux.git synced 2024-09-24 17:03:03 +00:00

Author	SHA1	Message	Date
Eric Whitney	04e22412f4	ext4: make online defrag error reporting consistent Make the error reporting behavior resulting from the unsupported use of online defrag on files with data journaling enabled consistent with that implemented for bigalloc file systems. Difference found with ext4/308. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>	2015-06-21 21:38:03 -04:00
Eric Whitney	c27e43a10c	ext4: minor cleanup of ext4_da_reserve_space() Remove outdated comments and dead code from ext4_da_reserve_space. Clean up its trace point, and relocate it to make it more useful. While we're at it, fix a nearby conditional used to determine if we have a non-bigalloc file system. It doesn't match usage elsewhere in the code, and misleadingly suggests that an s_cluster_ratio value of 0 would be legal. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-21 21:37:05 -04:00
Darrick J. Wong	292db1bc6c	ext4: don't retry file block mapping on bigalloc fs with non-extent file ext4 isn't willing to map clusters to a non-extent file. Don't signal this with an out of space error, since the FS will retry the allocation (which didn't fail) forever. Instead, return EUCLEAN so that the operation will fail immediately all the way back to userspace. (The fix is either to run e2fsck -E bmap2extent, or to chattr +e the file.) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2015-06-21 21:10:51 -04:00
Theodore Ts'o	c5e298ae53	ext4: prevent ext4_quota_write() from failing due to ENOSPC In order to prevent quota block tracking to be inaccurate when ext4_quota_write() fails with ENOSPC, we make two changes. The quota file can now use the reserved block (since the quota file is arguably file system metadata), and ext4_quota_write() now uses ext4_should_retry_alloc() to retry the block allocation after a commit has completed and released some blocks for allocation. This fixes failures of xfstests generic/270: Quota error (device vdc): write_blk: dquota write failed Quota error (device vdc): qtree_write_dquot: Error -28 occurred while creating quota Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-21 01:25:29 -04:00
Theodore Ts'o	89d96a6f8e	ext4: call sync_blockdev() before invalidate_bdev() in put_super() Normally all of the buffers will have been forced out to disk before we call invalidate_bdev(), but there will be some cases, where a file system operation was aborted due to an ext4_error(), where there may still be some dirty buffers in the buffer cache for the device. So try to force them out to memory before calling invalidate_bdev(). This fixes a warning triggered by generic/081: WARNING: CPU: 1 PID: 3473 at /usr/projects/linux/ext4/fs/block_dev.c:56 __blkdev_put+0xb5/0x16f() Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2015-06-20 22:50:33 -04:00
Jan Kara	2143c1965a	jbd2: speedup jbd2_journal_dirty_metadata() It is often the case that we mark buffer as having dirty metadata when the buffer is already in that state (frequent for bitmaps, inode table blocks, superblock). Thus it is unnecessary to contend on grabbing journal head reference and bh_state lock. Avoid that by checking whether any modification to the buffer is needed before grabbing any locks or references. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-20 21:44:17 -04:00
Michal Hocko	7b506b1035	jbd2: get rid of open coded allocation retry loop insert_revoke_hash does an open coded endless allocation loop if journal_oom_retry is true. It doesn't implement any allocation fallback strategy between the retries, though. The memory allocator doesn't know about the never fail requirement so it cannot potentially help to move on with the allocation (e.g. use memory reserves). Get rid of the retry loop and use __GFP_NOFAIL instead. We will lose the debugging message but I am not sure it is anyhow helpful. Do the same for journal_alloc_journal_head which is doing a similar thing. Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-15 15:45:58 -04:00
Andreas Dilger	b03a2f7eb2	ext4: improve warning directory handling messages Several ext4_warning() messages in the directory handling code do not report the inode number of the (potentially corrupt) directory where a problem is seen, and others report this in an ad-hoc manner. Add an ext4_warning_inode() helper to print the inode number and command name consistent with ext4_error_inode(). Consolidate the place in ext4.h that these macros are defined. Clean up some other directory error and warning messages to print the calling function name. Minor code style fixes in nearby lines. Signed-off-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-15 14:50:26 -04:00
Joseph Qi	6f6a6fda29	jbd2: fix ocfs2 corrupt when updating journal superblock fails If updating journal superblock fails after journal data has been flushed, the error is omitted and this will mislead the caller as a normal case. In ocfs2, the checkpoint will be treated successfully and the other node can get the lock to update. Since the sb_start is still pointing to the old log block, it will rewrite the journal data during journal recovery by the other node. Thus the new updates will be overwritten and ocfs2 corrupts. So in above case we have to return the error, and ocfs2_commit_cache will take care of the error and prevent the other node to do update first. And only after recovering journal it can do the new updates. The issue discussion mail can be found at: https://oss.oracle.com/pipermail/ocfs2-devel/2015-June/010856.html http://comments.gmane.org/gmane.comp.file-systems.ext4/48841 [ Fixed bug in patch which allowed a non-negative error return from jbd2_cleanup_journal_tail() to leak out of jbd2_fjournal_flush(); this was causing xfstests ext4/306 to fail. -- Ted ] Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: stable@vger.kernel.org	2015-06-15 14:36:01 -04:00
Rasmus Villemoes	97b4af2f76	ext4: mballoc: avoid 20-argument function call Making a function call with 20 arguments is rather expensive in both stack and .text. In this case, doing the formatting manually doesn't make it any less readable, so we might as well save 155 bytes of .text and 112 bytes of stack. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>	2015-06-15 00:32:58 -04:00
Lukas Czerner	0d306dcf86	ext4: wait for existing dio workers in ext4_alloc_file_blocks() Currently existing dio workers can jump in and potentially increase extent tree depth while we're allocating blocks in ext4_alloc_file_blocks(). This may cause us to underestimate the number of credits needed for the transaction because the extent tree depth can change after our estimation. Fix this by waiting for all the existing dio workers in the same way as we do it in ext4_punch_hole. We've seen errors caused by this in xfstest generic/299, however it's really hard to reproduce. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-15 00:23:53 -04:00
Lukas Czerner	4134f5c88d	ext4: recalculate journal credits as inode depth changes Currently in ext4_alloc_file_blocks() the number of credits is calculated only once before we enter the allocation loop. However within the allocation loop the extent tree depth can change, hence the number of credits needed can increase potentially exceeding the number of credits reserved in the handle which can cause journal failures. Fix this by recalculating number of credits when the inode depth changes. Note that even though ext4_alloc_file_blocks() is only currently used by extent base inodes we will avoid recalculating number of credits unnecessarily in the case of indirect based inodes. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-15 00:20:46 -04:00
Dmitry Monakhov	b4f1afcd06	jbd2: use GFP_NOFS in jbd2_cleanup_journal_tail() jbd2_cleanup_journal_tail() can be invoked by jbd2__journal_start() So allocations should be done with GFP_NOFS [Full stack trace snipped from 3.10-rh7] [<ffffffff815c4bd4>] dump_stack+0x19/0x1b [<ffffffff8105dba1>] warn_slowpath_common+0x61/0x80 [<ffffffff8105dcca>] warn_slowpath_null+0x1a/0x20 [<ffffffff815c2142>] slab_pre_alloc_hook.isra.31.part.32+0x15/0x17 [<ffffffff8119c045>] kmem_cache_alloc+0x55/0x210 [<ffffffff811477f5>] ? mempool_alloc_slab+0x15/0x20 [<ffffffff811477f5>] mempool_alloc_slab+0x15/0x20 [<ffffffff81147939>] mempool_alloc+0x69/0x170 [<ffffffff815cb69e>] ? _raw_spin_unlock_irq+0xe/0x20 [<ffffffff8109160d>] ? finish_task_switch+0x5d/0x150 [<ffffffff811f1a8e>] bio_alloc_bioset+0x1be/0x2e0 [<ffffffff8127ee49>] blkdev_issue_flush+0x99/0x120 [<ffffffffa019a733>] jbd2_cleanup_journal_tail+0x93/0xa0 [jbd2] -->GFP_KERNEL [<ffffffffa019aca1>] jbd2_log_do_checkpoint+0x221/0x4a0 [jbd2] [<ffffffffa019afc7>] __jbd2_log_wait_for_space+0xa7/0x1e0 [jbd2] [<ffffffffa01952d8>] start_this_handle+0x2d8/0x550 [jbd2] [<ffffffff811b02a9>] ? __memcg_kmem_put_cache+0x29/0x30 [<ffffffff8119c120>] ? kmem_cache_alloc+0x130/0x210 [<ffffffffa019573a>] jbd2__journal_start+0xba/0x190 [jbd2] [<ffffffff811532ce>] ? lru_cache_add+0xe/0x10 [<ffffffffa01c9549>] ? ext4_da_write_begin+0xf9/0x330 [ext4] [<ffffffffa01f2c77>] __ext4_journal_start_sb+0x77/0x160 [ext4] [<ffffffffa01c9549>] ext4_da_write_begin+0xf9/0x330 [ext4] [<ffffffff811446ec>] generic_file_buffered_write_iter+0x10c/0x270 [<ffffffff81146918>] __generic_file_write_iter+0x178/0x390 [<ffffffff81146c6b>] __generic_file_aio_write+0x8b/0xb0 [<ffffffff81146ced>] generic_file_aio_write+0x5d/0xc0 [<ffffffffa01bf289>] ext4_file_write+0xa9/0x450 [ext4] [<ffffffff811c31d9>] ? pipe_read+0x379/0x4f0 [<ffffffff811b93f0>] do_sync_write+0x90/0xe0 [<ffffffff811b9b6d>] vfs_write+0xbd/0x1e0 [<ffffffff811ba5b8>] SyS_write+0x58/0xb0 [<ffffffff815d4799>] system_call_fastpath+0x16/0x1b Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2015-06-15 00:18:02 -04:00
Fabian Frederick	bf86546760	ext4: use swap() in mext_page_double_lock() Use kernel.h macro definition. Thanks to Julia Lawall for Coccinelle scripting support. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-12 23:47:33 -04:00
Fabian Frederick	4b7e2db5c0	ext4: use swap() in memswap() Use kernel.h macro definition. Thanks to Julia Lawall for Coccinelle scripting support. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-12 23:46:33 -04:00
Theodore Ts'o	bdf96838ae	ext4: fix race between truncate and __ext4_journalled_writepage() The commit cf108bca465d: "ext4: Invert the locking order of page_lock and transaction start" caused __ext4_journalled_writepage() to drop the page lock before the page was written back, as part of changing the locking order to jbd2_journal_start -> page_lock. However, this introduced a potential race if there was a truncate racing with the data=journalled writeback mode. Fix this by grabbing the page lock after starting the journal handle, and then checking to see if page had gotten truncated out from under us. This fixes a number of different warnings or BUG_ON's when running xfstests generic/086 in data=journalled mode, including: jbd2_journal_dirty_metadata: vdc-8: bad jh for block 115643: transaction (ee3fe7 c0, 164), jh->b_transaction ( (null), 0), jh->b_next_transaction ( (null), 0), jlist 0 - and - kernel BUG at /usr/projects/linux/ext4/fs/jbd2/transaction.c:2200! ... Call Trace: [<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117 [<c02b2de5>] __ext4_journalled_invalidatepage+0x10f/0x117 [<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117 [<c027d883>] ? lock_buffer+0x36/0x36 [<c02b2dfa>] ext4_journalled_invalidatepage+0xd/0x22 [<c0229139>] do_invalidatepage+0x22/0x26 [<c0229198>] truncate_inode_page+0x5b/0x85 [<c022934b>] truncate_inode_pages_range+0x156/0x38c [<c0229592>] truncate_inode_pages+0x11/0x15 [<c022962d>] truncate_pagecache+0x55/0x71 [<c02b913b>] ext4_setattr+0x4a9/0x560 [<c01ca542>] ? current_kernel_time+0x10/0x44 [<c026c4d8>] notify_change+0x1c7/0x2be [<c0256a00>] do_truncate+0x65/0x85 [<c0226f31>] ? file_ra_state_init+0x12/0x29 - and - WARNING: CPU: 1 PID: 1331 at /usr/projects/linux/ext4/fs/jbd2/transaction.c:1396 irty_metadata+0x14a/0x1ae() ... Call Trace: [<c01b879f>] ? console_unlock+0x3a1/0x3ce [<c082cbb4>] dump_stack+0x48/0x60 [<c0178b65>] warn_slowpath_common+0x89/0xa0 [<c02ef2cf>] ? jbd2_journal_dirty_metadata+0x14a/0x1ae [<c0178bef>] warn_slowpath_null+0x14/0x18 [<c02ef2cf>] jbd2_journal_dirty_metadata+0x14a/0x1ae [<c02d8615>] __ext4_handle_dirty_metadata+0xd4/0x19d [<c02b2f44>] write_end_fn+0x40/0x53 [<c02b4a16>] ext4_walk_page_buffers+0x4e/0x6a [<c02b59e7>] ext4_writepage+0x354/0x3b8 [<c02b2f04>] ? mpage_release_unused_pages+0xd4/0xd4 [<c02b1b21>] ? wait_on_buffer+0x2c/0x2c [<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8 [<c02b5a5b>] __writepage+0x10/0x2e [<c0225956>] write_cache_pages+0x22d/0x32c [<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8 [<c02b6ee8>] ext4_writepages+0x102/0x607 [<c019adfe>] ? sched_clock_local+0x10/0x10e [<c01a8a7c>] ? __lock_is_held+0x2e/0x44 [<c01a8ad5>] ? lock_is_held+0x43/0x51 [<c0226dff>] do_writepages+0x1c/0x29 [<c0276bed>] __writeback_single_inode+0xc3/0x545 [<c0277c07>] writeback_sb_inodes+0x21f/0x36d ... Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2015-06-12 23:45:33 -04:00
Theodore Ts'o	1cb767cd4a	ext4 crypto: fail the mount if blocksize != pagesize We currently don't correctly handle the case where blocksize != pagesize, so disallow the mount in those cases. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-12 23:44:33 -04:00
Namjae Jeon	331573febb	ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate This patch implements fallocate's FALLOC_FL_INSERT_RANGE for Ext4. 1) Make sure that both offset and len are block size aligned. 2) Update the i_size of inode by len bytes. 3) Compute the file's logical block number against offset. If the computed block number is not the starting block of the extent, split the extent such that the block number is the starting block of the extent. 4) Shift all the extents which are lying between [offset, last allocated extent] towards right by len bytes. This step will make a hole of len bytes at offset. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>	2015-06-09 01:55:03 -04:00
Jan Kara	de92c8caf1	jbd2: speedup jbd2_journal_get_[write\|undo]_access() jbd2_journal_get_write_access() and jbd2_journal_get_create_access() are frequently called for buffers that are already part of the running transaction - most frequently it is the case for bitmaps, inode table blocks, and superblock. Since in such cases we have nothing to do, it is unfortunate we still grab reference to journal head, lock the bh, lock bh_state only to find out there's nothing to do. Improving this is a bit subtle though since until we find out journal head is attached to the running transaction, it can disappear from under us because checkpointing / commit decided it's no longer needed. We deal with this by protecting journal_head slab with RCU. We still have to be careful about journal head being freed & reallocated within slab and about exposing journal head in consistent state (in particular b_modified and b_frozen_data must be in correct state before we allow user to touch the buffer). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-08 12:46:37 -04:00
Jan Kara	8b00f400ee	jbd2: more simplifications in do_get_write_access() Check for the simple case of unjournaled buffer first, handle it and bail out. This allows us to remove one if and unindent the difficult case by one tab. The result is easier to read. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-08 12:44:21 -04:00
Jan Kara	d012aa5965	jbd2: simplify error path on allocation failure in do_get_write_access() We were acquiring bh_state_lock when allocation of buffer failed in do_get_write_access() only to be able to jump to a label that releases the lock and does all other checks that don't make sense for this error path. Just jump into the right label instead. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-08 12:40:39 -04:00
Jan Kara	ee57aba159	jbd2: simplify code flow in do_get_write_access() needs_copy is set only in one place in do_get_write_access(), just move the frozen buffer copying into that place and factor it out to a separate function to make do_get_write_access() slightly more readable. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-08 12:39:07 -04:00
Fabian Frederick	b4ab9e2982	ext4 crypto: fix sparse warnings in fs/ext4/ioctl.c [ Added another sparse fix for EXT4_IOC_GET_ENCRYPTION_POLICY while we're at it. --tytso ] Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-08 12:23:21 -04:00
David Moore	8bc3b1e6e8	ext4: BUG_ON assertion repeated for inode1, not done for inode2 During a source code review of fs/ext4/extents.c I noted identical consecutive lines. An assertion is repeated for inode1 and never done for inode2. This is not in keeping with the rest of the code in the ext4_swap_extents function and appears to be a bug. Assert that the inode2 mutex is not locked. Signed-off-by: David Moore <dmoorefo@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>	2015-06-08 11:59:12 -04:00
Theodore Ts'o	ad0a0ce894	ext4 crypto: fix ext4_get_crypto_ctx()'s calling convention in ext4_decrypt_one Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-08 11:54:56 -04:00
Lukas Czerner	42ac1848ea	ext4: return error code from ext4_mb_good_group() Currently ext4_mb_good_group() only returns 0 or 1 depending on whether the allocation group is suitable for use or not. However we might get various errors and fail while initializing new group including -EIO which would never get propagated up the call chain. This might lead to an endless loop at writeback when we're trying to find a good group to allocate from and we fail to initialize new group (read error for example). Fix this by returning proper error code from ext4_mb_good_group() and using it in ext4_mb_regular_allocator(). In ext4_mb_regular_allocator() we will always return only the first occurred error from ext4_mb_good_group() and we only propagate it back to the caller if we do not get any other errors and we fail to allocate any blocks. Note that with other modes than errors=continue, we will fail immediately in ext4_mb_good_group() in case of error, however with errors=continue we should try to continue using the file system, that's why we're not going to fail immediately when we see an error from ext4_mb_good_group(), but rather when we fail to find a suitable block group to allocate from due to an problem in group initialization. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>	2015-06-08 11:40:40 -04:00
Lukas Czerner	bbdc322f2c	ext4: try to initialize all groups we can in case of failure on ppc64 Currently on the machines with page size > block size when initializing block group buddy cache we initialize it for all the block group bitmaps in the page. However in the case of read error, checksum error, or if a single bitmap is in any way corrupted we would fail to initialize all of the bitmaps. This is problematic because we will not have access to the other allocation groups even though those might be perfectly fine and usable. Fix this by reading all the bitmaps instead of error out on the first problem and simply skip the bitmaps which were either not read properly, or are not valid. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-08 11:38:37 -04:00
Lukas Czerner	41e5b7ed3e	ext4: verify block bitmap even after fresh initialization If we want to rely on the buffer_verified() flag of the block bitmap buffer, we have to set it consistently. However currently if we're initializing uninitialized block bitmap in ext4_read_block_bitmap_nowait() we're not going to set buffer verified at all. We can do this by simply setting the flag on the buffer, but I think it's actually better to run ext4_validate_block_bitmap() to make sure that what we did in the ext4_init_block_bitmap() is right. So run ext4_validate_block_bitmap() even after the block bitmap initialization. Also bail out early from ext4_validate_block_bitmap() if we see corrupt bitmap, since we already know it's corrupt and we do not need to verify that. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-08 11:18:52 -04:00
Michal Hocko	6ccaf3e2f3	jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL This basically reverts `47def82672` (jbd2: Remove __GFP_NOFAIL from jbd2 layer). The deprecation of __GFP_NOFAIL was a bad choice because it led to open coding the endless loop around the allocator rather than removing the dependency on the non failing allocation. So the deprecation was a clear failure and the reality tells us that __GFP_NOFAIL is not even close to go away. It is still true that __GFP_NOFAIL allocations are generally discouraged and new uses should be evaluated and an alternative (pre-allocations or reservations) should be considered but it doesn't make any sense to lie the allocator about the requirements. Allocator can take steps to help making a progress if it knows the requirements. Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Acked-by: David Rientjes <rientjes@google.com>	2015-06-08 10:53:10 -04:00
Theodore Ts'o	3dbb5eb9a3	ext4 crypto: allocate bounce pages using GFP_NOWAIT Previously we allocated bounce pages using a combination of alloc_page() and mempool_alloc() with the __GFP_WAIT bit set. Instead, use mempool_alloc() with GFP_NOWAIT. The mempool_alloc() function will try using alloc_pages() initially, and then only use the mempool reserve of pages if alloc_pages() is unable to fulfill the request. This minimizes the the impact on the mm layer when we need to do a large amount of writeback of encrypted files, as Jaeguk Kim had reported that under a heavy fio workload on a system with restricted amounts memory (which unfortunately, includes many mobile handsets), he had observed the the OOM killer getting triggered several times. Using GFP_NOWAIT If the mempool_alloc() function fails, we will retry the page writeback at a later time; the function of the mempool is to ensure that we can writeback at least 32 pages at a time, so we can more efficiently dispatch I/O under high memory pressure situations. In the future we should make this be a tunable so we can determine the best tradeoff between permanently sequestering memory and the ability to quickly launder pages so we can free up memory quickly when necessary. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-06-03 09:32:39 -04:00
Chao Yu	e298e73bd7	ext4 crypto: release crypto resource on module exit Crypto resource should be released when ext4 module exits, otherwise it will cause memory leak. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-31 13:37:35 -04:00
Theodore Ts'o	abdd438b26	ext4 crypto: handle unexpected lack of encryption keys Fix up attempts by users to try to write to a file when they don't have access to the encryption key. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-31 13:35:39 -04:00
Theodore Ts'o	4d3c4e5b8c	ext4 crypto: allocate the right amount of memory for the on-disk symlink Previously we were taking the required padding when allocating space for the on-disk symlink. This caused a buffer overrun which could trigger a krenel crash when running fsstress. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-31 13:35:32 -04:00
Theodore Ts'o	82d0d3e7e6	ext4 crypto: clean up error handling in ext4_fname_setup_filename Fix a potential memory leak where fname->crypto_buf.name wouldn't get freed in some error paths, and also make the error handling easier to understand/audit. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-31 13:35:22 -04:00
Theodore Ts'o	d87f6d78e9	ext4 crypto: policies may only be set on directories Thanks to Chao Yu <chao2.yu@samsung.com> for pointing out we were missing this check. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-31 13:35:14 -04:00
Theodore Ts'o	c2faccaff6	ext4 crypto: enforce crypto policy restrictions on cross-renames Thanks to Chao Yu <chao2.yu@samsung.com> for pointing out the need for this check. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-31 13:35:09 -04:00
Theodore Ts'o	e709e9df64	ext4 crypto: encrypt tmpfile located in encryption protected directory Factor out calls to ext4_inherit_context() and move them to __ext4_new_inode(); this fixes a problem where ext4_tmpfile() wasn't calling calling ext4_inherit_context(), so the temporary file wasn't getting protected. Since the blocks for the tmpfile could end up on disk, they really should be protected if the tmpfile is created within the context of an encrypted directory. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-31 13:35:02 -04:00
Theodore Ts'o	6bc445e0ff	ext4 crypto: make sure the encryption info is initialized on opendir(2) Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-31 13:34:57 -04:00
Theodore Ts'o	5555702955	ext4 crypto: set up encryption info for new inodes in ext4_inherit_context() Set up the encryption information for newly created inodes immediately after they inherit their encryption context from their parent directories. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-31 13:34:29 -04:00
Theodore Ts'o	95ea68b4c7	ext4 crypto: fix memory leaks in ext4_encrypted_zeroout ext4_encrypted_zeroout() could end up leaking a bio and bounce page. Fortunately it's not used much. While we're fixing things up, refactor out common code into the static function alloc_bounce_page() and fix up error handling if mempool_alloc() fails. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-31 13:34:24 -04:00
Theodore Ts'o	c936e1ec28	ext4 crypto: use per-inode tfm structure As suggested by Herbert Xu, we shouldn't allocate a new tfm each time we read or write a page. Instead we can use a single tfm hanging off the inode's crypt_info structure for all of our encryption needs for that inode, since the tfm can be used by multiple crypto requests in parallel. Also use cmpxchg() to avoid races that could result in crypt_info structure getting doubly allocated or doubly freed. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-31 13:34:22 -04:00
Theodore Ts'o	71dea01ea2	ext4 crypto: require CONFIG_CRYPTO_CTR if ext4 encryption is enabled On arm64 this is apparently needed for CTS mode to function correctly. Otherwise attempts to use CTS return ENOENT. Change-Id: I732ea9a5157acc76de5b89edec195d0365f4ca63 Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-31 13:31:37 -04:00
Theodore Ts'o	614def7013	ext4 crypto: shrink size of the ext4_crypto_ctx structure Some fields are only used when the crypto_ctx is being used on the read path, some are only used on the write path, and some are only used when the structure is on free list. Optimize memory use by using a union. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-31 13:31:34 -04:00
Theodore Ts'o	1aaa6e8b24	ext4 crypto: get rid of ci_mode from struct ext4_crypt_info The ci_mode field was superfluous, and getting rid of it gets rid of an unused hole in the structure. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-18 13:20:47 -04:00
Theodore Ts'o	8ee0371470	ext4 crypto: use slab caches Use slab caches the ext4_crypto_ctx and ext4_crypt_info structures for slighly better memory efficiency and debuggability. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-18 13:19:47 -04:00
Theodore Ts'o	f5aed2c2a8	ext4: clean up superblock encryption mode fields The superblock fields s_file_encryption_mode and s_dir_encryption_mode are vestigal, so remove them as a cleanup. While we're at it, allow file systems with both encryption and inline_data enabled at the same time to work correctly. We can't have encrypted inodes with inline data, but there's no reason to prohibit unencrypted inodes from using the inline data feature. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-18 13:18:47 -04:00
Theodore Ts'o	b7236e21d5	ext4 crypto: reorganize how we store keys in the inode This is a pretty massive patch which does a number of different things: 1) The per-inode encryption information is now stored in an allocated data structure, ext4_crypt_info, instead of directly in the node. This reduces the size usage of an in-memory inode when it is not using encryption. 2) We drop the ext4_fname_crypto_ctx entirely, and use the per-inode encryption structure instead. This remove an unnecessary memory allocation and free for the fname_crypto_ctx as well as allowing us to reuse the ctfm in a directory for multiple lookups and file creations. 3) We also cache the inode's policy information in the ext4_crypt_info structure so we don't have to continually read it out of the extended attributes. 4) We now keep the keyring key in the inode's encryption structure instead of releasing it after we are done using it to derive the per-inode key. This allows us to test to see if the key has been revoked; if it has, we prevent the use of the derived key and free it. 5) When an inode is released (or when the derived key is freed), we will use memset_explicit() to zero out the derived key, so it's not left hanging around in memory. This implies that when a user logs out, it is important to first revoke the key, and then unlink it, and then finally, to use "echo 3 > /proc/sys/vm/drop_caches" to release any decrypted pages and dcache entries from the system caches. 6) All this, and we also shrink the number of lines of code by around 100. :-) Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-18 13:17:47 -04:00
Theodore Ts'o	e2881b1b51	ext4 crypto: separate kernel and userspace structure for the key Use struct ext4_encryption_key only for the master key passed via the kernel keyring. For internal kernel space users, we now use struct ext4_crypt_info. This will allow us to put information from the policy structure so we can cache it and avoid needing to constantly looking up the extended attribute. We will do this in a spearate patch. This patch is mostly mechnical to make it easier for patch review. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-18 13:16:47 -04:00
Theodore Ts'o	d229959072	ext4 crypto: don't allocate a page when encrypting/decrypting file names Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-18 13:15:47 -04:00
Theodore Ts'o	5b643f9ce3	ext4 crypto: optimize filename encryption Encrypt the filename as soon it is passed in by the user. This avoids our needing to encrypt the filename 2 or 3 times while in the process of creating a filename. Similarly, when looking up a directory entry, encrypt the filename early, or if the encryption key is not available, base-64 decode the file syystem so that the hash value and the last 16 bytes of the encrypted filename is available in the new struct ext4_filename data structure. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-05-18 13:14:47 -04:00

1 2 3 4 5 ...

40468 Commits