linux/fs
Dave Chinner 1f2dcfe89e xfs: xfs_inode_free() isn't RCU safe
The xfs_inode freed in xfs_inode_free() has multiple allocated
structures attached to it. We free these in xfs_inode_free() before
we mark the inode as invalid, and before we run call_rcu() to queue
the structure for freeing.

Unfortunately, this freeing can race with other accesses that are in
the RCU current grace period that have found the inode in the radix
tree with a valid state.  This includes xfs_iflush_cluster(), which
calls xfs_inode_clean(), and that accesses the inode log item on the
xfs_inode.

The log item structure is freed in xfs_inode_free(), so there is the
possibility we can be accessing freed memory in xfs_iflush_cluster()
after validating the xfs_inode structure as being valid for this RCU
context. Hence we can get spuriously incorrect clean state returned
from such checks. This can lead to use thinking the inode is dirty
when it is, in fact, clean, and so incorrectly attaching it to the
buffer for IO and completion processing.

This then leads to use-after-free situations on the xfs_inode itself
if the IO completes after the current RCU grace period expires. The
buffer callbacks will access the xfs_inode and try to do all sorts
of things it shouldn't with freed memory.

IOWs, xfs_iflush_cluster() only works correctly when racing with
inode reclaim if the inode log item is present and correctly stating
the inode is clean. If the inode is being freed, then reclaim has
already made sure the inode is clean, and hence xfs_iflush_cluster
can skip it. However, we are accessing the inode inode under RCU
read lock protection and so also must ensure that all dynamically
allocated memory we reference in this context is not freed until the
RCU grace period expires.

To fix this, move all the potential memory freeing into
xfs_inode_free_callback() so that we are guarantee RCU protected
lookup code will always have the memory structures it needs
available during the RCU grace period that lookup races can occur
in.

Discovered-by: Brain Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-05-18 14:01:53 +10:00
..
9p wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
adfs fs/adfs/adfs.h: tidy up comments 2016-01-20 17:09:18 -08:00
affs affs_do_readpage_ofs(): just use kmap_atomic() around memcpy() 2016-02-20 00:15:51 -05:00
afs wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
autofs4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-03-19 18:52:29 -07:00
befs kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
bfs kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
btrfs Merge branch 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2016-03-21 18:12:42 -07:00
cachefiles CacheFiles: Provide read-and-reset release counters for cachefilesd 2016-02-01 12:30:10 -05:00
ceph ceph: use kmem_cache_zalloc 2016-03-25 18:51:56 +01:00
cifs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-03-19 18:52:29 -07:00
coda Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-23 12:24:56 -08:00
configfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-03-19 18:52:29 -07:00
cramfs don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
crypto f2fs/crypto: fix xts_tweak initialization 2016-03-26 10:13:05 -07:00
debugfs wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
devpts pty: make sure super_block is still valid in final /dev/tty close 2016-02-06 23:45:46 -08:00
dlm dlm for 4.6 2016-03-17 16:38:36 -07:00
ecryptfs Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2016-03-23 06:12:39 -07:00
efivarfs efi: Make efivarfs entries immutable by default 2016-02-10 16:25:52 +00:00
efs kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
exofs wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
exportfs wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
ext2 Performance improvements in SEEK_DATA and xattr scalability 2016-03-17 16:31:18 -07:00
ext4 ext4: in ext4_dir_llseek, check syscall bitness directly 2016-03-22 15:36:02 -07:00
f2fs f2fs: submit node page write bios when really required 2016-03-17 21:19:47 -07:00
fat fat: add config option to set UTF-8 mount option by default 2016-03-22 15:36:02 -07:00
freevxfs don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
fscache FS-Cache: Handle a write to the page immediately beyond the EOF marker 2015-11-11 02:11:02 -05:00
fuse fuse: return patrial success from fuse_direct_io() 2016-03-16 14:38:31 +01:00
gfs2 GFS2: merge window 2016-03-17 16:51:32 -07:00
hfs wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
hfsplus wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
hostfs wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
hpfs hpfs: don't truncate the file when delete fails 2016-02-27 19:15:51 -05:00
hugetlbfs wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
isofs kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
jbd2 jbd2: do not fail journal because of frozen_buffer allocation failure 2016-03-13 17:38:20 -04:00
jffs2 MTD updates for v4.6 2016-03-24 19:57:15 -07:00
jfs wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
kernfs Merge branch 'for-4.6-ns' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2016-03-21 10:05:13 -07:00
lockd lockd: constify nlmsvc_binding structure 2016-01-07 10:10:50 -05:00
logfs wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
minix kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
ncpfs ncpfs: fix a braino in OOM handling in ncp_fill_cache() 2016-03-07 22:25:16 -05:00
nfs Various bugfixes, a RDMA update from Chuck Lever, and support for a new 2016-03-24 19:50:32 -07:00
nfs_common lockd: NLM grace period shouldn't block NFSv4 opens 2015-08-13 10:22:06 -04:00
nfsd Various bugfixes, a RDMA update from Chuck Lever, and support for a new 2016-03-24 19:50:32 -07:00
nilfs2 mm: introduce page reference manipulation functions 2016-03-17 15:09:34 -07:00
nls
notify fsnotify: turn fsnotify reaper thread into a workqueue job 2016-02-18 16:23:24 -08:00
ntfs wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
ocfs2 ocfs2: extend enough credits for freeing one truncate record while replaying truncate records 2016-03-25 16:37:42 -07:00
omfs
openpromfs kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
orangefs orangefs: fix orangefs_superblock locking 2016-03-26 07:22:00 -04:00
overlayfs ovl: cleanup unused var in rename2 2016-03-21 17:31:46 +01:00
proc Merge branch 'for-4.6-ns' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2016-03-21 10:05:13 -07:00
pstore pstore: Add support for 64 Bit address space 2016-03-10 09:43:36 -08:00
qnx4 kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
qnx6 kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
quota Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2016-03-21 12:22:37 -07:00
ramfs don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
reiserfs quota: Add support for ->get_nextdqblk() for VFS quota 2016-02-09 13:05:23 +01:00
romfs kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
squashfs kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
sysfs platform/chrome: Branch for v4.4 2015-11-13 21:53:18 -08:00
sysv kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
tracefs wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
ubifs ubifs: Remove unused header 2016-03-20 21:37:46 +01:00
udf udf: Merge linux specific translation into CS0 conversion function 2016-02-09 13:05:23 +01:00
ufs kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
xfs xfs: xfs_inode_free() isn't RCU safe 2016-05-18 14:01:53 +10:00
aio.c mm: move ->mremap() from file_operations to vm_operations_struct 2015-09-04 16:54:41 -07:00
anon_inodes.c
attr.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
bad_inode.c fs/bad_inode.c: is_bad_inode can be boolean 2015-12-06 21:17:14 -05:00
binfmt_aout.c
binfmt_elf_fdpic.c libnvdimm for 4.4: 2015-11-10 12:07:22 -08:00
binfmt_elf.c mm: ASLR: use get_random_long() 2016-02-27 10:28:52 -08:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
binfmt_script.c
block_dev.c Merge branch 'for-4.6/core' of git://git.kernel.dk/linux-block 2016-03-18 16:43:11 -07:00
buffer.c mm: simplify lock_page_memcg() 2016-03-15 16:55:16 -07:00
char_dev.c fs/char_dev.c: fix incorrect documentation for unregister_chrdev_region 2015-08-05 13:49:35 -07:00
compat_binfmt_elf.c
compat_ioctl.c Merge 4.5-rc4 into char-misc-next 2016-02-14 14:25:59 -08:00
compat.c saner calling conventions for copy_mount_options() 2016-01-04 10:28:32 -05:00
coredump.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-03-22 15:36:02 -07:00
dax.c xfs: Changes for 4.6-rc1 2016-03-21 11:53:05 -07:00
dcache.c dcache.c: new helper: __d_add() 2016-03-14 00:17:38 -04:00
dcookies.c
direct-io.c xfs: Changes for 4.6-rc1 2016-03-21 11:53:05 -07:00
drop_caches.c inode: convert inode_sb_list_lock to per-sb 2015-08-17 18:39:46 -04:00
eventfd.c eventfd: document lockless access in eventfd_poll 2016-03-22 15:36:02 -07:00
eventpoll.c timer: convert timer_slack_ns from unsigned long to u64 2016-03-17 15:09:34 -07:00
exec.c Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-20 19:08:56 -07:00
fcntl.c fcntl: allow to set O_DIRECT flag on pipe 2016-01-09 02:55:37 -05:00
fhandle.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-03-22 15:36:02 -07:00
file_table.c fs, file table: reinit files_stat.max_files after deferred memory initialisation 2015-08-07 04:39:40 +03:00
file.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
filesystems.c find_filesystem(): simplify comparison 2016-01-19 12:02:23 -05:00
fs_pin.c
fs_struct.c
fs-writeback.c writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode 2016-03-20 09:44:20 -06:00
inode.c writeback: initialize inode members that track writeback history 2016-02-16 14:57:21 -07:00
internal.h Merge branch 'for-linus' into work.misc 2016-01-08 21:20:11 -05:00
ioctl.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
Kconfig Merge tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux 2016-03-26 12:59:04 -07:00
Kconfig.binfmt
libfs.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
locks.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
Makefile Merge tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux 2016-03-26 12:59:04 -07:00
mbcache.c mbcache: add reusable flag to cache entries 2016-02-22 22:44:04 -05:00
mount.h fs: use seq_open_private() for proc_mounts 2015-06-30 19:44:56 -07:00
mpage.c fs/mpage.c:mpage_readpages(): use lru_to_page() helper 2016-03-15 16:55:16 -07:00
namei.c kill dentry_unhash() 2016-03-14 00:16:33 -04:00
namespace.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
no-block.c
nsfs.c fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to void 2015-09-11 15:21:34 -07:00
open.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-03-22 15:36:02 -07:00
pipe.c pipe: limit the per-user amount of pages allocated in pipes 2016-01-19 19:25:21 -05:00
pnode.c fs/pnode.c: treat zero mnt_group_id-s as unequal 2016-02-20 00:15:52 -05:00
pnode.h mnt: Clarify and correct the disconnect logic in umount_tree 2015-07-22 20:33:27 -05:00
posix_acl.c xattr handlers: Simplify list operation 2015-12-13 19:46:12 -05:00
proc_namespace.c vfs: show_vfsstat: do not ignore errors from show_devname method 2016-03-16 13:09:08 -04:00
read_write.c Merge branches 'work.lookups', 'work.misc' and 'work.preadv2' into for-next 2016-03-18 16:07:38 -04:00
readdir.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
select.c timer: convert timer_slack_ns from unsigned long to u64 2016-03-17 15:09:34 -07:00
seq_file.c fs, seqfile: always allow oom killer 2015-11-06 17:50:42 -08:00
signalfd.c signalfd: fix information leak in signalfd_copyinfo 2015-08-07 04:39:40 +03:00
splice.c Merge branches 'work.lookups', 'work.misc' and 'work.preadv2' into for-next 2016-03-18 16:07:38 -04:00
stack.c
stat.c fs/stat.c: drop the last new_valid_dev check 2016-01-16 11:17:23 -08:00
statfs.c
super.c writeback: flush inode cgroup wb switches instead of pinning super_block 2016-03-03 14:42:50 -07:00
sync.c fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE writeback 2015-11-06 17:50:42 -08:00
timerfd.c timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper 2016-01-17 11:13:55 +01:00
userfaultfd.c userfaultfd: don't block on the last VM updates at exit time 2016-03-02 09:03:18 -08:00
utimes.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
xattr.c xattr handlers: plug a lock leak in simple_xattr_list 2016-02-20 00:15:51 -05:00