linux/fs
Stefan Behrens ff76b05655 Btrfs: Don't allocate inode that is already in use
Due to an off-by-one error, it is possible to reproduce a bug
when the inode cache is used.

The same inode number is assigned twice; the second assignment
leads to an EEXIST in btrfs_insert_empty_items().

The issue can happen when a file is removed right after a subvolume
is created and a new inode number is then allocated before the
inodes in free_ino_pinned are processed.
In this case, unlink() calls btrfs_return_ino(), which calls start_caching().
start_caching() adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] to the
free inode cache by searching for the highest inode (the search in
btrfs_find_free_objectid() can no longer find the unlinked one). So if
the unlinked inode's number is >= highest_ino + 1, we mustn't add it to
free_ino_pinned (caching_thread() does it right). The off-by-one error
was that the check used > instead of >=, so a number equal to
highest_ino + 1 was still pinned.
Instead, we need to try to add the number directly to the inode cache,
which will fail in this case.
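
The decision described above can be sketched in C roughly as follows. This is a simplified illustration, not the kernel code: the names ino_dest, ino_dest_buggy(), ino_dest_fixed() and self_check() are hypothetical, and the first_cached parameter is assumed to stand for highest_ino + 1, the start of the range that start_caching() adds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the decision, not the kernel implementation. */
enum ino_dest {
	DEST_PINNED,	/* park the number in free_ino_pinned */
	DEST_CACHE	/* try to add it to the free inode cache directly */
};

/* Off-by-one variant: a number equal to first_cached is still pinned. */
static enum ino_dest ino_dest_buggy(uint64_t objectid, uint64_t first_cached)
{
	return objectid > first_cached ? DEST_CACHE : DEST_PINNED;
}

/* Corrected variant: everything from first_cached upward skips pinning. */
static enum ino_dest ino_dest_fixed(uint64_t objectid, uint64_t first_cached)
{
	return objectid >= first_cached ? DEST_CACHE : DEST_PINNED;
}

/* Sanity check using inode 34284 from the example in this commit message. */
static void self_check(void)
{
	assert(ino_dest_buggy(34284, 34284) == DEST_PINNED);
	assert(ino_dest_fixed(34284, 34284) == DEST_CACHE);
	assert(ino_dest_fixed(34283, 34284) == DEST_PINNED);
}
```

Calling self_check() from a main() exercises both variants. The two functions differ only for objectid == first_cached, which is exactly the boundary case the commit message describes.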

When this inode number is allocated while it is still in free_ino_pinned,
it is added to the free inode cache again when the pinned inodes are
processed, even though it is now in use. One of the subsequent inode
number allocations then gets an inode that is already in use and fails
with EEXIST in btrfs_insert_empty_items().

One example which was created with the reproducer below:
Create a snapshot, work in the newly created snapshot for the rest.
In unlink(inode 34284) call btrfs_return_ino() which calls start_caching().
start_caching() calls add_free_space [34284, 18446744073709517077].
btrfs_return_ino() then adds [34284, 1] to free_ino_pinned, which is wrong.
mkdir() calls btrfs_find_ino_for_alloc(), which returns the number 34284.
btrfs_unpin_free_ino() calls add_free_space [34284, 1].
mkdir() calls btrfs_find_ino_for_alloc(), which returns the number 34284 again.
EEXIST when the new inode is inserted.
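
The trace above can be replayed with a small toy model in C. Everything here is an assumption made for illustration: struct toy_root, toy_alloc(), toy_unpin(), toy_replay() and the 16-entry window are hypothetical stand-ins, the window replaces the real range up to BTRFS_LAST_FREE_OBJECTID, and the allocator is assumed to hand out the lowest free number, which matches the trace but is not taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_BASE   34280u	/* hypothetical first inode number in the window */
#define TOY_WINDOW 16u		/* hypothetical window size */

struct toy_root {
	bool free_cache[TOY_WINDOW];	/* models the free inode cache */
	bool pinned[TOY_WINDOW];	/* models free_ino_pinned */
};

/* Hand out the lowest free number and remove it from the cache; 0 if none. */
static uint64_t toy_alloc(struct toy_root *r)
{
	for (unsigned int i = 0; i < TOY_WINDOW; i++) {
		if (r->free_cache[i]) {
			r->free_cache[i] = false;
			return (uint64_t)TOY_BASE + i;
		}
	}
	return 0;
}

/* Move every pinned number into the free cache. */
static void toy_unpin(struct toy_root *r)
{
	for (unsigned int i = 0; i < TOY_WINDOW; i++) {
		if (r->pinned[i]) {
			r->pinned[i] = false;
			r->free_cache[i] = true;
		}
	}
}

/* Replay the trace; returns true if the same number is handed out twice. */
static bool toy_replay(bool fixed)
{
	struct toy_root r = {0};
	const uint64_t ino = 34284;
	uint64_t first, second;

	assert(ino >= TOY_BASE && ino - TOY_BASE < TOY_WINDOW);

	/* start_caching(): everything from ino upward becomes free. */
	for (unsigned int i = (unsigned int)(ino - TOY_BASE); i < TOY_WINDOW; i++)
		r.free_cache[i] = true;

	/* btrfs_return_ino(): the buggy path additionally pins the number. */
	if (!fixed)
		r.pinned[ino - TOY_BASE] = true;

	first = toy_alloc(&r);	/* first mkdir() */
	toy_unpin(&r);		/* pinned numbers are processed */
	second = toy_alloc(&r);	/* second mkdir() */
	return first == second;
}
```

In this model, toy_replay(false) hands out 34284 twice, which corresponds to the EEXIST case, while toy_replay(true) hands out 34284 and then 34285.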

One possible reproducer is this one:
 #!/bin/sh
 # preparation
TEST_DEV=/dev/sdc1
TEST_MNT=/mnt
umount ${TEST_MNT} 2>/dev/null || true
mkfs.btrfs -f ${TEST_DEV}
mount ${TEST_DEV} ${TEST_MNT} -o \
 rw,relatime,compress=lzo,space_cache,inode_cache
btrfs subv create ${TEST_MNT}/s1
for i in `seq 34027`; do touch ${TEST_MNT}/s1/${i}; done
btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2
FILENAME=`find ${TEST_MNT}/s1/ -inum 4085 | sed 's|^.*/\([^/]*\)$|\1|'`
rm ${TEST_MNT}/s2/$FILENAME
touch ${TEST_MNT}/s2/$FILENAME
 # the following steps can be repeated to reproduce the issue again and again
[ -e ${TEST_MNT}/s3 ] && btrfs subv del ${TEST_MNT}/s3
btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3
rm ${TEST_MNT}/s3/$FILENAME
touch ${TEST_MNT}/s3/$FILENAME
ls -alFi ${TEST_MNT}/s?/$FILENAME
touch ${TEST_MNT}/s3/_1 || logger FAILED
ls -alFi ${TEST_MNT}/s?/_1
touch ${TEST_MNT}/s3/_2 || logger FAILED
ls -alFi ${TEST_MNT}/s?/_2
touch ${TEST_MNT}/s3/__1 || logger FAILED
ls -alFi ${TEST_MNT}/s?/__1
touch ${TEST_MNT}/s3/__2 || logger FAILED
ls -alFi ${TEST_MNT}/s?/__2
 # if the above is not enough, add the following loop:
for i in `seq 3 9`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
 #for i in `seq 3 34027`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
 # one of the touch(1) calls in s3 fails with EEXIST because the inode that
 # btrfs_find_ino_for_alloc() returns is already in use.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-11 22:02:36 -05:00
..
9p 9p: don't forget to destroy inode cache if fscache registration fails 2013-09-17 22:31:01 -04:00
adfs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
affs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
afs afs: dget_parent() can't return a negative dentry 2013-09-29 22:02:24 -04:00
autofs4 autofs4: close the races around autofs4_notify_daemon() 2013-09-16 19:16:38 -04:00
befs [readdir] convert befs 2013-06-29 12:56:55 +04:00
bfs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
btrfs Btrfs: Don't allocate inode that is already in use 2013-11-11 22:02:36 -05:00
cachefiles CacheFiles: Don't try to dump the index key if the cookie has been cleared 2013-09-20 15:15:43 -07:00
ceph ceph: use d_invalidate() to invalidate aliases 2013-09-06 12:55:29 -07:00
cifs cifs: ntstatus_to_dos_map[] is not terminated 2013-10-14 12:14:01 -05:00
coda helper for reading ->d_count 2013-07-05 18:59:33 +04:00
configfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-07-14 11:42:26 -07:00
cramfs [readdir] convert f2fs 2013-06-29 12:56:46 +04:00
debugfs debugfs: debugfs_remove_recursive() must not rely on list_empty(d_subdirs) 2013-07-31 12:16:31 -04:00
devpts fs: Limit sys_mount to only request filesystem modules (Part 2). 2013-03-07 01:08:55 -08:00
dlm dlm: remove signal blocking 2013-08-12 15:22:43 -05:00
ecryptfs eCryptfs: fix 32 bit corruption issue 2013-10-24 12:36:30 -07:00
efivarfs efivarfs: we can use simple_lookup() now 2013-07-14 17:48:35 +04:00
efs efs: iget_locked() doesn't return an ERR_PTR() 2013-08-24 12:10:22 -04:00
exofs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
exportfs exportfs: don't assume that ->iterate() won't feed us too long entries 2013-09-07 19:54:55 -04:00
ext2 truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
ext3 ext[34]: fix double put in tmpfile 2013-10-15 12:14:06 -04:00
ext4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-10-16 17:18:18 -07:00
f2fs f2fs: optimize gc for better performance 2013-09-05 13:50:32 +09:00
fat truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
freevxfs [readdir] convert freevxfs 2013-06-29 12:56:53 +04:00
fscache Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2013-09-19 12:50:37 -05:00
fuse fuse: no RCU mode in fuse_access() 2013-10-01 16:41:23 +02:00
gfs2 gfs2: set FILE_CREATED 2013-09-16 19:17:24 -04:00
hfs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
hfsplus truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
hostfs um: hostfs: Fix writeback 2013-09-07 10:38:29 +02:00
hpfs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
hppfs clean up scary strncpy(dst, src, strlen(src)) uses 2013-07-03 16:07:41 -07:00
hugetlbfs cope with potentially long ->d_dname() output for shmem/hugetlb 2013-08-24 12:10:17 -04:00
isofs isofs: Refuse RW mount of the filesystem instead of making it RO 2013-07-31 22:14:50 +02:00
jbd jbd: use a single printk for jbd_debug() 2013-08-09 10:49:00 +02:00
jbd2 jbd2: Fix endian mixing problems in the checksumming code 2013-08-28 14:59:58 -04:00
jffs2 [readdir] convert jffs2 2013-06-29 12:56:47 +04:00
jfs Just a patch to fix an oops in an error path. 2013-10-22 09:01:11 +01:00
lockd LOCKD: Don't call utsname()->nodename from nlmclnt_setlockargs 2013-08-05 15:03:46 -04:00
logfs Lots of bug fixes, cleanups and optimizations. In the bug fixes 2013-07-02 09:39:34 -07:00
minix truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
ncpfs ncpfs: fix error return code in ncp_parse_options() 2013-07-09 10:33:25 -07:00
nfs NFS: Give "flavor" an initial value to fix a compile warning 2013-09-29 16:03:34 -04:00
nfs_common
nfsd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-09-12 15:01:38 -07:00
nilfs2 nilfs2: fix issue with race condition of competition between segments for dirty blocks 2013-09-30 14:31:02 -07:00
nls
notify fsnotify: update comments concerning locking scheme 2013-07-09 10:33:20 -07:00
ntfs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
ocfs2 ocfs2: needs ->d_lock to poke in ->d_parent->d_inode from ->d_revalidate() 2013-09-29 22:02:20 -04:00
omfs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
openpromfs [readdir] convert openpromfs 2013-06-29 12:56:32 +04:00
proc procfs: call default get_unmapped_area on MMU-present architectures 2013-10-16 21:35:53 -07:00
pstore pstore: Remove the messages related to compression failure 2013-09-16 09:28:29 -07:00
qnx4 [readdir] convert qnx4 2013-06-29 12:56:38 +04:00
qnx6 [readdir] convert qnx6 2013-06-29 12:56:39 +04:00
quota fs: convert fs shrinkers to new scan/count API 2013-09-10 18:56:31 -04:00
ramfs initmpfs: move rootfs code from fs/ramfs/ to init/ 2013-09-11 15:59:37 -07:00
reiserfs reiserfs: fix race with flush_used_journal_lists and flush_journal_list 2013-09-24 11:24:21 +02:00
romfs [readdir] convert romfs 2013-06-29 12:56:29 +04:00
squashfs Squashfs: add corruption check for type in squashfs_readdir() 2013-09-06 04:57:54 +01:00
sysfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-09-07 14:36:57 -07:00
sysv sysv: Add forgotten superblock lock init for v7 fs 2013-09-29 22:02:02 -04:00
ubifs Just one patch which fixes the power-cut recovery testing mode. 2013-09-16 15:36:55 -04:00
udf udf: Fortify LVID loading 2013-09-24 11:23:33 +02:00
ufs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
xfs xfs: Use kmem_free() instead of free() 2013-10-04 13:56:12 -05:00
aio.c aio: fix use-after-free in aio_migratepage 2013-09-26 20:34:51 -04:00
anon_inodes.c fs/anon_inode: Introduce a new lib function anon_inode_getfile_private() 2013-07-16 09:32:17 -04:00
attr.c
bad_inode.c [readdir] ->readdir() is gone 2013-06-29 12:57:04 +04:00
binfmt_aout.c mm: remove free_area_cache 2013-07-10 18:11:34 -07:00
binfmt_elf_fdpic.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2013-05-02 10:16:16 -07:00
binfmt_elf.c fs/binfmt_elf.c: prevent a coredump with a large vm_map_count from Oopsing 2013-09-30 14:31:01 -07:00
binfmt_em86.c
binfmt_flat.c new helper: read_code() 2013-04-29 15:40:23 -04:00
binfmt_misc.c binfmt_misc: reuse string_unescape_inplace() 2013-04-30 17:04:03 -07:00
binfmt_script.c
binfmt_som.c
bio-integrity.c Merge branch 'for-3.12/core' of git://git.kernel.dk/linux-block 2013-09-22 15:00:11 -07:00
bio.c block: Fix bio_copy_data() 2013-09-24 14:41:42 -07:00
block_dev.c a trivial writeback fix 2013-09-13 23:06:40 -04:00
buffer.c fs: buffer: move allocation failure loop into the allocator 2013-10-16 21:35:53 -07:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c compat.c: LOOP_CLR_FD is taken care of in loop.c itself... 2013-06-29 12:46:44 +04:00
compat.c [readdir] constify ->actor 2013-06-29 12:57:05 +04:00
coredump.c coredump: add new %P variable in core_pattern 2013-09-11 15:59:01 -07:00
coredump.h
dcache.c vfs: decrapify dput(), fix cache behavior under normal load 2013-10-31 15:43:02 -07:00
dcookies.c consolidate compat lookup_dcookie() 2013-03-03 23:00:23 -05:00
direct-io.c direct-io: Use return from cmpxchg to decide of assignment happened 2013-09-09 10:47:42 -07:00
drop_caches.c shrinker: add node awareness 2013-09-10 18:56:31 -04:00
eventfd.c
eventpoll.c Revert "epoll: use freezable blocking call" 2013-10-30 15:27:53 +01:00
exec.c exec: cleanup the error handling in search_binary_handler() 2013-09-11 15:59:09 -07:00
fcntl.c vfs: add missing check for __O_TMPFILE in fcntl_init() 2013-08-05 18:25:32 +04:00
fhandle.c
file_table.c nfsd regression since delayed fput() 2013-10-20 08:44:39 -04:00
file.c don't bother with deferred freeing of fdtables 2013-05-01 17:31:42 -04:00
filesystems.c fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
fs_struct.c constify path_get/path_put and fs_struct.c stuff 2013-03-01 23:51:07 -05:00
fs-writeback.c a trivial writeback fix 2013-09-13 23:06:40 -04:00
generic_acl.c
inode.c fs: convert inode and dentry shrinking to be node aware 2013-09-10 18:56:31 -04:00
internal.h fs: convert inode and dentry shrinking to be node aware 2013-09-10 18:56:31 -04:00
ioctl.c
ioprio.c
Kconfig efivarfs: Move to fs/efivarfs 2013-04-17 13:25:09 +01:00
Kconfig.binfmt fs: make binfmt support for #! scripts modular and removable 2013-04-30 17:04:04 -07:00
libfs.c make simple_lookup() usable for filesystems that set ->s_d_op 2013-07-14 17:43:25 +04:00
locks.c locks: move file_lock_list to a set of percpu hlist_heads and convert file_lock_lock to an lglock 2013-07-08 13:36:42 +04:00
Makefile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
mbcache.c fs: convert fs shrinkers to new scan/count API 2013-09-10 18:56:31 -04:00
mount.h get rid of full-hash scan on detaching vfsmounts 2013-04-09 14:12:52 -04:00
mpage.c
namei.c fs/namei.c: fix new kernel-doc warning 2013-10-22 12:02:40 +01:00
namespace.c initmpfs: move rootfs code from fs/ramfs/ to init/ 2013-09-11 15:59:37 -07:00
no-block.c
open.c vfs: improve i_op->atomic_open() documentation 2013-09-16 19:17:24 -04:00
pipe.c aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
pnode.c vfs: Fix invalid ida_remove() call 2013-05-31 15:16:33 -04:00
pnode.h vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces 2013-08-26 18:42:15 -07:00
posix_acl.c
proc_namespace.c
read_write.c aio: Kill aio_rw_vect_retry() 2013-07-30 11:53:12 -04:00
readdir.c [readdir] constify ->actor 2013-06-29 12:57:05 +04:00
select.c Revert "select: use freezable blocking call" 2013-10-30 15:28:35 +01:00
seq_file.c seq_file: always update file->f_pos in seq_lseek() 2013-10-25 10:46:40 -04:00
signalfd.c switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE 2013-03-03 22:58:46 -05:00
splice.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-07-03 09:10:19 -07:00
stack.c
stat.c quota: provide interface for readding allocated space into reserved space 2013-08-17 09:32:32 -04:00
statfs.c vfs: allow O_PATH file descriptors for fstatfs() 2013-10-12 13:12:31 -07:00
super.c fs/super.c: fix lru_list leak for real 2013-10-01 13:11:21 -04:00
sync.c teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long 2013-03-03 22:46:22 -05:00
timerfd.c timerfd: Add alarm timers 2013-05-29 12:57:34 -07:00
utimes.c
xattr_acl.c
xattr.c