linux/fs
Mike Kravetz 87bf91d39b hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race
hugetlbfs page faults can race with truncate and hole punch operations.
Current code in the page fault path attempts to handle this by 'backing
out' operations if we encounter the race.  One obvious omission in the
current code is removing a page newly added to the page cache.  This is
pretty straight forward to address, but there is a more subtle and
difficult issue of backing out hugetlb reservations.  To handle this
correctly, the 'reservation state' before page allocation needs to be
noted so that it can be properly backed out.  There are four distinct
possibilities for reservation state: shared/reserved, shared/no-resv,
private/reserved and private/no-resv.  Backing out a reservation may
require memory allocation which could fail so that needs to be taken
into account as well.

Instead of writing the required complicated code for this rare
occurrence, just eliminate the race.  i_mmap_rwsem is now held in read
mode for the duration of page fault processing.  Hold i_mmap_rwsem in
write mode when modifying i_size.  In this way, truncation can not
proceed when page faults are being processed.  In addition, i_size
will not change during fault processing so a single check can be made
to ensure faults are not beyond (proposed) end of file.  Faults can
still race with hole punch, but that race is handled by existing code
and the use of hugetlb_fault_mutex.

With this modification, checks for races with truncation in the page
fault path can be simplified and removed.  remove_inode_hugepages no
longer needs to take hugetlb_fault_mutex in the case of truncation.
Comments are expanded to explain reasoning behind locking.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Link: http://lkml.kernel.org/r/20200316205756.146666-3-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:32 -07:00
..
9p
adfs fs/adfs: bigdir: Fix an error code in adfs_fplus_read() 2020-01-25 11:31:59 -05:00
affs
afs afs: Fix unpinned address list during probing 2020-03-26 16:04:29 -07:00
autofs
befs
bfs
btrfs btrfs: fix missing semaphore unlock in btrfs_sync_file 2020-03-25 16:29:16 +01:00
cachefiles cachefiles: drop direct usage of ->bmap method. 2020-02-03 08:05:56 -05:00
ceph ceph: fix memory leak in ceph_cleanup_snapid_map() 2020-03-23 13:07:08 +01:00
cifs cifs: update internal module version number 2020-03-29 16:59:31 -05:00
coda
configfs
cramfs cramfs: switch to use of errofc() et.al. 2020-02-07 14:48:41 -05:00
crypto fscrypt updates for 5.7 2020-03-31 12:58:36 -07:00
debugfs debugfs: remove return value of debugfs_create_file_size() 2020-03-18 13:35:29 +01:00
devpts
dlm
ecryptfs eCryptfs fixes for 5.6-rc3 2020-02-17 21:08:37 -08:00
efivarfs efi: Use more granular check for availability for variable services 2020-02-23 21:59:42 +01:00
efs
erofs erofs: handle corrupted images whose decompressed size less than it'd be 2020-03-03 23:40:52 +08:00
exportfs
ext2 dax fixes 5.6-rc1 2020-02-11 16:52:08 -08:00
ext4 fscrypt updates for 5.7 2020-03-31 12:58:36 -07:00
f2fs fscrypt updates for 5.7 2020-03-31 12:58:36 -07:00
fat fat: fix uninit-memory access for partial initialized inode 2020-03-06 07:06:09 -06:00
freevxfs
fscache proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
fuse fuse: fix stack use after return 2020-02-13 09:16:07 +01:00
gfs2 We've got a lot of patches (39) for this merge window. Most of these patches 2020-03-31 14:16:03 -07:00
hfs
hfsplus
hostfs
hpfs
hugetlbfs hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race 2020-04-02 09:35:32 -07:00
iomap fs: Fix page_mkwrite off-by-one errors 2020-01-06 08:58:23 -08:00
isofs
jbd2 jbd2: fix data races at struct journal_head 2020-02-29 13:40:02 -05:00
jffs2 fs_parse: fold fs_parameter_desc/fs_parameter_spec 2020-02-07 14:48:37 -05:00
jfs Trivial cleanup for jfs 2020-02-05 05:28:20 +00:00
kernfs Merge branch 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-05 05:02:42 +00:00
lockd proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
minix
nfs selinux/stable-5.7 PR 20200330 2020-03-31 15:07:55 -07:00
nfs_common
nfsd Highlights: 2020-02-07 17:50:21 -08:00
nilfs2
nls
notify
ntfs fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t 2020-03-28 13:21:08 +01:00
ocfs2 ocfs2: use memalloc_nofs_save instead of memalloc_noio_save 2020-04-02 09:35:26 -07:00
omfs
openpromfs
orangefs help_next should increase position index 2020-02-04 15:22:04 -05:00
overlayfs ovl: fix lockdep warning for async write 2020-03-13 15:53:06 +01:00
proc Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:26:41 -08:00
pstore pstore/ram: Replace zero-length array with flexible-array member 2020-03-09 14:45:40 -07:00
qnx4
qnx6
quota \n 2020-01-30 15:37:41 -08:00
ramfs fs_parse: fold fs_parameter_desc/fs_parameter_spec 2020-02-07 14:48:37 -05:00
reiserfs block: remove __bdevname 2020-03-24 07:57:07 -06:00
romfs
squashfs
sysfs sysfs: add sysfs_change_owner() 2020-02-26 20:07:25 -08:00
sysv
tracefs
ubifs ubifs: wire up FS_IOC_GET_ENCRYPTION_NONCE 2020-03-19 21:57:06 -07:00
udf udf: Clarify meaning of f_files in udf_statfs 2020-01-20 13:59:41 +01:00
ufs
unicode kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
vboxsf fs: Add VirtualBox guest shared folder (vboxsf) support 2020-02-08 17:34:58 -05:00
verity fs-verity: use u64_to_user_ptr() 2020-01-14 13:28:28 -08:00
xfs dax fixes 5.6-rc1 2020-02-11 16:52:08 -08:00
zonefs zonfs: Fix handling of read-only zones 2020-03-25 11:28:26 +09:00
aio.c aio: prevent potential eventfd recursion on poll 2020-02-03 17:27:47 -07:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c fs/binfmt_elf.c: coredump: allow process with empty address space to coredump 2020-01-31 10:30:41 -08:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c block: fix a device invalidation regression 2020-03-18 08:47:04 -06:00
buffer.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-30 16:17:15 -07:00
char_dev.c chardev: Avoid potential use-after-free in 'chrdev_open()' 2020-01-06 20:10:26 +01:00
compat_binfmt_elf.c
compat.c
coredump.c pipe: use exclusive waits when reading or writing 2020-02-08 11:39:19 -08:00
d_path.c
dax.c dax: pass NOWAIT flag to iomap_apply 2020-02-05 20:34:32 -08:00
dcache.c
dcookies.c
direct-io.c fs/direct-io.c: include fs/internal.h for missing prototype 2020-01-04 13:55:09 -08:00
drop_caches.c
eventfd.c eventfd: track eventfd_signal() recursion depth 2020-02-03 17:27:38 -07:00
eventpoll.c epoll: fix possible lost wakeup on epoll_ctl() path 2020-03-21 18:56:06 -07:00
exec.c firmware_loader: load files from the mount namespace of init 2020-02-10 15:39:28 -08:00
fcntl.c fcntl: Distribute switch variables for initialization 2020-03-03 10:55:06 -05:00
fhandle.c
file_table.c
file.c io_uring: make sure openat/openat2 honor rlimit nofile 2020-03-20 08:47:27 -06:00
filesystems.c fs_parser: remove fs_parameter_description name field 2020-02-07 14:48:36 -05:00
fs_context.c add prefix to fs_context->log 2020-02-07 14:48:35 -05:00
fs_parser.c fs_parse: remove pr_notice() about each validation 2020-04-02 09:35:26 -07:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c memcg: fix a crash in wb_workfn when a device disappears 2020-01-31 10:30:36 -08:00
fsopen.c add prefix to fs_context->log 2020-02-07 14:48:35 -05:00
inode.c futex: Fix inode life-time issue 2020-03-06 11:06:15 +01:00
internal.h block: move guard_bio_eod to bio.c 2020-03-25 09:50:08 -06:00
io_uring.c for-5.7/io_uring-2020-03-29 2020-03-30 12:18:49 -07:00
io-wq.c io-wq: handle hashed writes in chains 2020-03-23 14:58:07 -06:00
io-wq.h io-wq: handle hashed writes in chains 2020-03-23 14:58:07 -06:00
ioctl.c compat-ioctl fix for v5.6 2020-02-08 13:44:41 -08:00
Kconfig fs: New zonefs file system 2020-02-09 15:51:46 -08:00
Kconfig.binfmt
libfs.c libfs: fix infoleak in simple_attr_read() 2020-03-24 13:27:16 +01:00
locks.c locks: reinstate locks_delete_block optimization 2020-03-18 13:03:38 -07:00
Makefile fs: New zonefs file system 2020-02-09 15:51:46 -08:00
mbcache.c
mount.h
mpage.c fs: move guard_bio_eod() after bio_set_op_attrs 2020-01-09 08:16:12 -07:00
namei.c vfs: fix do_last() regression 2020-02-01 10:36:49 -08:00
namespace.c saner copy_mount_options() 2020-02-03 21:23:33 -05:00
no-block.c
nsfs.c fs/nsfs.c: Added ns_match 2020-03-12 17:33:11 -07:00
open.c cifs_atomic_open(): fix double-put on late allocation failure 2020-03-12 18:25:20 -04:00
pipe.c mm: kmem: rename memcg_kmem_(un)charge() into memcg_kmem_(un)charge_page() 2020-04-02 09:35:28 -07:00
pnode.c
pnode.h
posix_acl.c fs/posix_acl.c: fix kernel-doc warnings 2020-01-04 13:55:09 -08:00
proc_namespace.c
read_write.c overlayfs update for 5.6 2020-02-04 11:45:21 +00:00
readdir.c readdir: make user_access_begin() use the real access range 2020-01-23 10:15:28 -08:00
select.c
seq_file.c
signalfd.c
splice.c splice: make do_splice public 2020-03-02 14:04:31 -07:00
stack.c
stat.c fs: make two stat prep helpers available 2020-01-20 17:03:54 -07:00
statfs.c
super.c
sync.c
timerfd.c timerfd: Make timerfd_settime() time namespace aware 2020-01-14 12:20:53 +01:00
userfaultfd.c mm/userfaultfd: honor FAULT_FLAG_KILLABLE in fault path 2020-04-02 09:35:30 -07:00
utimes.c
xattr.c