linux/fs
Michal Hocko 7dea19f9ee mm: introduce memalloc_nofs_{save,restore} API
The GFP_NOFS context is currently used for the following five reasons:

 - to prevent deadlocks when a lock held by the allocation context
   would be needed during memory reclaim

 - to prevent stack overflows during reclaim because the allocation
   is already performed from a deep context

 - to prevent lockups when the allocation context depends indirectly
   on other reclaimers to make forward progress

 - just in case, because this would be safe from the FS point of view

 - to silence lockdep false positives

Unfortunately, overuse of this allocation context causes problems for
the MM.  Memory reclaim is much weaker (especially during heavy FS
metadata workloads), and the OOM killer cannot be invoked because the MM
layer doesn't have enough information about how much memory is freeable
by the FS layer.

In many cases it is far from clear why the weaker context is even used,
so it might be used unnecessarily.  We would like to get rid of such
uses as much as possible.  One way to do that is to apply the flag to
scopes rather than to isolated allocations.  Such a scope is declared
only when really necessary and is tracked per task, and all allocation
requests from within the scope simply inherit the GFP_NOFS semantics.

Not only is this easier to understand and maintain, because there are
far fewer problematic contexts than specific allocation requests, it
also helps code paths where the FS layer interacts with other layers
(e.g. crypto, security modules, MM etc.) and there is no easy way to
convey the allocation context between the layers.

Introduce the memalloc_nofs_{save,restore} API to control the scope of
the GFP_NOFS allocation context.  This basically copies the
memalloc_noio_{save,restore} API we have for the other restricted
allocation context, GFP_NOIO.  The PF_MEMALLOC_NOFS flag already exists
and is just an alias for PF_FSTRANS, which was xfs specific until
recently.  There are no PF_FSTRANS users anymore, so let's just drop it.
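
The save/restore pattern described above can be modelled in plain user-space
C. This is only a sketch, not the kernel implementation: the flag value, the
global `model_task_flags` (standing in for the per-task flags word) and every
`model_*` name are assumptions made for illustration. It shows why save
returns the previous state of the bit, so that nested scopes unwind correctly.

```c
#include <assert.h>

/* Hypothetical bit value, chosen only for this user-space model. */
#define MODEL_PF_MEMALLOC_NOFS 0x1u

/* Stand-in for the per-task flags word; the kernel tracks this per task. */
static unsigned int model_task_flags;

/* Enter a NOFS scope and return the previous state of the NOFS bit. */
static unsigned int model_nofs_save(void)
{
    unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOFS;

    model_task_flags |= MODEL_PF_MEMALLOC_NOFS;
    return old;
}

/* Leave a NOFS scope by putting the saved state of the bit back. */
static void model_nofs_restore(unsigned int old)
{
    model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/* Report whether the modelled task is currently inside a NOFS scope. */
static int model_in_nofs_scope(void)
{
    return (model_task_flags & MODEL_PF_MEMALLOC_NOFS) != 0;
}

/* Nested scopes: only leaving the outermost scope clears the bit. */
static void model_nested_scope_demo(void)
{
    unsigned int outer = model_nofs_save();  /* enter outer scope */
    unsigned int inner = model_nofs_save();  /* nested; bit is already set */

    model_nofs_restore(inner);               /* inner exit keeps NOFS on */
    assert(model_in_nofs_scope());

    model_nofs_restore(outer);               /* outer exit restores old state */
    assert(!model_in_nofs_scope());
}
```

Because restore writes back the saved bit instead of clearing it
unconditionally, an inner scope cannot end an outer one early.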

PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
implicitly, just as PF_MEMALLOC_NOIO drops __GFP_IO.
memalloc_noio_flags is renamed to current_gfp_context because it now
handles both the PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts.  XFS
code paths preserve their semantics.  kmem_flags_convert() doesn't need
to evaluate the flag anymore.
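
A rough user-space sketch of the mask filtering described above follows. It
is not the kernel code: all `SKETCH_*` bit values and the function
`sketch_gfp_context()` are hypothetical, and the task flags are passed as a
parameter rather than read from the current task. It also assumes that a
NOIO scope clears the FS bit as well as the IO bit, since filesystem reclaim
may itself issue I/O.

```c
#include <assert.h>

/* Hypothetical allocation-mask bits for this sketch. */
#define SKETCH_GFP_IO  0x1u
#define SKETCH_GFP_FS  0x2u

/* Hypothetical per-task scope bits for this sketch. */
#define SKETCH_PF_NOIO 0x1u
#define SKETCH_PF_NOFS 0x2u

/*
 * Return the allocation mask the request is allowed to use, given the
 * scopes the task is currently in. Callers keep passing their usual mask;
 * the restriction is inherited from the task state.
 */
static unsigned int sketch_gfp_context(unsigned int task_flags,
                                       unsigned int gfp)
{
    if (task_flags & SKETCH_PF_NOIO)
        gfp &= ~(SKETCH_GFP_IO | SKETCH_GFP_FS); /* assumed: NOIO implies NOFS */
    else if (task_flags & SKETCH_PF_NOFS)
        gfp &= ~SKETCH_GFP_FS;                   /* NOFS scope drops only FS */
    return gfp;
}

/* Walk through the three cases: no scope, NOFS scope, NOIO scope. */
static void sketch_gfp_context_demo(void)
{
    unsigned int full = SKETCH_GFP_IO | SKETCH_GFP_FS;

    assert(sketch_gfp_context(0, full) == full);
    assert(sketch_gfp_context(SKETCH_PF_NOFS, full) == SKETCH_GFP_IO);
    assert(sketch_gfp_context(SKETCH_PF_NOIO, full) == 0);
}
```

Filtering the mask in one place is what lets code in other layers stay
unaware of the scope it is called from.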

This patch shouldn't introduce any functional changes.

Let's hope that filesystems will drop direct GFP_NOFS (resp. ~__GFP_FS)
usage as much as possible and only use properly documented
memalloc_nofs_{save,restore} checkpoints where they are appropriate.

[akpm@linux-foundation.org: fix comment typo, reflow comment]
Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03 15:52:09 -07:00
9p 9p: Convert to separately allocated bdi 2017-04-20 12:09:55 -06:00
adfs
affs sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
afs Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-05-02 16:40:27 -07:00
autofs4 Fix dead URLs to ftp.kernel.org 2017-03-28 16:16:52 +02:00
befs sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
bfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
btrfs Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block 2017-05-01 10:39:57 -07:00
cachefiles sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
ceph Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block 2017-05-01 10:39:57 -07:00
cifs Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-05-02 16:40:27 -07:00
coda coda: Convert to separately allocated bdi 2017-04-20 12:09:55 -06:00
configfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
cramfs
crypto A code cleanup and bugfix for fs/crypto. 2017-03-25 15:36:56 -07:00
debugfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2017-02-23 20:33:51 -08:00
devpts
dlm net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
ecryptfs ecryptfs: Convert to separately allocated bdi 2017-04-20 12:09:55 -06:00
efivarfs
efs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
exofs exofs: Convert to separately allocated bdi 2017-04-20 12:09:55 -06:00
exportfs Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-03 11:38:56 -08:00
ext2 sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
ext4 statx: Include a mask for stx_attributes in struct statx 2017-04-03 01:06:00 -04:00
f2fs f2fs: combine nat_bits and free_nid_bitmap cache 2017-03-20 10:00:18 -04:00
fat fat: fix using uninitialized fields of fat_inode/fsinfo_inode 2017-03-09 17:01:10 -08:00
freevxfs
fscache KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload() 2017-03-02 10:09:00 +11:00
fuse fuse: Get rid of bdi_initialized 2017-04-20 12:09:55 -06:00
gfs2 fs: Remove SB_I_DYNBDI flag 2017-04-20 12:09:55 -06:00
hfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-03 21:44:35 -08:00
hfsplus sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
hostfs
hpfs sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
hugetlbfs hugetlbfs: fix offset overflow in hugetlbfs mmap 2017-04-13 18:24:21 -07:00
isofs sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
jbd2 jbd2: don't leak memory if setting up journal fails 2017-03-15 15:08:48 -04:00
jffs2 sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
jfs fs: add i_blocksize() 2017-02-27 18:43:46 -08:00
kernfs kernfs: Check KERNFS_HAS_RELEASE before calling kernfs_release_file() 2017-03-17 10:25:59 +09:00
lockd sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> 2017-03-02 08:42:29 +01:00
minix statx: Add a system call to make enhanced file info available 2017-03-02 20:51:15 -05:00
ncpfs ncpfs: Convert to separately allocated bdi 2017-04-20 12:09:55 -06:00
nfs Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block 2017-05-01 10:39:57 -07:00
nfs_common
nfsd Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-01 19:12:53 -07:00
nilfs2 fs: Remove SB_I_DYNBDI flag 2017-04-20 12:09:55 -06:00
nls
notify sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
ntfs sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
ocfs2 fs/ocfs2/cluster: use offset_in_page() macro 2017-05-03 15:52:07 -07:00
omfs sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
openpromfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
orangefs orangefs: use iov_iter_revert() 2017-04-21 13:57:32 -04:00
overlayfs overlayfs: remove now unnecessary header file include 2017-03-08 10:42:13 -08:00
proc proc: show MADV_FREE pages info in smaps 2017-05-03 15:52:08 -07:00
pstore pstore: Solve lockdep warning by moving inode locks 2017-04-27 20:35:34 -07:00
qnx4
qnx6
quota sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
ramfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
reiserfs fs: add i_blocksize() 2017-02-27 18:43:46 -08:00
romfs romfs: use different way to generate fsid for BLOCK or MTD 2017-01-24 16:26:14 -08:00
squashfs fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version 2017-02-24 17:46:57 -08:00
sysfs sysfs: be careful of error returns from ops->show() 2017-04-08 17:33:32 +02:00
sysv statx: Add a system call to make enhanced file info available 2017-03-02 20:51:15 -05:00
tracefs
ubifs Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block 2017-05-01 10:39:57 -07:00
udf statx: Add a system call to make enhanced file info available 2017-03-02 20:51:15 -05:00
ufs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
xfs mm: introduce memalloc_nofs_{save,restore} API 2017-05-03 15:52:09 -07:00
aio.c Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-03-03 10:16:38 -08:00
anon_inodes.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
attr.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> 2017-03-02 08:42:29 +01:00
bad_inode.c statx: Add a system call to make enhanced file info available 2017-03-02 20:51:15 -05:00
binfmt_aout.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
binfmt_elf_fdpic.c sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> 2017-03-02 08:42:39 +01:00
binfmt_elf.c sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> 2017-03-02 08:42:39 +01:00
binfmt_em86.c
binfmt_flat.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
binfmt_misc.c sched/headers: Prepare to remove the <linux/mm_types.h> dependency from <linux/sched.h> 2017-03-02 08:42:37 +01:00
binfmt_script.c
block_dev.c block: get rid of blk_integrity_revalidate() 2017-04-21 14:17:27 -06:00
buffer.c sched/headers: Prepare for the reduction of <linux/sched.h>'s signal API dependency 2017-03-02 08:42:37 +01:00
char_dev.c
compat_binfmt_elf.c fs/binfmt: Convert obsolete cputime type to nsecs 2017-02-01 09:13:51 +01:00
compat_ioctl.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
compat.c fs/compat.c: trim unused includes 2017-04-17 12:52:27 -04:00
coredump.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
dax.c Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block 2017-05-01 10:39:57 -07:00
dcache.c mnt: Protect the mountpoint hashtable with mount_lock 2017-01-10 13:34:43 +13:00
dcookies.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
direct-io.c fs: add i_blocksize() 2017-02-27 18:43:46 -08:00
drop_caches.c
eventfd.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
eventpoll.c epoll: Add busy poll support to epoll with socket fds. 2017-03-24 20:49:31 -07:00
exec.c x86/arch_prctl: Add ARCH_[GET|SET]_CPUID 2017-03-20 16:10:34 +01:00
fcntl.c fcntl: move compat syscalls from compat.c 2017-04-17 12:52:24 -04:00
fhandle.c fhandle: move compat syscalls from compat.c 2017-04-17 12:52:26 -04:00
file_table.c sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
file.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> 2017-03-02 08:42:29 +01:00
filesystems.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
fs_pin.c
fs_struct.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h> 2017-03-02 08:42:35 +01:00
fs-writeback.c writeback: fix memory leak in wb_queue_work() 2017-03-13 08:27:34 -06:00
inode.c
internal.h fhandle: move compat syscalls from compat.c 2017-04-17 12:52:26 -04:00
ioctl.c sched/headers: Prepare for the reduction of <linux/sched.h>'s signal API dependency 2017-03-02 08:42:37 +01:00
iomap.c iomap: invalidate page caches should be after iomap_dio_complete() in direct write 2017-03-06 09:50:01 -08:00
Kconfig dax: fix build warnings with FS_DAX and !FS_IOMAP 2017-01-24 16:26:14 -08:00
Kconfig.binfmt
libfs.c Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-03 11:38:56 -08:00
locks.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
Makefile logfs: remove from tree 2016-12-14 23:48:11 -05:00
mbcache.c
mount.h mnt: Tuck mounts under others instead of creating shadow/side mounts. 2017-02-04 00:01:06 +13:00
mpage.c fs: add i_blocksize() 2017-02-27 18:43:46 -08:00
namei.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2017-05-03 08:50:52 -07:00
namespace.c sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
no-block.c
nsfs.c nsfs: mark dentry with DCACHE_RCUACCESS 2017-04-19 15:56:24 -07:00
open.c open: move compat syscalls from compat.c 2017-04-17 12:52:25 -04:00
pipe.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
pnode.c mnt: Tuck mounts under others instead of creating shadow/side mounts. 2017-02-04 00:01:06 +13:00
pnode.h mnt: Tuck mounts under others instead of creating shadow/side mounts. 2017-02-04 00:01:06 +13:00
posix_acl.c sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
proc_namespace.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
read_write.c move compat_rw_copy_check_uvector() over to fs/read_write.c 2017-04-17 12:52:26 -04:00
readdir.c readdir: move compat syscalls from compat.c 2017-04-17 12:52:24 -04:00
select.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-05-02 16:40:27 -07:00
seq_file.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
signalfd.c
splice.c Merge branch 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-02 11:38:06 -07:00
stack.c
stat.c Merge branch 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-02 11:54:26 -07:00
statfs.c statfs: move compat syscalls from compat.c 2017-04-17 12:52:23 -04:00
super.c bdi: Drop 'parent' argument from bdi_register[_va]() 2017-04-20 12:09:55 -06:00
sync.c vfs: use helper for calling f_op->fsync() 2017-02-20 16:51:23 +01:00
timerfd.c timerfd: Only check CAP_WAKE_ALARM when it is needed 2017-03-01 12:53:44 +01:00
userfaultfd.c userfaultfd: report actual registered features in fdinfo 2017-04-08 00:47:48 -07:00
utimes.c utimes: move compat syscalls from compat.c 2017-04-17 12:52:23 -04:00
xattr.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00