linux/include
Tejun Heo b23afb93d3 memcg: punt high overage reclaim to return-to-userland path
Currently, try_charge() tries to reclaim memory synchronously when the
high limit is breached; however, if the allocation doesn't have
__GFP_WAIT, synchronous reclaim is skipped.  If a process performs only
speculative allocations, it can blow way past the high limit.  This is
actually easily reproducible by simply doing "find /".  slab/slub
allocator tries speculative allocations first, so as long as there's
memory which can be consumed without blocking, it can keep allocating
memory regardless of the high limit.

This patch makes try_charge() always punt the over-high reclaim to the
return-to-userland path.  If try_charge() detects that high limit is
breached, it adds the overage to current->memcg_nr_pages_over_high and
schedules execution of mem_cgroup_handle_over_high() which performs
synchronous reclaim from the return-to-userland path.

As long as kernel doesn't have a run-away allocation spree, this should
provide enough protection while making kmemcg behave more consistently.
It also has the following benefits.

- All over-high reclaims can use GFP_KERNEL regardless of the specific
  gfp mask in use, e.g. GFP_NOFS, when the limit was breached.

- It copes with prio inversion.  Previously, a low-prio task with
  small memory.high might perform over-high reclaim with a bunch of
  locks held.  If a higher prio task needed any of these locks, it
  would have to wait until the low prio task finished reclaim and
  released the locks.  By handing over-high reclaim to the task exit
  path this issue can be avoided.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
..
acpi Merge branch 'acpi-processor' 2015-11-02 00:50:37 +01:00
asm-generic Power management and ACPI updates for v4.4-rc1 2015-11-04 18:10:13 -08:00
clocksource
crypto crypto: ahash - Add crypto_ahash_blocksize 2015-10-20 22:10:48 +08:00
drm drm/dp/mst: make mst i2c transfer code more robust. 2015-10-15 09:06:20 +10:00
dt-bindings char/misc drivers for 4.4-rc1 2015-11-04 22:15:15 -08:00
keys
kvm arm/arm64: KVM: Only allow 64bit hosts to build VGICv3 2015-10-09 23:11:57 +01:00
linux memcg: punt high overage reclaim to return-to-userland path 2015-11-05 19:34:48 -08:00
math-emu
media media updates for v4.3-rc1 2015-09-11 16:42:39 -07:00
memory
misc
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-11-03 13:41:45 -05:00
pcmcia
ras
rdma Changes for 4.3-rc1 2015-09-19 20:04:11 -07:00
rxrpc
scsi Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2015-09-11 19:00:42 -07:00
soc
sound Merge remote-tracking branches 'asoc/fix/rt298', 'asoc/fix/sx', 'asoc/fix/wm8904' and 'asoc/fix/wm8962' into asoc-linus 2015-10-23 08:44:14 +09:00
target target: Propigate backend read-only to core_tpg_add_lun 2015-09-24 23:17:21 -07:00
trace sched/core: Fix trace_sched_switch() 2015-10-06 17:08:15 +02:00
uapi char/misc drivers for 4.4-rc1 2015-11-04 22:15:15 -08:00
video
xen xen/grant-table: Add an helper to iterate over a specific number of grants 2015-10-23 14:20:46 +01:00
Kbuild