linux/Documentation
Jakub Kicinski 4b82ab4f28 mm/memcg: automatically penalize tasks with high swap use
Add a memory.swap.high knob, which can be used to protect the system
from SWAP exhaustion.  The mechanism used for penalizing is similar to
memory.high penalty (sleep on return to user space).

That is not to say that the knob itself is equivalent to memory.high.
The objective is more to protect the system from potentially buggy tasks
consuming a lot of swap and impacting other tasks, or even bringing the
whole system to a standstill with complete swap exhaustion, hopefully
without the need to find per-task hard limits.

Slowing misbehaving tasks down gradually allows user space oom killers
or other protection mechanisms to react.  oomd and earlyoom already do
killing based on swap exhaustion, and memory.swap.high protection will
help implement such userspace oom policies more reliably.

We can use a single counter for the number of pages allocated under
pressure to save struct task space and avoid two separate hierarchy walks
on the hot path.  The exact overage is calculated on return to user space,
anyway.

Take the new high limit into account when determining if swap is "full".
Borrowing the explanation from Johannes:

  The idea behind "swap full" is that as long as the workload has plenty
  of swap space available and it's not changing its memory contents, it
  makes sense to generously hold on to copies of data in the swap device,
  even after the swapin.  A later reclaim cycle can drop the page without
  any IO.  Trading disk space for IO.

  But the only two ways to reclaim a swap slot are when it is faulted in
  and the references go away, or by scanning the virtual address space
  like swapoff does - which is very expensive (one could argue it's too
  expensive even for swapoff, it's often more practical to just reboot).

  So at some point in the fill level, we have to start freeing up swap
  slots on fault/swapin.  Otherwise we could eventually run out of swap
  slots while they're filled with copies of data that is also in RAM.

  We don't want to OOM a workload because its available swap space is
  filled with redundant cache.
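
The "swap full" check described above can be sketched roughly as follows.
This is a simplified user-space model, not the kernel code: the struct, the
field names, the LIMIT_UNSET sentinel and the choice of "half of the limit"
as the fill level are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "no limit configured". */
#define LIMIT_UNSET ((unsigned long)-1)

/* Simplified stand-in for a cgroup's swap accounting (hypothetical). */
struct cgroup_swap {
    unsigned long usage;              /* pages of swap currently charged */
    unsigned long high;               /* memory.swap.high, in pages */
    unsigned long max;                /* memory.swap.max, in pages */
    const struct cgroup_swap *parent; /* NULL at the top of the hierarchy */
};

/* True when usage has reached at least half of limit, without overflow. */
static bool half_used(unsigned long usage, unsigned long limit)
{
    if (usage >= limit)
        return true;
    return usage >= limit - usage;
}

/*
 * Walk from the given group up through its ancestors.  Swap counts as
 * "full" as soon as any level is at least half way to its high or max
 * limit, so redundant swap copies start being freed on swapin.
 */
bool swap_is_full(const struct cgroup_swap *cg)
{
    for (; cg != NULL; cg = cg->parent) {
        if (half_used(cg->usage, cg->high) || half_used(cg->usage, cg->max))
            return true;
    }
    return false;
}
```

The point of the sketch is only that the new high limit takes part in the
decision alongside the hard limit, and that a limit set on any ancestor
affects its descendants.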

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Chris Down <chris@chrisdown.name>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Link: http://lkml.kernel.org/r/20200527195846.102707-5-kuba@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:09 -07:00
ABI Printk changes for 5.8 2020-06-01 12:13:30 -07:00
accounting
admin-guide mm/memcg: automatically penalize tasks with high swap use 2020-06-02 10:59:09 -07:00
arm
arm64 Documentation: arm64: fix amu.rst doc warnings 2020-04-23 17:05:22 +01:00
block block: Document genhd capability flags 2020-03-12 07:47:22 -06:00
bpf bpf: lsm: Add Documentation 2020-03-30 01:35:12 +02:00
cdrom
core-api Printk changes for 5.8 2020-06-01 12:13:30 -07:00
cpu-freq
crypto
dev-tools linux-kselftest-kunit-5.7-rc1 2020-04-01 16:11:40 -07:00
devicetree Fixes and new features for pstore 2020-06-01 12:07:34 -07:00
doc-guide
driver-api A handful of late-arriving fixes for the documentation tree. 2020-04-10 17:53:43 -07:00
fault-injection
fb
features
filesystems mm/writeback: discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead 2020-06-02 10:59:08 -07:00
firmware_class
firmware-guide Documentation: firmware-guide: ACPI: fix table alignment in namespace.rst 2020-04-08 14:27:48 +02:00
fpga
gpu UAPI Changes: 2020-03-19 10:40:27 +10:00
hid
hwmon hwmon: Add Baikal-T1 PVT sensor driver 2020-05-28 07:59:45 -07:00
i2c i2c: convert SMBus alert setup function to return an ERRPTR 2020-03-10 12:19:52 +01:00
ia64
ide
iio
infiniband
input
isdn
kbuild Documentation: kbuild: fix the section title format 2020-04-23 10:53:19 +09:00
kernel-hacking docs: locking: Drop :c:func: throughout 2020-03-20 17:16:24 -06:00
leds
livepatch
locking Documentation/locking/locktypes: Minor copy editor fixes 2020-03-28 12:47:34 +01:00
m68k
maintainer
media media updates for v5.7-rc1 2020-03-30 13:42:05 -07:00
mhi docs: Add documentation for MHI bus 2020-03-19 07:41:04 +01:00
mips docs: mips: remove no longer needed au1xxx_ide.rst documentation 2020-03-24 15:53:48 +01:00
misc-devices Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-04-01 14:47:40 -07:00
netlabel
networking ice: cleanup language in ice.rst for fw.app 2020-05-01 15:30:24 -07:00
nios2
nvdimm
openrisc
parisc
PCI pci-v5.7-changes 2020-04-03 14:25:02 -07:00
pcmcia
power Merge branches 'pm-core', 'pm-sleep', 'pm-acpi' and 'pm-domains' 2020-03-30 14:46:58 +02:00
powerpc powerpc updates for 5.7 2020-04-05 11:12:59 -07:00
process checkpatch/coding-style: deprecate 80-column warning 2020-05-31 11:00:42 -07:00
RCU
riscv
s390
scheduler
scsi SCSI misc on 20200402 2020-04-02 17:03:53 -07:00
security crypto: lib/sha1 - rename "sha" to "sha1" 2020-05-08 15:32:17 +10:00
sh
sound ALSA: hda/realtek - Remove now-unnecessary XPS 13 headphone noise fixups 2020-03-31 10:54:06 +02:00
sparc
sphinx
sphinx-static
spi
target docs: prevent warnings due to autosectionlabel 2020-03-20 17:01:29 -06:00
timers
trace New tracing features: 2020-04-05 10:36:18 -07:00
translations media updates for v5.7-rc1 2020-03-30 13:42:05 -07:00
usb usb: raw-gadget: documentation updates 2020-05-14 12:30:18 +03:00
userspace-api
virt docs/virt/kvm: Document configuring and running nested guests 2020-05-06 05:45:47 -04:00
vm Documentation/vm/slub.rst: s/Toggle/Enable/ 2020-06-02 10:59:06 -07:00
w1
watchdog
x86 Documentation/x86, efi/x86: Clarify EFI handover protocol and its requirements 2020-04-14 08:32:15 +02:00
xtensa
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
asm-annotations.rst
atomic_bitops.txt
atomic_t.txt
bus-virt-phys-mapping.txt
Changes
CodingStyle
conf.py docs: conf.py: avoid thousands of duplicate label warning on Sphinx 2020-03-20 17:01:34 -06:00
COPYING-logo
crc32.txt
debugging-via-ohci1394.txt
digsig.txt
DMA-API-HOWTO.txt
DMA-API.txt
DMA-attributes.txt
DMA-ISA-LPC.txt
docutils.conf
dontdiff
futex-requeue-pi.txt
hwspinlock.txt
index.rst Char/Misc driver patches for 5.7-rc1 2020-04-03 13:22:40 -07:00
IPMI.txt
IRQ-affinity.txt
IRQ-domain.txt
IRQ.txt
irqflags-tracing.txt
Kconfig
kprobes.txt
kref.txt
logo.gif
lzo.txt
mailbox.txt
Makefile Kbuild updates for v5.7 2020-03-31 16:03:39 -07:00
memory-barriers.txt
nommu-mmap.txt
percpu-rw-semaphore.txt
pi-futex.txt
preempt-locking.txt
rbtree.txt
remoteproc.txt remoteproc: Add elf64 support in elf loader 2020-03-25 22:29:40 -07:00
robust-futex-ABI.txt threads: Update PID limit comment according to futex UAPI change 2020-03-21 17:48:13 +01:00
robust-futexes.txt
rpmsg.txt
speculation.txt
static-keys.txt
SubmittingPatches
tee.txt
this_cpu_ops.txt
unaligned-memory-access.txt
xz.txt