linux/include
Johannes Weiner a983b5ebee mm: memcontrol: fix excessive complexity in memory.stat reporting
We've seen memory.stat reads in top-level cgroups take up to fourteen
seconds during a userspace bug that created tens of thousands of ghost
cgroups pinned by lingering page cache.

Even with a more reasonable number of cgroups, aggregating memory.stat
is unnecessarily heavy.  The complexity is this:

	nr_cgroups * nr_stat_items * nr_possible_cpus

where the stat items are ~70 at this point.  With 128 cgroups and 128
CPUs - decent, not enormous setups - reading the top-level memory.stat
has to aggregate over a million per-cpu counters.  This doesn't scale.
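The arithmetic behind "over a million" can be checked with a small sketch. It counts the counters one read has to sum under the old layout, and compares that with a layout where each cgroup keeps one shared atomic per stat item. The helper names are hypothetical. The second function assumes the read still walks every cgroup under the top-level one, so only the CPU factor drops out.

```c
#include <assert.h>

/*
 * Per-cpu counters summed by one top-level memory.stat read when every
 * stat item of every cgroup is spread across every possible CPU.
 */
unsigned long percpu_reads(unsigned long nr_cgroups,
			   unsigned long nr_stat_items,
			   unsigned long nr_possible_cpus)
{
	return nr_cgroups * nr_stat_items * nr_possible_cpus;
}

/*
 * Counters summed if each cgroup keeps one shared atomic per stat item:
 * the CPU factor drops out of the read side (the walk over cgroups stays).
 */
unsigned long atomic_reads(unsigned long nr_cgroups,
			   unsigned long nr_stat_items)
{
	return nr_cgroups * nr_stat_items;
}

/* The example setup from the text: 128 cgroups, ~70 items, 128 CPUs. */
static_assert(128UL * 70 * 128 > 1000000UL,
	      "128 cgroups x 70 items x 128 CPUs is over a million counters");
```

With the figures above, percpu_reads(128, 70, 128) is 1146880, while atomic_reads(128, 70) is 8960.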

Instead of spreading the source of truth across all CPUs, use the
per-cpu counters merely to batch updates to shared atomic counters.

This is the same as the per-cpu stocks we use for charging memory to the
shared atomic page_counters, and also the way the global vmstat counters
are implemented.
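The batching scheme can be sketched in userspace C11. The sketch models one stat item as a shared atomic total plus a per-cpu delta. The delta is flushed into the atomic once its magnitude exceeds a batch size. SKETCH_NR_CPUS, SKETCH_BATCH and the batched_stat_* names are hypothetical. The CPU index is passed in explicitly instead of being taken from the running CPU. This is an illustration of the idea, not the actual memcg code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS 4     /* hypothetical CPU count for the demo */
#define SKETCH_BATCH   32L   /* hypothetical spill threshold */

/* One stat item: a shared atomic total plus a small per-cpu delta. */
struct batched_stat {
	atomic_long shared;              /* source of truth for readers */
	long percpu[SKETCH_NR_CPUS];     /* unflushed per-cpu deltas */
};

void batched_stat_init(struct batched_stat *s)
{
	atomic_init(&s->shared, 0);
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		s->percpu[cpu] = 0;
}

/*
 * Update path: accumulate locally and only touch the shared atomic once
 * the local delta exceeds the batch size.  In the kernel the per-cpu
 * slot would be accessed from the current CPU without migration; here
 * the caller simply names the CPU.
 */
void batched_stat_add(struct batched_stat *s, int cpu, long val)
{
	long x;

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	x = s->percpu[cpu] + val;
	if (labs(x) > SKETCH_BATCH) {
		atomic_fetch_add(&s->shared, x);
		x = 0;
	}
	s->percpu[cpu] = x;
}

/* Read path: a single atomic load instead of a loop over all CPUs. */
long batched_stat_read(struct batched_stat *s)
{
	return atomic_load(&s->shared);
}
```

The read side becomes one atomic load per counter. The cost is that up to one batch per CPU may not yet be visible in the shared total.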

Vmstat has elaborate spilling thresholds that depend on the number of
CPUs, amount of memory, and memory pressure - carefully balancing the
cost of counter updates with the amount of per-cpu error.  That's
because the vmstat counters are system-wide, but also used for decisions
inside the kernel (e.g. NR_FREE_PAGES in the allocator).  Neither is
true for the memory controller.

Use the same static batch size we already use for page_counter updates
during charging.  The per-cpu error in the stats will be at most 128k,
which is an acceptable ratio of cores to memory accounting granularity.
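The 128k figure can be checked with a back-of-the-envelope sketch. It assumes the static batch is 32 pages of 4 KiB each. Both values are assumptions here, chosen because 32 x 4 KiB = 128 KiB. A per-cpu delta is flushed as soon as it exceeds the batch, so it never holds more than one batch. The worst case for a single counter is therefore the number of CPUs times that. The SKETCH_* constants and function names are hypothetical.

```c
#include <assert.h>

/* Assumed values: 32-page batch, 4 KiB pages (32 * 4 KiB = 128 KiB). */
#define SKETCH_BATCH_PAGES 32UL
#define SKETCH_PAGE_SIZE   4096UL

static_assert(SKETCH_BATCH_PAGES * SKETCH_PAGE_SIZE == 128UL * 1024,
	      "assumed batch should come out to 128k");

/* Largest unflushed delta one CPU can hold for one counter, in bytes. */
unsigned long percpu_error_bytes(void)
{
	return SKETCH_BATCH_PAGES * SKETCH_PAGE_SIZE;
}

/* Worst case for one counter: every CPU holds a full unflushed batch. */
unsigned long max_counter_error_bytes(unsigned long nr_cpus)
{
	return nr_cpus * percpu_error_bytes();
}
```

For the 128-CPU setup mentioned earlier, max_counter_error_bytes(128) is 16 MiB per counter under these assumptions.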

[hannes@cmpxchg.org: fix warning in __this_cpu_xchg() calls]
  Link: http://lkml.kernel.org/r/20171201135750.GB8097@cmpxchg.org
Link: http://lkml.kernel.org/r/20171103153336.24044-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:36 -08:00
acpi Merge branches 'acpi-gpio', 'acpi-button', 'acpi-battery' and 'acpi-video' 2018-01-18 03:02:16 +01:00
asm-generic dma mapping changes for Linux 4.16: 2018-01-31 11:32:27 -08:00
clocksource arm64 updates for 4.15 2017-11-15 10:56:56 -08:00
crypto Merge branch 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-01-30 17:58:07 -08:00
drm Merge branch 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-01-30 17:58:07 -08:00
dt-bindings fixes/cleanups for rc1, non-desktop flags for VR 2017-11-23 21:04:56 -10:00
keys
kvm KVM: arm/arm64: timer: Don't set irq as forwarded if no usable GIC 2017-12-18 10:53:23 +01:00
linux mm: memcontrol: fix excessive complexity in memory.stat reporting 2018-01-31 17:18:36 -08:00
math-emu
media media: annotate ->poll() instances 2017-11-27 16:20:06 -05:00
memory
misc the rest of drivers/*: annotate ->poll() instances 2017-11-28 11:06:58 -05:00
net Merge branch 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-01-30 17:58:07 -08:00
pcmcia
ras
rdma RDMA/restrack: Add general infrastructure to track RDMA resources 2018-01-29 20:21:39 -07:00
scsi First merge window pull request for 4.16 2018-01-31 12:05:10 -08:00
soc We have two changes to the core framework this time around. The first being a 2017-11-17 20:04:24 -08:00
sound Merge branch 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-01-30 17:58:07 -08:00
target Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2017-11-24 19:19:20 -10:00
trace mm: use sc->priority for slab shrink targets 2018-01-31 17:18:36 -08:00
uapi First merge window pull request for 4.16 2018-01-31 12:05:10 -08:00
video fbdev changes for v4.15: 2017-11-20 21:50:24 -10:00
xen xen: fixes for 4.15-rc5 2017-12-22 12:30:10 -08:00