mainlining shenanigans
Go to file
Johannes Weiner 205b20cc5a mm: memcontrol: make cgroup stats and events query API explicitly local
Patch series "mm: memcontrol: memory.stat cost & correctness".

The cgroup memory.stat file holds recursive statistics for the entire
subtree.  The current implementation does this tree walk on-demand
whenever the file is read.  This is giving us problems in production.

1. The cost of aggregating the statistics on-demand is high.  A lot of
   system service cgroups are mostly idle and their stats don't change
   between reads, yet we always have to check them.  There are also always
   some lazily-dying cgroups sitting around that are pinned by a handful
   of remaining page cache; the same applies to them.

   In an application that periodically monitors memory.stat in our
   fleet, we have seen the aggregation consume up to 5% CPU time.

2. When cgroups die and disappear from the cgroup tree, so do their
   accumulated vm events.  The result is that the event counters at
   higher-level cgroups can go backwards and confuse some of our
   automation, let alone people looking at the graphs over time.

To address both issues, this patch series changes the stat
implementation to spill counts upwards when the counters change.

The upward spilling is batched using the existing per-cpu cache.  In a
sparse file stress test with 5 level cgroup nesting, the additional cost
of the flushing was negligible (a little under 1% of CPU at 100% CPU
utilization, compared to the 5% of reading memory.stat during regular
operation).

This patch (of 4):

memcg_page_state(), lruvec_page_state(), memcg_sum_events() are
currently returning the state of the local memcg or lruvec, not the
recursive state.

In practice there is a demand for both versions, although the callers
that want the recursive counts currently sum them up by hand.

Per default, cgroups are considered recursive entities and generally we
expect more users of the recursive counters, with the local counts being
special cases.  To reflect that in the name, add a _local suffix to the
current implementations.

The following patch will re-incarnate these functions with recursive
semantics, but with an O(1) implementation.

[hannes@cmpxchg.org: fix bisection hole]
  Link: http://lkml.kernel.org/r/20190417160347.GC23013@cmpxchg.org
Link: http://lkml.kernel.org/r/20190412151507.2769-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 19:52:53 -07:00
arch arch: remove <asm/sizes.h> and <asm-generic/sizes.h> 2019-05-14 19:52:52 -07:00
block for-5.2/block-20190507 2019-05-07 18:14:36 -07:00
certs kexec, KEYS: Make use of platform keyring for signature verify 2019-02-04 17:34:07 -05:00
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
Documentation ipc: allow boot time extension of IPCMNI from 32k to 16M 2019-05-14 19:52:52 -07:00
drivers drivers/virt/fsl_hypervisor.c: prevent integer overflow in ioctl 2019-05-14 19:52:52 -07:00
fs fs/block_dev.c: Remove duplicate header 2019-05-14 19:52:52 -07:00
include mm: memcontrol: make cgroup stats and events query API explicitly local 2019-05-14 19:52:53 -07:00
init mm: shuffle initial free memory to improve memory-side-cache utilization 2019-05-14 19:52:48 -07:00
ipc ipc: do cyclic id allocation for the ipc object. 2019-05-14 19:52:52 -07:00
kernel panic/reboot: allow specifying reboot_mode for panic only 2019-05-14 19:52:51 -07:00
lib tools/testing/selftests/sysctl/sysctl.sh: add proc_do_large_bitmap() test case 2019-05-14 19:52:51 -07:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm mm: memcontrol: make cgroup stats and events query API explicitly local 2019-05-14 19:52:53 -07:00
net net: replace CONFIG_DEBUG_KERNEL with CONFIG_DEBUG_MISC 2019-05-14 19:52:50 -07:00
samples samples: add .gitignore for pidfd-metadata 2019-05-10 11:50:52 +02:00
scripts scripts/gdb: print cached rate in lx-clk-summary 2019-05-14 19:52:52 -07:00
security Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-05-13 15:15:00 -07:00
sound sound updates for 5.2-rc1 2019-05-09 08:26:55 -07:00
tools tools/testing/selftests/sysctl/sysctl.sh: add proc_do_large_bitmap() test case 2019-05-14 19:52:51 -07:00
usr user/Makefile: Fix typo and capitalization in comment section 2018-12-11 00:18:03 +09:00
virt mm/mmu_notifier: convert user range->blockable to helper function 2019-05-14 09:47:49 -07:00
.clang-format Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-04-17 11:26:25 -07:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore .gitignore: add more all*.config patterns 2019-05-08 09:47:46 +09:00
.mailmap A reasonably busy cycle for docs, including: 2019-05-08 12:42:50 -07:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS Char/Misc driver patches for 5.1-rc1 2019-03-06 14:18:59 -08:00
Kbuild Kbuild updates for v5.1 2019-03-10 17:48:21 -07:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS - Core Frameworks 2019-05-14 10:39:08 -07:00
Makefile Kbuild updates for v5.2 2019-05-08 12:25:12 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.