linux/Documentation
Johannes Weiner 0a31bc97c8 mm: memcontrol: rewrite uncharge API
The memcg uncharging code that is involved towards the end of a page's
lifetime - truncation, reclaim, swapout, migration - is impressively
complicated and fragile.

Because anonymous and file pages were always charged before they had their
page->mapping established, uncharges had to happen when the page type
could still be known from the context; as in unmap for anonymous, page
cache removal for file and shmem pages, and swap cache truncation for swap
pages.  However, these operations happen well before the page is actually
freed, and so a lot of synchronization is necessary:

- Charging, uncharging, page migration, and charge migration all need
  to take a per-page bit spinlock as they could race with uncharging.

- Swap cache truncation happens during both swap-in and swap-out, and
  possibly repeatedly before the page is actually freed.  This means
  that the memcg swapout code is called from many contexts that make
  no sense and it has to figure out the direction from page state to
  make sure memory and memory+swap are always correctly charged.

- On page migration, the old page might be unmapped but then reused,
  so memcg code has to prevent untimely uncharging in that case.
  Because this code - which should be a simple charge transfer - is so
  special-cased, it is not reusable for replace_page_cache().

But now that charged pages always have a page->mapping, introduce
mem_cgroup_uncharge(), which is called after the final put_page(), when we
know for sure that nobody is looking at the page anymore.

For page migration, introduce mem_cgroup_migrate(), which is called after
the migration is successful and the new page is fully rmapped.  Because
the old page is no longer uncharged after migration, prevent double
charges by decoupling the page's memcg association (PCG_USED and
pc->mem_cgroup) from the page holding an actual charge.  The new bits
PCG_MEM and PCG_MEMSW represent the respective charges and are transferred
to the new page during migration.

mem_cgroup_migrate() is suitable for replace_page_cache() as well,
which gets rid of mem_cgroup_replace_page_cache().  However, care
needs to be taken because both the source and the target page can
already be charged and on the LRU when fuse is splicing: grab the page
lock on the charge moving side to prevent changing pc->mem_cgroup of a
page under migration.  Also, the lruvecs of both pages change as we
uncharge the old and charge the new during migration, and putback may
race with us, so grab the lru lock and isolate the pages iff on LRU to
prevent races and ensure the pages are on the right lruvec afterward.

Swap accounting is massively simplified: because the page is no longer
uncharged as early as swap cache deletion, a new mem_cgroup_swapout() can
transfer the page's memory+swap charge (PCG_MEMSW) to the swap entry
before the final put_page() in page reclaim.

Finally, page_cgroup changes are now protected by whatever protection the
page itself offers: anonymous pages are charged under the page table lock,
whereas page cache insertions, swapin, and migration hold the page lock.
Uncharging happens under full exclusion with no outstanding references.
Charging and uncharging also ensure that the page is off-LRU, which
serializes against charge migration.  Remove the very costly page_cgroup
lock and set pc->flags non-atomically.

[mhocko@suse.cz: mem_cgroup_charge_statistics needs preempt_disable]
[vdavydov@parallels.com: fix flags definition]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Tested-by: Jet Chen <jet.chen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:17 -07:00
..
ABI Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2014-08-06 20:56:28 -07:00
accounting Documentation/accounting/getdelays.c: add missing null-terminate after strncpy call 2014-06-23 16:47:44 -07:00
acpi ACPI / documentation: Remove reference to acpi_platform_device_ids from enumeration.txt 2014-07-12 00:07:05 +02:00
aoe
arm Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm into next 2014-06-05 15:57:04 -07:00
arm64 arm64: Add support for 48-bit VA space with 64KB page configuration 2014-07-23 15:28:15 +01:00
auxdisplay
backlight backlight: lp855x_bl: support new LP8555 device 2013-11-13 12:09:14 +09:00
blackfin Documentation/: update 00-INDEX files 2014-02-10 16:01:40 -08:00
block Documentation/: update 00-INDEX files 2014-02-10 16:01:40 -08:00
blockdev zram: propagate error to user 2014-04-07 16:36:02 -07:00
bus-devices
cdrom
cgroups mm: memcontrol: rewrite uncharge API 2014-08-08 15:57:17 -07:00
connector w1: optional bundling of netlink kernel replies 2014-05-27 13:56:21 -07:00
console
cpu-freq intel_pstate: Update documentation of {max,min}_perf_pct sysfs files 2014-07-07 01:22:19 +02:00
cpuidle cpuidle: remove cpuidle_unregister_governor() 2013-10-30 01:21:24 +01:00
cris
crypto
development-process
device-mapper dm thin: add 'no_space_timeout' dm-thin-pool module param 2014-05-20 14:30:36 -04:00
devicetree Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2014-08-07 08:50:34 -07:00
DocBook Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-05 17:46:42 -07:00
driver-model Documentation: devres: Sort managed interfaces 2014-07-11 17:56:55 -07:00
dvb [media] get_dvb_firmware: Add firmware extractor for si2165 2014-07-27 17:01:12 -03:00
early-userspace
EDID drm: Add 800x600 (SVGA) screen resolution to the built-in EDIDs 2014-05-26 12:53:40 +10:00
extcon extcon: fix switch class porting guide (Documentation) 2014-01-07 11:54:28 +09:00
fault-injection
fb doc: spelling error changes 2014-05-05 15:32:05 +02:00
filesystems Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-05 17:46:42 -07:00
firmware_class doc: fix minor typos in firmware_class README 2014-07-17 18:43:40 -07:00
fmc FMC: make eeprom attribute writable 2014-02-28 15:12:08 -08:00
frv
gpio Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial into next 2014-06-04 08:50:34 -07:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial into next 2014-06-04 08:50:34 -07:00
hwmon hwmon: Add pwm-fan driver 2014-08-04 07:01:38 -07:00
i2c Documentation: i2c: improve section about flags mangling the protocol 2014-04-06 13:53:48 +02:00
i2o
ia64
ide Documentation/: update 00-INDEX files 2014-02-10 16:01:40 -08:00
infiniband
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2014-07-23 15:42:53 -07:00
ioctl crypto: qat - Update to makefiles 2014-06-20 21:26:19 +08:00
isdn
ja_JP Documentation: Update stable address in Chinese and Japanese translations 2014-04-16 14:13:27 -07:00
kbuild kbuild: fix a typo in a kbuild document 2014-06-18 21:38:18 +02:00
kdump
ko_KR Documentation: HOWTO: Updates on subsystem trees, patchwork, -next (vs. -mm) in ko_KR 2014-01-08 15:32:51 -08:00
laptops Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-08-06 21:03:53 -07:00
leds Documentation/: update 00-INDEX files 2014-02-10 16:01:40 -08:00
m68k Documentation/: update 00-INDEX files 2014-02-10 16:01:40 -08:00
make
memory-devices
metag
mic misc: mic: add support for loading/unloading dma driver 2014-07-11 18:31:12 -07:00
mips
misc-devices Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging 2014-01-29 18:56:27 -08:00
mmc
mn10300
mtd MTD updates for 3.16: 2014-06-11 08:35:34 -07:00
namespaces
netlabel
networking i40e: adds FCoE to build and updates its documentation 2014-08-02 19:41:13 -07:00
nfc
parisc
PCI doc: replace "practise" with "practice" in Documentation 2014-06-19 15:28:56 +02:00
pcmcia
phy phy: Add new Exynos USB 2.0 PHY driver 2014-03-08 12:39:44 +05:30
platform Documentation: Add list of laptop models supported by the Compal driver 2014-06-10 19:11:06 -04:00
power ACPI and power management updates for 3.17-rc1 2014-08-06 20:34:19 -07:00
powerpc Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2014-06-10 18:54:22 -07:00
pps
prctl
pti
ptp ptp: In the testptp utility, use clock_adjtime from glibc when available 2014-06-16 21:32:31 -07:00
rapidio rapidio: rework device hierarchy and introduce mport class of devices 2014-04-07 16:36:07 -07:00
RCU list: fix order of arguments for hlist_add_after(_rcu) 2014-08-06 18:01:24 -07:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial into next 2014-06-04 08:50:34 -07:00
scheduler asm/system.h: clean asm/system.h from docs 2014-04-07 16:36:11 -07:00
scsi Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-08-06 21:03:53 -07:00
security Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-08-06 21:03:53 -07:00
serial tty/serial: Add GPIOLIB helpers for controlling modem lines 2014-05-28 12:49:14 -07:00
sh
sound ALSA: virtuoso: add Xonar Essence STX II support 2014-08-04 15:20:48 +02:00
spi Merge remote-tracking branches 'spi/topic/s3c64xx', 'spi/topic/sc18is602', 'spi/topic/sh-hspi', 'spi/topic/sh-msiof', 'spi/topic/sh-sci', 'spi/topic/sirf' and 'spi/topic/spidev' into spi-next 2014-03-30 00:51:34 +00:00
sysctl kernel/watchdog.c: print traces for all cpus on lockup detection 2014-06-23 16:47:44 -07:00
target
thermal drm/nouveau/doc: update the thermal documentation 2014-06-17 14:50:17 +10:00
timers clocksource: document some basic timekeeping concepts 2014-07-23 15:07:13 -07:00
tpm
trace mm: trace-vmscan-postprocess.pl: report the number of file/anon pages respectively 2014-08-06 18:01:20 -07:00
usb usb: doc: hotplug.txt code typos 2014-07-09 16:05:42 -07:00
vDSO x86/vdso/doc: Make vDSO examples more portable 2014-06-12 19:01:24 -07:00
video4linux [media] update cx23885 and em28xx cardlists 2014-07-26 11:55:10 -03:00
virtual Bugfixes 2014-07-22 10:22:53 +02:00
vm mm: mark remap_file_pages() syscall as deprecated 2014-06-06 16:08:17 -07:00
w1 w1: new w1_ds2406 driver 2014-06-19 17:45:14 -07:00
watchdog Documentation: fix two typos in watchdog-api.txt 2014-08-05 22:43:21 +02:00
wimax
x86 x86/mm: New tunable for single vs full TLB flush 2014-07-31 08:48:51 -07:00
xtensa xtensa: remap io area defined in device tree 2014-01-15 00:25:14 +04:00
zh_CN Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-08-06 21:03:53 -07:00
.gitignore
00-INDEX Merge branch 'x86-nuke-platforms-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-02 13:15:58 -07:00
applying-patches.txt
assoc_array.txt KEYS: Fix multiple key add into associative array 2013-12-02 11:24:18 +00:00
atomic_ops.txt arch,doc: Convert smp_mb__*() 2014-04-18 14:20:48 +02:00
bad_memory.txt
basic_profiling.txt
bcache.txt
binfmt_misc.txt
braille-console.txt
bt8xxgpio.txt
btmrvl.txt
BUG-HUNTING
bus-virt-phys-mapping.txt
cachetlb.txt
Changes Documentation/Changes: clean up mcelog paragraph 2014-07-12 11:30:36 -07:00
circular-buffers.txt documentation: Update circular buffer for load-acquire/store-release 2013-12-03 10:08:57 -08:00
clk.txt clk: Improve clk_ops documentation 2014-05-12 17:08:33 -07:00
coccinelle.txt
CodingStyle Documentation: expand/clarify debug documentation 2014-06-04 16:54:17 -07:00
cpu-hotplug.txt Doc/cpu-hotplug: Specify race-free way to register CPU hotplug callbacks 2014-03-20 13:43:40 +01:00
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt firewire: revert to 4 GB RDMA, fix protocols using Memory Space 2014-05-29 15:50:30 +02:00
dell_rbu.txt
devices.txt Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2014-04-04 09:50:07 -07:00
digsig.txt
DMA-API-HOWTO.txt DMA-API: Update dma_pool_create ()and dma_pool_alloc() descriptions 2014-05-26 17:28:28 -06:00
DMA-API.txt DMA-API: Capitalize "CPU" consistently 2014-05-26 17:28:27 -06:00
DMA-attributes.txt doc: spelling error changes 2014-05-05 15:32:05 +02:00
dma-buf-sharing.txt doc: spelling error changes 2014-05-05 15:32:05 +02:00
DMA-ISA-LPC.txt DMA-API: Clarify physical/bus address distinction 2014-05-20 16:54:21 -06:00
dmaengine.txt
dmatest.txt dmatest: add a 'wait' parameter 2013-11-14 11:04:40 -08:00
dontdiff Documentation: LLVMLinux: Update Documentation/dontdiff 2014-04-09 13:44:34 -07:00
dynamic-debug-howto.txt doc: spelling error changes 2014-05-05 15:32:05 +02:00
edac.txt Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial into next 2014-06-04 08:50:34 -07:00
efi-stub.txt doc: arm64: add description of EFI stub support 2014-04-30 19:57:05 +01:00
eisa.txt
email-clients.txt Documentation: add section about git to email-clients.txt 2014-06-29 13:38:33 -07:00
flexible-arrays.txt
futex-requeue-pi.txt doc: fix double words 2014-03-21 13:16:58 +01:00
gcov.txt gcov: compile specific gcov implementation based on gcc version 2013-11-13 12:09:34 +09:00
highuid.txt
HOWTO Documentation: HOWTO: Update broken links to tpp 2013-12-10 23:09:08 -08:00
hsi.txt Documentation: HSI: Add some general description for the HSI subsystem 2014-05-04 09:49:46 +02:00
hw_random.txt
hwspinlock.txt
init.txt
initrd.txt
intel_txt.txt
Intel-IOMMU.txt
io_ordering.txt
io-mapping.txt doc: fix some typos 2013-12-02 14:48:28 +01:00
iostats.txt
IPMI.txt
IRQ-affinity.txt
IRQ-domain.txt genirq: Improve documentation to match current implementation 2014-05-27 10:16:44 +02:00
IRQ.txt
irqflags-tracing.txt asm/system.h: clean asm/system.h from docs 2014-04-07 16:36:11 -07:00
isapnp.txt
java.txt Documentation: update java sample wrapper for java 7 2014-05-25 12:39:00 -07:00
kernel-doc-nano-HOWTO.txt
kernel-docs.txt
kernel-parameters.txt Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2014-08-07 08:47:00 -07:00
kernel-per-CPU-kthreads.txt Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity 2014-02-17 14:56:08 -08:00
kmemcheck.txt doc: fix double words 2014-03-21 13:16:58 +01:00
kmemleak.txt mm: introduce kmemleak_update_trace() 2014-06-06 16:08:17 -07:00
kobject.txt kobject: remove kset from sysfs immediately in kset_unregister() 2013-12-07 21:20:11 -08:00
kprobes.txt kprobes: Introduce NOKPROBE_SYMBOL() macro to maintain kprobes blacklist 2014-04-24 10:02:56 +02:00
kref.txt
ldm.txt
local_ops.txt
lockdep-design.txt
lockstat.txt
lockup-watchdogs.txt
logo.gif
logo.txt
magic-number.txt Documentation/serial: Delete obsolete driver documentation 2014-04-16 14:20:34 -07:00
Makefile
ManagementStyle
md.txt doc: fix some typos in documentations 2013-12-02 14:45:19 +01:00
media-framework.txt
memory-barriers.txt documentation: Add acquire/release barriers to pairing rules 2014-07-08 08:32:51 -07:00
memory-hotplug.txt mm, hotplug: probe interface is available on several platforms 2014-06-23 16:47:43 -07:00
module-signing.txt Nothing major: the stricter permissions checking for sysfs broke 2014-04-06 09:38:07 -07:00
mono.txt
mutex-design.txt locking/mutexes: Documentation update/rewrite 2014-06-05 13:29:37 +02:00
nommu-mmap.txt
numastat.txt
oops-tracing.txt Use 'E' instead of 'X' for unsigned module taint flag. 2014-03-31 14:52:43 +10:30
padata.txt
parport-lowlevel.txt
parport.txt
percpu-rw-semaphore.txt
phy.txt phy: core: Let node ptr of PHY point to PHY and not of PHY provider 2014-07-22 12:46:11 +05:30
pi-futex.txt
pinctrl.txt pinctrl: Fix some typos and grammar issues in the documentation 2014-01-15 13:59:50 +01:00
pnp.txt
preempt-locking.txt
printk-formats.txt doc: printk-formats: do not mention casts for u64/s64 2014-05-05 15:32:42 +02:00
pwm.txt pwm: modify PWM_LOOKUP to initialize all struct pwm_lookup members 2014-05-21 11:19:36 +02:00
ramoops.txt
rbtree.txt doc: spelling error changes 2014-05-05 15:32:05 +02:00
remoteproc.txt
rfkill.txt doc: spelling error changes 2014-05-05 15:32:05 +02:00
robust-futex-ABI.txt Documentation/robust-futex-API: Count properly to 4 2013-11-30 14:08:28 +01:00
robust-futexes.txt doc: spelling error changes 2014-05-05 15:32:05 +02:00
rpmsg.txt
rt-mutex-design.txt doc: fix some typos in documentations 2013-12-02 14:45:19 +01:00
rt-mutex.txt
rtc.txt
SAK.txt
SecurityBugs
serial-console.txt
sgi-ioc4.txt
SM501.txt
smsc_ece1099.txt
sparse.txt
spinlocks.txt
stable_api_nonsense.txt
stable_kernel_rules.txt stable_kernel_rules: Add pointer to netdev-FAQ for network patches 2014-07-09 15:54:27 -07:00
static-keys.txt doc: fix some typos in documentations 2013-12-02 14:45:19 +01:00
SubmitChecklist
SubmittingDrivers doc: SubmittingPatches: remove dead link, kerneltrap.org no longer works 2014-06-19 15:15:27 +02:00
SubmittingPatches Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-08-06 21:03:53 -07:00
svga.txt
sysfs-rules.txt
sysrq.txt
this_cpu_ops.txt
unaligned-memory-access.txt ether_addr_equal: Optimize implementation, remove unused compare_ether_addr 2013-12-06 16:37:43 -05:00
unicode.txt
unshare.txt
vfio.txt drivers/vfio: EEH support for VFIO PCI device 2014-08-05 15:28:48 +10:00
VGA-softcursor.txt
vgaarbiter.txt
video-output.txt
vme_api.txt VME: Rename vme_slot_get to avoid confusion with reference counting 2013-12-03 11:15:58 -08:00
volatile-considered-harmful.txt
workqueue.txt
ww-mutex-design.txt
xz.txt
zorro.txt zorro/UAPI: Disintegrate include/linux/zorro*.h 2013-11-26 11:09:08 +01:00