linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-10 22:21:40 +00:00

Johannes Weiner 869712fd3d mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges

While upgrading from 4.16 to 5.2, we noticed these allocation errors in the log of the new kernel:

  SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
    cache: tw_sock_TCPv6(960:helper-logs), object size: 232, buffer size: 240, default order: 1, min order: 0
    node 0: slabs: 5, objs: 170, free: 0

        slab_out_of_memory+1
        ___slab_alloc+969
        __slab_alloc+14
        kmem_cache_alloc+346
        inet_twsk_alloc+60
        tcp_time_wait+46
        tcp_fin+206
        tcp_data_queue+2034
        tcp_rcv_state_process+784
        tcp_v6_do_rcv+405
        __release_sock+118
        tcp_close+385
        inet_release+46
        __sock_release+55
        sock_close+17
        __fput+170
        task_work_run+127
        exit_to_usermode_loop+191
        do_syscall_64+212
        entry_SYSCALL_64_after_hwframe+68

accompanied by an increase in machines going completely radio silent under memory pressure.

One thing that changed since 4.16 is `e699e2c6a6` ("net, mm: account sock objects to kmemcg"), which made these slab caches subject to cgroup memory accounting and control.

The problem with that is that cgroups, unlike the page allocator, do not maintain dedicated atomic reserves. As a cgroup's usage hovers at its limit, atomic allocations - such as those done during network rx - can fail consistently for extended periods of time. The kernel is not able to operate under these conditions.

We don't want to revert the culprit patch, because it indeed tracks a potentially substantial amount of memory used by a cgroup.

We also don't want to implement dedicated atomic reserves for cgroups. There is no point in keeping a fixed margin of unused bytes in the cgroup's memory budget to accommodate a consumer that is impossible to predict - we'd be wasting memory and get into configuration headaches, not unlike what we have going with min_free_kbytes. We do this for physical mem because we have to, but cgroups are an accounting game.

Instead, account these privileged allocations to the cgroup, but let them bypass the configured limit if they have to. This way, we get the benefits of accounting the consumed memory and have it exert pressure on the rest of the cgroup, but like with the page allocator, we shift the burden of reclaiming on behalf of atomic allocations onto the regular allocations that can block.

Link: http://lkml.kernel.org/r/20191022233708.365764-1-hannes@cmpxchg.org
Fixes: `e699e2c6a6` ("net, mm: account sock objects to kmemcg")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org> [4.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2019-11-06 08:47:50 -08:00
kasan	mm: introduce compound_nr()	2019-09-24 15:54:08 -07:00
backing-dev.c	bdi: Do not use freezable workqueue	2019-10-06 09:11:35 -06:00
balloon_compaction.c	mm/balloon_compaction: suppress allocation warnings	2019-09-04 07:42:01 -04:00
cleancache.c	Driver Core and debugfs changes for 5.3-rc1	2019-07-12 12:24:03 -07:00
cma_debug.c	mm/cma_debug.c: fix the break condition in cma_maxchunk_get()	2019-05-14 09:47:45 -07:00
cma.c	mm/cma.c: fail if fixed declaration can't be honored	2019-07-16 19:23:21 -07:00
cma.h
compaction.c	mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()	2019-10-14 15:04:01 -07:00
debug_page_ref.c
debug.c	mm: update references to page _refcount	2019-05-14 19:52:47 -07:00
dmapool.c	mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options	2019-07-12 11:05:46 -07:00
early_ioremap.c
fadvise.c	fs: Export generic_fadvise()	2019-08-30 22:43:58 -07:00
failslab.c	mm/failslab.c: by default, do not fail allocations with direct reclaim only	2019-07-12 11:05:43 -07:00
filemap.c	mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition	2019-10-19 06:32:32 -04:00
frame_vector.c	mm: untag user pointers in get_vaddr_frames	2019-09-25 17:51:41 -07:00
frontswap.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482	2019-06-19 17:09:52 +02:00
gup_benchmark.c	mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM	2019-05-14 09:47:45 -07:00
gup.c	mm/gup: fix a misnamed "write" argument, and a related bug	2019-10-19 06:32:32 -04:00
highmem.c	mm: convert totalram_pages and totalhigh_pages variables to atomic	2018-12-28 12:11:47 -08:00
hmm.c	pagewalk: separate function pointers from iterator data	2019-09-07 04:28:04 -03:00
huge_memory.c	mm/thp: fix node page state in split_huge_page_to_list()	2019-10-19 06:32:32 -04:00
hugetlb_cgroup.c	mm: introduce compound_nr()	2019-09-24 15:54:08 -07:00
hugetlb.c	hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()	2019-10-19 06:32:32 -04:00
hwpoison-inject.c	hwpoison-inject: no need to check return value of debugfs_create functions	2019-06-03 15:39:40 +02:00
init-mm.c	mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch	2019-10-19 06:32:32 -04:00
internal.h	mm: introduce MADV_COLD	2019-09-25 17:51:41 -07:00
interval_tree.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248	2019-06-19 17:09:08 +02:00
Kconfig	mm,thp: add read-only THP support for (non-shmem) FS	2019-09-24 15:54:11 -07:00
Kconfig.debug	mm, page_owner, debug_pagealloc: save and dump freeing stack trace	2019-09-24 15:54:08 -07:00
khugepaged.c	mm/khugepaged: fix might_sleep() warn with CONFIG_HIGHPTE=y	2019-11-06 08:47:50 -08:00
kmemleak-test.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333	2019-06-05 17:37:06 +02:00
kmemleak.c	kmemleak: Do not corrupt the object_list during clean-up	2019-10-14 08:56:16 -07:00
ksm.c	mm: move memcmp_pages() and pages_identical()	2019-09-24 15:54:11 -07:00
list_lru.c	mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages	2019-07-12 11:05:44 -07:00
maccess.c	The main changes in this release include:	2019-07-18 11:51:00 -07:00
madvise.c	mm: factor out common parts between MADV_COLD and MADV_PAGEOUT	2019-09-25 17:51:41 -07:00
Makefile	mm: silence -Woverride-init/initializer-overrides	2019-09-24 15:54:10 -07:00
memblock.c	mm: memblock: do not enforce current limit for memblock_phys* family	2019-10-19 06:32:32 -04:00
memcontrol.c	mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges	2019-11-06 08:47:50 -08:00
memfd.c	mm: page cache: store only head pages in i_pages	2019-09-24 15:54:08 -07:00
memory_hotplug.c	mm/memory_hotplug: fix updating the node span	2019-11-06 08:47:50 -08:00
memory-failure.c	mm/memory-failure.c: don't access uninitialized memmaps in memory_failure()	2019-10-19 06:32:31 -04:00
memory.c	mm: do not hash address in print_bad_pte()	2019-09-24 15:54:09 -07:00
mempolicy.c	Merge branch 'hugepage-fallbacks' (hugepatch patches from David Rientjes)	2019-09-28 14:26:47 -07:00
mempool.c	docs/core-api/mm: fix return value descriptions in mm/	2019-03-05 21:07:20 -08:00
memremap.c	mm/memunmap: don't access uninitialized memmap in memunmap_pages()	2019-10-19 06:32:32 -04:00
memtest.c
migrate.c	mm: untag user pointers passed to memory syscalls	2019-09-25 17:51:41 -07:00
mincore.c	mm: untag user pointers passed to memory syscalls	2019-09-25 17:51:41 -07:00
mlock.c	mm: untag user pointers passed to memory syscalls	2019-09-25 17:51:41 -07:00
mm_init.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
mmap.c	mm: untag user pointers in mmap/munmap/mremap/brk	2019-09-25 17:51:41 -07:00
mmu_context.c
mmu_gather.c	mm: remove quicklist page table caches	2019-09-24 15:54:09 -07:00
mmu_notifier.c	mm/mmu_notifiers: use the right return code for WARN_ON	2019-11-06 08:47:50 -08:00
mmzone.c
mprotect.c	mm: untag user pointers passed to memory syscalls	2019-09-25 17:51:41 -07:00
mremap.c	mm: untag user pointers in mmap/munmap/mremap/brk	2019-09-25 17:51:41 -07:00
msync.c	mm: untag user pointers passed to memory syscalls	2019-09-25 17:51:41 -07:00
nommu.c	mm: introduce page_size()	2019-09-24 15:54:08 -07:00
oom_kill.c	mm: introduce MADV_COLD	2019-09-25 17:51:41 -07:00
page_alloc.c	mm/page_alloc.c: ratelimit allocation failure warnings more aggressively	2019-11-06 08:47:50 -08:00
page_counter.c	memcg: introduce memory.min	2018-06-07 17:34:36 -07:00
page_ext.c	mm, page_owner: fix off-by-one error in __set_page_owner_handle()	2019-10-14 15:04:00 -07:00
page_idle.c	mm/page_idle.c: fix oops because end_pfn is larger than max_pfn	2019-06-29 16:43:45 +08:00
page_io.c	mm, swap: use rbtree for swap_extent	2019-07-12 11:05:43 -07:00
page_isolation.c	mm/page_isolation.c: change the prototype of undo_isolate_page_range()	2019-07-12 11:05:43 -07:00
page_owner.c	mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo	2019-10-19 06:32:31 -04:00
page_poison.c	mm/page_poison.c: fix a typo in a comment	2019-09-24 15:54:08 -07:00
page_vma_mapped.c	mm: introduce page_size()	2019-09-24 15:54:08 -07:00
page-writeback.c	writeback, memcg: Implement foreign dirty flushing	2019-08-27 09:22:38 -06:00
pagewalk.c	pagewalk: use lockdep_assert_held for locking validation	2019-09-07 04:28:04 -03:00
percpu-internal.h	percpu: convert chunk hints to be based on pcpu_block_md	2019-03-13 12:25:31 -07:00
percpu-km.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu-stats.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu-vm.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu.c	percpu: Use struct_size() helper	2019-09-04 13:40:49 -07:00
pgtable-generic.c	x86/mm: Page size aware flush_tlb_mm_range()	2018-10-09 16:51:11 +02:00
process_vm_access.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
readahead.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
rmap.c	mm: include <linux/huge_mm.h> for is_vma_temporary_stack	2019-10-19 06:32:32 -04:00
rodata_test.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
shmem.c	Merge branch 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-10-10 08:16:44 -07:00
shuffle.c	mm: fix -Wmissing-prototypes warnings	2019-10-07 15:47:19 -07:00
shuffle.h	mm: maintain randomization of page free lists	2019-05-14 19:52:48 -07:00
slab_common.c	mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release	2019-10-19 06:32:32 -04:00
slab.c	mm/slab.c: fix kernel-doc warning for __ksize()	2019-10-14 15:04:01 -07:00
slab.h	mm: slab: make page_cgroup_ino() to recognize non-compound slab pages properly	2019-11-06 08:47:50 -08:00
slob.c	mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)	2019-10-07 15:47:20 -07:00
slub.c	mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations	2019-10-14 15:04:01 -07:00
sparse-vmemmap.c	mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()	2019-07-18 17:08:07 -07:00
sparse.c	mm: fix -Wmissing-prototypes warnings	2019-10-07 15:47:19 -07:00
swap_cgroup.c
swap_slots.c	mm, swap, get_swap_pages: use entry_size instead of cluster in parameter	2018-08-22 10:52:44 -07:00
swap_state.c	mm: page cache: store only head pages in i_pages	2019-09-24 15:54:08 -07:00
swap.c	mm: introduce MADV_COLD	2019-09-25 17:51:41 -07:00
swapfile.c	vfs: don't allow writes to swap files	2019-08-20 07:55:16 -07:00
truncate.c	mm/thp: allow dropping THP from page cache	2019-10-19 06:32:33 -04:00
usercopy.c	usercopy: Avoid HIGHMEM pfn warning	2019-09-17 15:20:17 -07:00
userfaultfd.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499	2019-06-19 17:09:53 +02:00
util.c	arm64, mm: make randomization selected by generic topdown mmap layout	2019-09-24 15:54:11 -07:00
vmacache.c	mm: get rid of vmacache_flush_all() entirely	2018-09-13 15:18:04 -10:00
vmalloc.c	augmented rbtree: add new RB_DECLARE_CALLBACKS_MAX macro	2019-09-25 17:51:39 -07:00
vmpressure.c	mm/vmpressure.c: fix a signedness bug in vmpressure_register_event()	2019-10-07 15:47:19 -07:00
vmscan.c	mm/vmscan.c: support removing arbitrary sized pages from mapping	2019-10-19 06:32:32 -04:00
vmstat.c	mm, vmstat: reduce zone->lock holding time by /proc/pagetypeinfo	2019-11-06 08:47:50 -08:00
workingset.c	mm: workingset: fix vmstat counters for shadow nodes	2019-08-13 16:06:52 -07:00
z3fold.c	mm/z3fold.c: claim page in the beginning of free	2019-10-07 15:47:19 -07:00
zbud.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
zpool.c	zpool: add malloc_support_movable to zpool_driver	2019-09-24 15:54:12 -07:00
zsmalloc.c	mm/zsmalloc.c: fix a -Wunused-function warning	2019-09-24 15:54:12 -07:00
zswap.c	zswap: do not map same object twice	2019-09-24 15:54:12 -07:00