linux/mm
Daniel Borkmann 0708a0afe2 mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
syzkaller was recently triggering an oversized kvmalloc() warning via
xdp_umem_create().

The triggered warning was added back in 7661809d49 ("mm: don't allow
oversized kvmalloc() calls"). The rationale for the warning for huge
kvmalloc sizes was as a reaction to a security bug where the size was
more than UINT_MAX but not everything was prepared to handle unsigned
long sizes.

Anyway, the AF_XDP related call trace from this syzkaller report was:

  kvmalloc include/linux/mm.h:806 [inline]
  kvmalloc_array include/linux/mm.h:824 [inline]
  kvcalloc include/linux/mm.h:829 [inline]
  xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
  xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
  xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
  xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
  __sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
  __do_sys_setsockopt net/socket.c:2187 [inline]
  __se_sys_setsockopt net/socket.c:2184 [inline]
  __x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Björn mentioned that requests for >2GB allocation can still be valid:

  The structure that is being allocated is the page-pinning accounting.
  AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
  still fewer than what memcg allows (PAGE_COUNTER_MAX is a LONG_MAX/
  PAGE_SIZE on 64 bit systems). [...]

  I could just change from U32_MAX to INT_MAX, but as I stated earlier
  that has a hacky feeling to it. [...] From my perspective, the code
  isn't broken, with the memcg limits in consideration. [...]

Linus says:

  [...] Pretty much every time this has come up, the kernel warning has
  shown that yes, the code was broken and there really wasn't a reason
  for doing allocations that big.

  Of course, some people would be perfectly fine with the allocation
  failing, they just don't want the warning. I didn't want __GFP_NOWARN
  to shut it up originally because I wanted people to see all those
  cases, but these days I think we can just say "yeah, people can shut
  it up explicitly by saying 'go ahead and fail this allocation, don't
  warn about it'".

  So enough time has passed that by now I'd certainly be ok with [it].

Thus allow call-sites to silence such userspace triggered splats if the
allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call
to kvcalloc() this is already the case, so nothing else needed there.

Fixes: 7661809d49 ("mm: don't allow oversized kvmalloc() calls")
Reported-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
Link: https://lore.kernel.org/bpf/20211201202905.b9892171e3f5b9a60f9da251@linux-foundation.org
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Ackd-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-04 10:00:37 -08:00
..
damon mm/damon: hide kernel pointer from tracepoint event 2022-01-15 16:30:33 +02:00
kasan lib/stackdepot: always do filter_irq_stacks() in stack_depot_save() 2022-01-22 08:33:38 +02:00
kfence kfence: make test case compatible with run time set sample interval 2022-02-11 17:55:00 -08:00
backing-dev.c mm: bdi: initialize bdi_min_ratio when bdi is unregistered 2021-12-10 17:10:56 -08:00
balloon_compaction.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
bootmem_info.c bootmem: Use page->index instead of page->freelist 2022-01-06 12:27:03 +01:00
cma_debug.c mm/cma: change cma mutex to irq safe spinlock 2021-05-05 11:27:21 -07:00
cma_sysfs.c mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
cma.c memblock: rename memblock_free to memblock_phys_free 2021-11-06 13:30:41 -07:00
cma.h mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
compaction.c mm: compaction: fix the migration stats in trace_mm_compaction_migratepages() 2022-01-15 16:30:30 +02:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: remove pte entry from the page table 2022-02-04 09:25:04 -08:00
debug.c mm,fs: split dump_mapping() out from dump_page() 2022-01-15 16:30:26 +02:00
dmapool.c mm/dmapool.c: revert "make dma pool to use kmalloc_node" 2022-01-15 16:30:28 +02:00
early_ioremap.c mm/early_ioremap.c: remove redundant early_ioremap_shutdown() 2021-09-08 11:50:24 -07:00
fadvise.c
failslab.c
filemap.c Merge branch 'akpm' (patches from Andrew) 2022-01-22 11:28:23 +02:00
folio-compat.c filemap: Add filemap_release_folio() 2022-01-04 13:15:34 -05:00
frontswap.c frontswap: remove support for multiple ops 2022-01-22 08:33:38 +02:00
gup_test.c selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages 2021-05-05 11:27:26 -07:00
gup_test.h selftests/vm: gup_test: fix test flag 2021-05-05 11:27:26 -07:00
gup.c Revert "mm/gup: small refactoring: simplify try_grab_page()" 2022-02-03 06:51:42 -08:00
highmem.c Fixes for 5.16 folios: 2021-11-25 10:13:56 -08:00
hmm.c mm/hmm.c: allow VM_MIXEDMAP to work with hmm_range_fault 2022-01-15 16:30:31 +02:00
huge_memory.c Merge branch 'akpm' (patches from Andrew) 2022-01-15 20:37:06 +02:00
hugetlb_cgroup.c hugetlb: add hugetlb.*.numa_stat file 2022-01-15 16:30:29 +02:00
hugetlb_vmemmap.c mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON 2021-06-30 20:47:26 -07:00
hugetlb_vmemmap.h mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate 2021-06-30 20:47:25 -07:00
hugetlb.c hugetlbfs: fix a truncation issue in hugepages parameter 2022-02-26 09:51:17 -08:00
hwpoison-inject.c mm: hwpoison: don't drop slab caches for offlining non-LRU page 2021-09-03 09:58:15 -07:00
init-mm.c mm: add setup_initial_init_mm() helper 2021-07-08 11:48:21 -07:00
internal.h Merge branch 'akpm' (patches from Andrew) 2022-01-15 20:37:06 +02:00
interval_tree.c mm/interval_tree: add comments to improve code readability 2021-04-30 11:20:38 -07:00
io-mapping.c mm: add a io_mapping_map_user helper 2021-04-30 11:20:39 -07:00
ioremap.c mm: move ioremap_page_range to vmalloc.c 2021-09-08 11:50:24 -07:00
Kconfig mm: hide the FRONTSWAP Kconfig symbol 2022-01-22 08:33:38 +02:00
Kconfig.debug mm: page table check 2022-01-15 16:30:28 +02:00
khugepaged.c mm/page_table_check: check entries at pmd levels 2022-02-04 09:25:04 -08:00
kmemleak.c mm/kmemleak: avoid scanning potential huge holes 2022-02-04 09:25:05 -08:00
ksm.c mm: ksm: fix use-after-free kasan report in ksm_might_need_to_copy 2022-01-15 16:30:31 +02:00
list_lru.c mm: list_lru: only add memcg-aware lrus to the global lru list 2021-11-06 13:30:35 -07:00
maccess.c ARM: 9115/1: mm/maccess: fix unaligned copy_{from,to}_kernel_nofault 2021-08-20 11:39:25 +01:00
madvise.c mm: move anon_vma declarations to linux/mm_inline.h 2022-01-15 16:30:27 +02:00
Makefile mm: remove cleancache 2022-01-22 08:33:38 +02:00
mapping_dirty_helpers.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
memblock.c memblock: use kfree() to release kmalloced memblock regions 2022-02-20 08:45:39 +02:00
memcontrol.c mm: memcg: synchronize objcg lists with a dedicated spinlock 2022-02-11 17:55:00 -08:00
memfd.c mm,hugetlb: remove mlock ulimit for SHM_HUGETLB 2021-11-09 10:02:48 -08:00
memory_hotplug.c treewide: Add missing includes masked by cgroup -> bpf dependency 2021-12-03 10:58:13 -08:00
memory-failure.c memory-failure: fetch compound_head after pgmap_pfn_valid() 2022-01-30 09:56:58 +02:00
memory.c Merge branch 'akpm' (patches from Andrew) 2022-01-20 10:41:01 +02:00
mempolicy.c mm/mempolicy: fix all kernel-doc warnings 2022-01-15 16:30:30 +02:00
mempool.c mm: remove spurious blkdev.h includes 2021-10-18 06:17:01 -06:00
memremap.c Merge branch 'akpm' (patches from Andrew) 2022-01-15 20:37:06 +02:00
memtest.c
migrate.c mm/migrate.c: rework migration_entry_wait() to not take a pageref 2022-01-22 08:33:34 +02:00
mincore.c
mlock.c mm: add a field to store names for private anonymous memory 2022-01-15 16:30:27 +02:00
mm_init.c include/linux/page-flags-layout.h: cleanups 2021-04-30 11:20:42 -07:00
mmap_lock.c mm: mmap_lock: fix disabling preemption directly 2021-07-23 17:43:28 -07:00
mmap.c mm: fix use-after-free bug when mm->mmap is reused after being freed 2022-02-26 09:51:17 -08:00
mmu_gather.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
mmu_notifier.c mm/mmu_notifiers: ensure range_end() is paired with range_start() 2021-03-25 09:22:55 -07:00
mmzone.c
mprotect.c mm: don't try to NUMA-migrate COW pages that have other uses 2022-02-17 08:57:47 -08:00
mremap.c mm, hugepages: add mremap() support for hugepage backed vma 2021-11-06 13:30:39 -07:00
msync.c mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start 2021-04-30 11:20:37 -07:00
nommu.c Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
oom_kill.c Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2022-01-17 05:49:30 +02:00
page_alloc.c Merge branch 'akpm' (patches from Andrew) 2022-01-20 10:41:01 +02:00
page_counter.c mm/page_counter: remove an incorrect call to propagate_protected_usage() 2022-01-15 16:30:27 +02:00
page_ext.c mm: make some vars and functions static or __init 2022-01-15 16:30:31 +02:00
page_idle.c mm/idle_page_tracking: make PG_idle reusable 2021-09-08 11:50:24 -07:00
page_io.c delayacct: support swapin delay accounting for swapping without blkio 2022-01-20 08:52:55 +02:00
page_isolation.c Revert "mm/page_isolation: unset migratetype directly for non Buddy page" 2022-02-04 09:25:04 -08:00
page_owner.c lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() 2022-01-22 08:33:37 +02:00
page_poison.c mm: page_poison: print page info when corruption is caught 2021-04-30 11:20:36 -07:00
page_reporting.c mm/page_reporting: allow driver to specify reporting order 2021-06-29 10:53:47 -07:00
page_reporting.h mm/page_reporting: export reporting order as module parameter 2021-06-29 10:53:47 -07:00
page_table_check.c mm/page_table_check: check entries at pmd levels 2022-02-04 09:25:04 -08:00
page_vma_mapped.c mm: device exclusive memory access 2021-07-01 11:06:03 -07:00
page-writeback.c mm/writeback: Improve __folio_mark_dirty() comment 2022-01-02 20:28:57 -05:00
pagewalk.c mm: pagewalk: fix walk for hugepage tables 2021-06-29 10:53:49 -07:00
percpu-internal.h mm: memcg/percpu: account extra objcg space to memory cgroups 2022-01-15 16:30:31 +02:00
percpu-km.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu-stats.c percpu: rework memcg accounting 2021-06-05 20:43:15 +00:00
percpu-vm.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu.c bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
pgalloc-track.h mm: fix typos in comments 2021-05-07 00:26:35 -07:00
pgtable-generic.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
process_vm_access.c mm/process_vm_access.c: remove duplicate include 2021-05-05 11:27:27 -07:00
ptdump.c mm: ptdump: fix build failure 2021-04-16 16:10:37 -07:00
readahead.c readahead: Convert page_cache_ra_unbounded to folios 2022-01-04 13:15:34 -05:00
rmap.c mm/rmap: fix potential batched TLB flush race 2022-01-15 16:30:31 +02:00
rodata_test.c
secretmem.c mm/secretmem: avoid letting secretmem_users drop to zero 2021-10-28 17:18:55 -07:00
shmem.c mm: simplify try_to_unuse 2022-01-22 08:33:38 +02:00
shuffle.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
shuffle.h mm/shuffle: fix section mismatch warning 2021-05-22 15:09:07 -10:00
slab_common.c Merge branch 'akpm' (patches from Andrew) 2022-01-15 20:37:06 +02:00
slab.c mm/kasan: Convert to struct folio and struct slab 2022-01-06 12:26:14 +01:00
slab.h slab changes for 5.17 - part 2 2022-01-18 06:40:47 +02:00
slob.c mm/slob: Remove unnecessary page_mapcount_reset() function call 2022-01-06 12:27:28 +01:00
slub.c mm/slub: Define struct slab fields for CONFIG_SLUB_CPU_PARTIAL only when enabled 2022-01-06 12:26:53 +01:00
sparse-vmemmap.c mm: remove redundant smp_wmb() 2021-11-06 13:30:36 -07:00
sparse.c bootmem: Use page->index instead of page->freelist 2022-01-06 12:27:03 +01:00
swap_cgroup.c
swap_slots.c treewide: Add missing includes masked by cgroup -> bpf dependency 2021-12-03 10:58:13 -08:00
swap_state.c mm/workingset: Convert workingset_refault() to take a folio 2021-10-18 07:49:40 -04:00
swap.c Merge branch 'akpm' (patches from Andrew) 2022-01-15 20:37:06 +02:00
swapfile.c mm: mark swap_lock and swap_active_head static 2022-01-22 08:33:38 +02:00
truncate.c mm: remove cleancache 2022-01-22 08:33:38 +02:00
usercopy.c mm: Convert check_heap_object() to use struct slab 2022-01-06 12:25:51 +01:00
userfaultfd.c mm: shmem: don't truncate page if memory failure happens 2022-01-15 16:30:26 +02:00
util.c mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls 2022-03-04 10:00:37 -08:00
vmacache.c
vmalloc.c mm/vmalloc: be more explicit about supported gfp flags. 2022-01-15 16:30:28 +02:00
vmpressure.c mm/vmpressure: fix data-race with memcg->socket_pressure 2021-11-06 13:30:40 -07:00
vmscan.c mm: vmscan: remove deadlock due to throttling failing to make progress 2022-02-11 17:55:00 -08:00
vmstat.c mm/vmstat: add events for THP max_ptes_* exceeds 2022-01-15 16:30:29 +02:00
workingset.c Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
z3fold.c mm/z3fold: add kerneldoc fields for z3fold_pool 2021-07-01 11:06:03 -07:00
zbud.c mm/zbud: add kerneldoc fields for zbud_pool 2021-07-01 11:06:03 -07:00
zpool.c zpool: remove the list of pools_head 2022-01-15 16:30:31 +02:00
zsmalloc.c zsmalloc: replace get_cpu_var with local_lock 2022-01-22 08:33:37 +02:00
zswap.c frontswap: remove support for multiple ops 2022-01-22 08:33:38 +02:00