Define KASAN_SHADOW_OFFSET,KASAN_SHADOW_START and KASAN_SHADOW_END for
the Arm kernel address sanitizer. We are "stealing" lowmem (the 4GB
addressable by a 32bit architecture) out of the virtual address
space to use as shadow memory for KASan as follows:
+----+ 0xffffffff
| |
| | |-> Static kernel image (vmlinux) BSS and page table
| |/
+----+ PAGE_OFFSET
| |
| | |-> Loadable kernel modules virtual address space area
| |/
+----+ MODULES_VADDR = KASAN_SHADOW_END
| |
| | |-> The shadow area of kernel virtual address.
| |/
+----+-> TASK_SIZE (start of kernel space) = KASAN_SHADOW_START the
| | shadow address of MODULES_VADDR
| | |
| | |
| | |-> The user space area in lowmem. The kernel address
| | | sanitizer do not use this space, nor does it map it.
| | |
| | |
| | |
| | |
| |/
------ 0
0 .. TASK_SIZE is the memory that can be used by shared
userspace/kernelspace. It us used for userspace processes and for
passing parameters and memory buffers in system calls etc. We do not
need to shadow this area.
KASAN_SHADOW_START:
This value begins with the MODULE_VADDR's shadow address. It is the
start of kernel virtual space. Since we have modules to load, we need
to cover also that area with shadow memory so we can find memory
bugs in modules.
KASAN_SHADOW_END
This value is the 0x100000000's shadow address: the mapping that would
be after the end of the kernel memory at 0xffffffff. It is the end of
kernel address sanitizer shadow area. It is also the start of the
module area.
KASAN_SHADOW_OFFSET:
This value is used to map an address to the corresponding shadow
address by the following formula:
shadow_addr = (address >> 3) + KASAN_SHADOW_OFFSET;
As you would expect, >> 3 is equal to dividing by 8, meaning each
byte in the shadow memory covers 8 bytes of kernel memory, so one
bit shadow memory per byte of kernel memory is used.
The KASAN_SHADOW_OFFSET is provided in a Kconfig option depending
on the VMSPLIT layout of the system: the kernel and userspace can
split up lowmem in different ways according to needs, so we calculate
the shadow offset depending on this.
When kasan is enabled, the definition of TASK_SIZE is not an 8-bit
rotated constant, so we need to modify the TASK_SIZE access code in the
*.s file.
The kernel and modules may use different amounts of memory,
according to the VMSPLIT configuration, which in turn
determines the PAGE_OFFSET.
We use the following KASAN_SHADOW_OFFSETs depending on how the
virtual memory is split up:
- 0x1f000000 if we have 1G userspace / 3G kernelspace split:
- The kernel address space is 3G (0xc0000000)
- PAGE_OFFSET is then set to 0x40000000 so the kernel static
image (vmlinux) uses addresses 0x40000000 .. 0xffffffff
- On top of that we have the MODULES_VADDR which under
the worst case (using ARM instructions) is
PAGE_OFFSET - 16M (0x01000000) = 0x3f000000
so the modules use addresses 0x3f000000 .. 0x3fffffff
- So the addresses 0x3f000000 .. 0xffffffff need to be
covered with shadow memory. That is 0xc1000000 bytes
of memory.
- 1/8 of that is needed for its shadow memory, so
0x18200000 bytes of shadow memory is needed. We
"steal" that from the remaining lowmem.
- The KASAN_SHADOW_START becomes 0x26e00000, to
KASAN_SHADOW_END at 0x3effffff.
- Now we can calculate the KASAN_SHADOW_OFFSET for any
kernel address as 0x3f000000 needs to map to the first
byte of shadow memory and 0xffffffff needs to map to
the last byte of shadow memory. Since:
SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
0x26e00000 = (0x3f000000 >> 3) + KASAN_SHADOW_OFFSET
KASAN_SHADOW_OFFSET = 0x26e00000 - (0x3f000000 >> 3)
KASAN_SHADOW_OFFSET = 0x26e00000 - 0x07e00000
KASAN_SHADOW_OFFSET = 0x1f000000
- 0x5f000000 if we have 2G userspace / 2G kernelspace split:
- The kernel space is 2G (0x80000000)
- PAGE_OFFSET is set to 0x80000000 so the kernel static
image uses 0x80000000 .. 0xffffffff.
- On top of that we have the MODULES_VADDR which under
the worst case (using ARM instructions) is
PAGE_OFFSET - 16M (0x01000000) = 0x7f000000
so the modules use addresses 0x7f000000 .. 0x7fffffff
- So the addresses 0x7f000000 .. 0xffffffff need to be
covered with shadow memory. That is 0x81000000 bytes
of memory.
- 1/8 of that is needed for its shadow memory, so
0x10200000 bytes of shadow memory is needed. We
"steal" that from the remaining lowmem.
- The KASAN_SHADOW_START becomes 0x6ee00000, to
KASAN_SHADOW_END at 0x7effffff.
- Now we can calculate the KASAN_SHADOW_OFFSET for any
kernel address as 0x7f000000 needs to map to the first
byte of shadow memory and 0xffffffff needs to map to
the last byte of shadow memory. Since:
SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
0x6ee00000 = (0x7f000000 >> 3) + KASAN_SHADOW_OFFSET
KASAN_SHADOW_OFFSET = 0x6ee00000 - (0x7f000000 >> 3)
KASAN_SHADOW_OFFSET = 0x6ee00000 - 0x0fe00000
KASAN_SHADOW_OFFSET = 0x5f000000
- 0x9f000000 if we have 3G userspace / 1G kernelspace split,
and this is the default split for ARM:
- The kernel address space is 1GB (0x40000000)
- PAGE_OFFSET is set to 0xc0000000 so the kernel static
image uses 0xc0000000 .. 0xffffffff.
- On top of that we have the MODULES_VADDR which under
the worst case (using ARM instructions) is
PAGE_OFFSET - 16M (0x01000000) = 0xbf000000
so the modules use addresses 0xbf000000 .. 0xbfffffff
- So the addresses 0xbf000000 .. 0xffffffff need to be
covered with shadow memory. That is 0x41000000 bytes
of memory.
- 1/8 of that is needed for its shadow memory, so
0x08200000 bytes of shadow memory is needed. We
"steal" that from the remaining lowmem.
- The KASAN_SHADOW_START becomes 0xb6e00000, to
KASAN_SHADOW_END at 0xbfffffff.
- Now we can calculate the KASAN_SHADOW_OFFSET for any
kernel address as 0xbf000000 needs to map to the first
byte of shadow memory and 0xffffffff needs to map to
the last byte of shadow memory. Since:
SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
0xb6e00000 = (0xbf000000 >> 3) + KASAN_SHADOW_OFFSET
KASAN_SHADOW_OFFSET = 0xb6e00000 - (0xbf000000 >> 3)
KASAN_SHADOW_OFFSET = 0xb6e00000 - 0x17e00000
KASAN_SHADOW_OFFSET = 0x9f000000
- 0x8f000000 if we have 3G userspace / 1G kernelspace with
full 1 GB low memory (VMSPLIT_3G_OPT):
- The kernel address space is 1GB (0x40000000)
- PAGE_OFFSET is set to 0xb0000000 so the kernel static
image uses 0xb0000000 .. 0xffffffff.
- On top of that we have the MODULES_VADDR which under
the worst case (using ARM instructions) is
PAGE_OFFSET - 16M (0x01000000) = 0xaf000000
so the modules use addresses 0xaf000000 .. 0xaffffff
- So the addresses 0xaf000000 .. 0xffffffff need to be
covered with shadow memory. That is 0x51000000 bytes
of memory.
- 1/8 of that is needed for its shadow memory, so
0x0a200000 bytes of shadow memory is needed. We
"steal" that from the remaining lowmem.
- The KASAN_SHADOW_START becomes 0xa4e00000, to
KASAN_SHADOW_END at 0xaeffffff.
- Now we can calculate the KASAN_SHADOW_OFFSET for any
kernel address as 0xaf000000 needs to map to the first
byte of shadow memory and 0xffffffff needs to map to
the last byte of shadow memory. Since:
SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
0xa4e00000 = (0xaf000000 >> 3) + KASAN_SHADOW_OFFSET
KASAN_SHADOW_OFFSET = 0xa4e00000 - (0xaf000000 >> 3)
KASAN_SHADOW_OFFSET = 0xa4e00000 - 0x15e00000
KASAN_SHADOW_OFFSET = 0x8f000000
- The default value of 0xffffffff for KASAN_SHADOW_OFFSET
is an error value. We should always match one of the
above shadow offsets.
When we do this, TASK_SIZE will sometimes get a bit odd values
that will not fit into immediate mov assembly instructions.
To account for this, we need to rewrite some assembly using
TASK_SIZE like this:
- mov r1, #TASK_SIZE
+ ldr r1, =TASK_SIZE
or
- cmp r4, #TASK_SIZE
+ ldr r0, =TASK_SIZE
+ cmp r4, r0
this is done to avoid the immediate #TASK_SIZE that need to
fit into a limited number of bits.
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Disable instrumentation for arch/arm/boot/compressed/*
since that code is executed before the kernel has even
set up its mappings and definately out of scope for
KASan.
Disable instrumentation of arch/arm/vdso/* because that code
is not linked with the kernel image, so the KASan management
code would fail to link.
Disable instrumentation of arch/arm/mm/physaddr.c. See commit
ec6d06efb0 ("arm64: Add support for CONFIG_DEBUG_VIRTUAL")
for more details.
Disable kasan check in the function unwind_pop_register because
it does not matter that kasan checks failed when unwind_pop_register()
reads the stack memory of a task.
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
On ARM, setting up the linear region is tricky, given the constraints
around placement and alignment of the memblocks, and how the kernel
itself as well as the DT are placed in physical memory.
Let's simplify matters a bit, by moving the device tree mapping to the
top of the address space, right between the end of the vmalloc region
and the start of the the fixmap region, and create a read-only mapping
for it that is independent of the size of the linear region, and how it
is organized.
Since this region was formerly used as a guard region, which will now be
populated fully on LPAE builds by this read-only mapping (which will
still be able to function as a guard region for stray writes), bump the
start of the [underutilized] fixmap region by 512 KB as well, to ensure
that there is always a proper guard region here. Doing so still leaves
ample room for the fixmap space, even with NR_CPUS set to its maximum
value of 32.
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Before moving the DT mapping out of the linear region, let's prepare
for this change by removing all the phys-to-virt translations of the
__atags_pointer variable, and perform this translation only once at
setup time.
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Pull dma-mapping updates from Christoph Hellwig:
- rework the non-coherent DMA allocator
- move private definitions out of <linux/dma-mapping.h>
- lower CMA_ALIGNMENT (Paul Cercueil)
- remove the omap1 dma address translation in favor of the common code
- make dma-direct aware of multiple dma offset ranges (Jim Quinlan)
- support per-node DMA CMA areas (Barry Song)
- increase the default seg boundary limit (Nicolin Chen)
- misc fixes (Robin Murphy, Thomas Tai, Xu Wang)
- various cleanups
* tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits)
ARM/ixp4xx: add a missing include of dma-map-ops.h
dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling
dma-direct: factor out a dma_direct_alloc_from_pool helper
dma-direct check for highmem pages in dma_direct_alloc_pages
dma-mapping: merge <linux/dma-noncoherent.h> into <linux/dma-map-ops.h>
dma-mapping: move large parts of <linux/dma-direct.h> to kernel/dma
dma-mapping: move dma-debug.h to kernel/dma/
dma-mapping: remove <asm/dma-contiguous.h>
dma-mapping: merge <linux/dma-contiguous.h> into <linux/dma-map-ops.h>
dma-contiguous: remove dma_contiguous_set_default
dma-contiguous: remove dev_set_cma_area
dma-contiguous: remove dma_declare_contiguous
dma-mapping: split <linux/dma-mapping.h>
cma: decrease CMA_ALIGNMENT lower limit to 2
firewire-ohci: use dma_alloc_pages
dma-iommu: implement ->alloc_noncoherent
dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods
dma-mapping: add a new dma_alloc_pages API
dma-mapping: remove dma_cache_sync
53c700: convert to dma_alloc_noncoherent
...
Merge dma-contiguous.h into dma-map-ops.h, after removing the comment
describing the contiguous allocator into kernel/dma/contigous.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Split out all the bits that are purely for dma_map_ops implementations
and related code into a new <linux/dma-map-ops.h> header so that they
don't get pulled into all the drivers. That also means the architecture
specific <asm/dma-mapping.h> is not pulled in by <linux/dma-mapping.h>
any more, which leads to a missing includes that were pulled in by the
x86 or arm versions in a few not overly portable drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
This API is the equivalent of alloc_pages, except that the returned memory
is guaranteed to be DMA addressable by the passed in device. The
implementation will also be used to provide a more sensible replacement
for DMA_ATTR_NON_CONSISTENT flag.
Additionally dma_alloc_noncoherent is switched over to use dma_alloc_pages
as its backend.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> (MIPS part)
The L310_PREFETCH_CTRL register bits 28 and 29 to enable data and
instruction prefetch respectively can also be accessed via the
L2X0_AUX_CTRL register. They appear to be actually wired together in
hardware between the registers. Changing them in the prefetch
register only will get undone when restoring the aux control register
later on. For this reason, set these bits in both registers during
initialisation according to the devicetree property values.
Link: https://lore.kernel.org/lkml/76f2f3ad5e77e356e0a5b99ceee1e774a2842c25.1597061474.git.guillaume.tucker@collabora.com/
Fixes: ec3bd0e68a ("ARM: 8391/1: l2c: add options to overwrite prefetching behavior")
Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent
functions that call memory_present() for each region in memblock.memory:
sparse_memory_present_with_active_regions() and membocks_present().
Moreover, all architectures have a call to either of these functions
preceding the call to sparse_init() and in the most cases they are called
one after the other.
Mark the regions from memblock.memory as present during sparce_init() by
making sparse_init() call memblocks_present(), make memblocks_present()
and memory_present() functions static and remove redundant
sparse_memory_present_with_active_regions() function.
Also remove no longer required HAVE_MEMORY_PRESENT configuration option.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200712083130.22919-1-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: cleanup usage of <asm/pgalloc.h>"
Most architectures have very similar versions of pXd_alloc_one() and
pXd_free_one() for intermediate levels of page table. These patches add
generic versions of these functions in <asm-generic/pgalloc.h> and enable
use of the generic functions where appropriate.
In addition, functions declared and defined in <asm/pgalloc.h> headers are
used mostly by core mm and early mm initialization in arch and there is no
actual reason to have the <asm/pgalloc.h> included all over the place.
The first patch in this series removes unneeded includes of
<asm/pgalloc.h>
In the end it didn't work out as neatly as I hoped and moving
pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
unnecessary changes to arches that have custom page table allocations, so
I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
to mm/.
This patch (of 8):
In most cases <asm/pgalloc.h> header is required only for allocations of
page table memory. Most of the .c files that include that header do not
use symbols declared in <asm/pgalloc.h> and do not require that header.
As for the other header files that used to include <asm/pgalloc.h>, it is
possible to move that include into the .c file that actually uses symbols
from <asm/pgalloc.h> and drop the include from the header file.
The process was somewhat automated using
sed -i -E '/[<"]asm\/pgalloc\.h/d' \
$(grep -L -w -f /tmp/xx \
$(git grep -E -l '[<"]asm/pgalloc\.h'))
where /tmp/xx contains all the symbols defined in
arch/*/include/asm/pgalloc.h.
[rppt@linux.ibm.com: fix powerpc warning]
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ARM updates from Russell King:
- add arch/arm/Kbuild from Masahiro Yamada.
- simplify act_mm macro, since it contains an open-coded
get_thread_info.
- VFP updates for Clang from Stefan Agner.
- Fix unwinder for Clang from Nathan Huckleberry.
- Remove unused it8152 PCI host controller, used by the removed cm-x2xx
platforms from Mike Rapoport.
- Further explanation of __range_ok().
- Remove kimage_voffset that isn't used anymore from Marc Zyngier.
- Drop ancient Thumb-2 workaround for old binutils from Ard Biesheuvel.
- Documentation cleanup for mach-* from Pete Zaitcev.
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8996/1: Documentation/Clean up the description of mach-<class>
ARM: 8995/1: drop Thumb-2 workaround for ancient binutils
ARM: 8994/1: mm: drop kimage_voffset which was only used by KVM
ARM: uaccess: add further explanation of __range_ok()
ARM: 8993/1: remove it8152 PCI controller driver
ARM: 8992/1: Fix unwind_frame for clang-built kernels
ARM: 8991/1: use VFP assembler mnemonics if available
ARM: 8990/1: use VFP assembler mnemonics in register load/store macros
ARM: 8989/1: use .fpu assembler directives instead of assembler arguments
ARM: 8982/1: mm: Simplify act_mm macro
ARM: 8981/1: add arch/arm/Kbuild
Now that KVM support has been removed from the 32-bit ARM port,
drop the export kimage_voffset symbol, which no longer has any
users.
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Commit
84e6ffb2c4 ("arm: add support for folded p4d page tables")
updated create_mapping_late() to take folded P4Ds into account when
creating mappings, but inverted the p4d_alloc() failure test, resulting
in no mapping to be created at all.
When the EFI rtc driver subsequently tries to invoke the EFI GetTime()
service, the memory regions covering the EFI data structures are missing
from the page tables, resulting in a crash like
Unable to handle kernel paging request at virtual address 5ae0cf28
pgd = (ptrval)
[5ae0cf28] *pgd=80000040205003, *pmd=00000000
Internal error: Oops: 207 [#1] SMP THUMB2
Modules linked in:
CPU: 0 PID: 7 Comm: kworker/u32:0 Not tainted 5.7.0+ #92
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Workqueue: efi_rts_wq efi_call_rts
PC is at efi_call_rts+0x94/0x294
LR is at efi_call_rts+0x83/0x294
pc : [<c0b4f098>] lr : [<c0b4f087>] psr: 30000033
sp : e6219ef0 ip : 00000000 fp : ffffe000
r10: 00000000 r9 : 00000000 r8 : 30000013
r7 : e6201dd0 r6 : e6201ddc r5 : 00000000 r4 : c181f264
r3 : 5ae0cf10 r2 : 00000001 r1 : e6201dd0 r0 : e6201ddc
Flags: nzCV IRQs on FIQs on Mode SVC_32 ISA Thumb Segment none
Control: 70c5383d Table: 661cc840 DAC: 00000001
Process kworker/u32:0 (pid: 7, stack limit = 0x(ptrval))
...
[<c0b4f098>] (efi_call_rts) from [<c0448219>] (process_one_work+0x16d/0x3d8)
[<c0448219>] (process_one_work) from [<c0448581>] (worker_thread+0xfd/0x408)
[<c0448581>] (worker_thread) from [<c044ca7b>] (kthread+0x103/0x104)
...
Fixes: 84e6ffb2c4 ("arm: add support for folded p4d page tables")
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Better describe what this helper does, and match the naming of
copy_from_kernel_nofault.
Also switch the argument order around, so that it acts and looks
like get_user().
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The powerpc 32-bit implementation of pgtable has nice shortcuts for
accessing kernel PMD and PTE for a given virtual address. Make these
helpers available for all architectures.
[rppt@linux.ibm.com: microblaze: fix page table traversal in setup_rt_frame()]
Link: http://lkml.kernel.org/r/20200518191511.GD1118872@kernel.org
[akpm@linux-foundation.org: s/pmd_ptr_k/pmd_off_k/ in various powerpc places]
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-9-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ARM SoC updates from Arnd Bergmann:
"One new platform gets added, the Realtek RTD1195, which is an older
Cortex-a7 based relative of the RTD12xx chips that are already
supported in arch/arm64. The platform may also be extended to support
running 32-bit kernels on those 64-bit chips for memory-constrained
machines.
In the Renesas shmobile platform, we gain support for "RZ/G1H" or
R8A7742, an eight-core chip based on Cortex-A15 and Cortex-A7 cores,
originally released in 2016 as one of the last high-end 32-bit
designs.
There is ongoing cleanup for the integrator, tegra, imx, and omap2
platforms, with integrator getting very close to the goal of having
zero code in arch/arm/, and omap2 moving more of the chip specifics
from old board code into device tree files.
The Versatile Express platform is made more modular, with built-in
drivers now becoming loadable modules. This is part of a greater
effort for the Android OS to have a common kernel binary for all
platforms and any platform specific code in loadable modules.
The PXA platform drops support for Compulab's pxa2xx boards that had
rather unusual flash and PCI drivers but no known users remaining. All
device drivers specific to those boards can now get removed as well.
Across platforms, there is ongoing cleanup, with Geert and Rob
revisiting some a lot of Kconfig options"
* tag 'arm-soc-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (94 commits)
ARM: omap2: fix omap5_realtime_timer_init definition
ARM: zynq: Don't select CONFIG_ICST
ARM: OMAP2+: Fix regression for using local timer on non-SMP SoCs
clk: versatile: Fix kconfig dependency on COMMON_CLK_VERSATILE
ARM: davinci: fix build failure without I2C
power: reset: vexpress: fix build issue
power: vexpress: cleanup: use builtin_platform_driver
power: vexpress: add suppress_bind_attrs to true
Revert "ARM: vexpress: Don't select VEXPRESS_CONFIG"
MAINTAINERS: pxa: remove Compulab arm/pxa support
ARM: pxa: remove Compulab pxa2xx boards
bus: arm-integrator-lm: Fix return value check in integrator_ap_lm_probe()
soc: imx: move cpu code to drivers/soc/imx
ARM: imx: move cpu definitions into a header
ARM: imx: use device_initcall for imx_soc_device_init
ARM: imx: pcm037: make pcm970_sja1000_platform_data static
bus: ti-sysc: Timers no longer need legacy quirk handling
ARM: OMAP2+: Drop old timer code for dmtimer and 32k counter
ARM: dts: Configure system timers for omap2
ARM: dts: Configure system timers for ti81xx
...
Every single architecture (including !CONFIG_HIGHMEM) calls...
pagefault_enable();
preempt_enable();
... before returning from __kunmap_atomic(). Lift this code into the
kunmap_atomic() macro.
While we are at it rename __kunmap_atomic() to kunmap_atomic_high() to
be consistent.
[ira.weiny@intel.com: don't enable pagefault/preempt twice]
Link: http://lkml.kernel.org/r/20200518184843.3029640-1-ira.weiny@intel.com
[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20200507150004.1423069-8-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every arch has the same code to ensure atomic operations and a check for
!HIGHMEM page.
Remove the duplicate code by defining a core kmap_atomic() which only
calls the arch specific kmap_atomic_high() when the page is high memory.
[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-7-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All architectures do exactly the same thing for kunmap(); remove all the
duplicate definitions and lift the call to the core.
This also has the benefit of changing kmap_unmap() on a number of
architectures to be an inline call rather than an actual function.
[akpm@linux-foundation.org: fix CONFIG_HIGHMEM=n build on various architectures]
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-5-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent work with KASan exposed the folling hard-coded bitmask
in arch/arm/mm/proc-macros.S:
bic rd, sp, #8128
bic rd, rd, #63
This forms the bitmask 0x1FFF that is coinciding with
(PAGE_SIZE << THREAD_SIZE_ORDER) - 1, this code was assuming
that THREAD_SIZE is always 8K (8192).
As KASan was increasing THREAD_SIZE_ORDER to 2, I ran into
this bug.
Fix it by this little oneline suggested by Ard:
bic rd, sp, #(THREAD_SIZE - 1) & ~63
Where THREAD_SIZE is defined using THREAD_SIZE_ORDER.
We have to also include <linux/const.h> since the THREAD_SIZE
expands to use the _AC() macro.
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
We would be trying to print the kernel virtual address of the base
register address which is not very useful and is not displayed by
default because of pointer restriction. Print the Device Tree node name
instead which is what was originally intended.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Pull dma-mapping updates from Christoph Hellwig:
- fix an integer overflow in the coherent pool (Kevin Grandemange)
- provide support for in-place uncached remapping and use that for
openrisc
- fix the arm coherent allocator to take the bus limit into account
* tag 'dma-mapping-5.7' of git://git.infradead.org/users/hch/dma-mapping:
ARM/dma-mapping: merge __dma_supported into arm_dma_supported
ARM/dma-mapping: take the bus limit into account in __dma_alloc
ARM/dma-mapping: remove get_coherent_dma_mask
openrisc: use the generic in-place uncached DMA allocator
dma-direct: provide a arch_dma_clear_uncached hook
dma-direct: make uncached_kernel_address more general
dma-direct: consolidate the error handling in dma_direct_alloc_pages
dma-direct: remove the cached_kernel_address hook
dma-coherent: fix integer overflow in the reserved-memory dma allocation
Pull kvm updates from Paolo Bonzini:
"ARM:
- GICv4.1 support
- 32bit host removal
PPC:
- secure (encrypted) using under the Protected Execution Framework
ultravisor
s390:
- allow disabling GISA (hardware interrupt injection) and protected
VMs/ultravisor support.
x86:
- New dirty bitmap flag that sets all bits in the bitmap when dirty
page logging is enabled; this is faster because it doesn't require
bulk modification of the page tables.
- Initial work on making nested SVM event injection more similar to
VMX, and less buggy.
- Various cleanups to MMU code (though the big ones and related
optimizations were delayed to 5.8). Instead of using cr3 in
function names which occasionally means eptp, KVM too has
standardized on "pgd".
- A large refactoring of CPUID features, which now use an array that
parallels the core x86_features.
- Some removal of pointer chasing from kvm_x86_ops, which will also
be switched to static calls as soon as they are available.
- New Tigerlake CPUID features.
- More bugfixes, optimizations and cleanups.
Generic:
- selftests: cleanups, new MMU notifier stress test, steal-time test
- CSV output for kvm_stat"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
x86/kvm: fix a missing-prototypes "vmread_error"
KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
KVM: VMX: Add a trampoline to fix VMREAD error handling
KVM: SVM: Annotate svm_x86_ops as __initdata
KVM: VMX: Annotate vmx_x86_ops as __initdata
KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
KVM: VMX: Configure runtime hooks using vmx_x86_ops
KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
KVM: x86: Move init-only kvm_x86_ops to separate struct
KVM: Pass kvm_init()'s opaque param to additional arch funcs
s390/gmap: return proper error code on ksm unsharing
KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
KVM: Fix out of range accesses to memslots
KVM: X86: Micro-optimize IPI fastpath delay
KVM: X86: Delay read msr data iff writes ICR MSR
KVM: PPC: Book3S HV: Add a capability for enabling secure guests
KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
...
The idea comes from a discussion between Linus and Andrea [1].
Before this patch we only allow a page fault to retry once. We achieved
this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time. This was majorly used to avoid
unexpected starvation of the system by looping over forever to handle the
page fault on a single page. However that should hardly happen, and after
all for each code path to return a VM_FAULT_RETRY we'll first wait for a
condition (during which time we should possibly yield the cpu) to happen
before VM_FAULT_RETRY is really returned.
This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
flag when we receive VM_FAULT_RETRY. It means that the page fault handler
now can retry the page fault for multiple times if necessary without the
need to generate another page fault event. Meanwhile we still keep the
FAULT_FLAG_TRIED flag so page fault handler can still identify whether a
page fault is the first attempt or not.
Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):
- ALLOW_RETRY and !TRIED: this means the page fault allows to
retry, and this is the first try
- ALLOW_RETRY and TRIED: this means the page fault allows to
retry, and this is not the first try
- !ALLOW_RETRY and !TRIED: this means the page fault does not allow
to retry at all
- !ALLOW_RETRY and TRIED: this is forbidden and should never be used
In existing code we have multiple places that has taken special care of
the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect
the first retry of a page fault by checking against both (fault_flags &
FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now
even the 2nd try will have the ALLOW_RETRY set, then use that helper in
all existing special paths. One example is in __lock_page_or_retry(), now
we'll drop the mmap_sem only in the first attempt of page fault and we'll
keep it in follow up retries, so old locking behavior will be retained.
This will be a nice enhancement for current code [2] at the same time a
supporting material for the future userfaultfd-writeprotect work, since in
that work there will always be an explicit userfault writeprotect retry
for protected pages, and if that cannot resolve the page fault (e.g., when
userfaultfd-writeprotect is used in conjunction with swapped pages) then
we'll possibly need a 3rd retry of the page fault. It might also benefit
other potential users who will have similar requirement like userfault
write-protection.
GUP code is not touched yet and will be covered in follow up patch.
Please read the thread below for more information.
[1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
[2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>