Commit Graph

386 Commits

Author SHA1 Message Date
Linus Torvalds
e831101a73 arm64 updates for 4.8:
- Kexec support for arm64
 - Kprobes support
 - Expose MIDR_EL1 and REVIDR_EL1 CPU identification registers to sysfs
 - Trapping of user space cache maintenance operations and emulation in
   the kernel (CPU errata workaround)
 - Clean-up of the early page tables creation (kernel linear mapping, EFI
   run-time maps) to avoid splitting larger blocks (e.g. pmds) into
   smaller ones (e.g. ptes)
 - VDSO support for CLOCK_MONOTONIC_RAW in clock_gettime()
 - ARCH_HAS_KCOV enabled for arm64
 - Optimise IP checksum helpers
 - SWIOTLB optimisation to only allocate/initialise the buffer if the
   available RAM is beyond the 32-bit mask
 - Properly handle the "nosmp" command line argument
 - Fix for the initialisation of the CPU debug state during early boot
 - vdso-offsets.h build dependency workaround
 - Build fix when RANDOMIZE_BASE is enabled with MODULES off
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXmF/UAAoJEGvWsS0AyF7x+jwP/2fErtX6FTXmdG0c3HBkTpuy
 gEuzN2ByWbP6Io+unLC6NvbQQb1q6c73PTqjsoeMHUx2o8YK3jgWEBcC+7AuepoZ
 YGl3r08e75a/fGrgNwEQQC1lNlgjpog4kzVDh5ji6oRXNq+OkjJGUtRPe3gBoqxv
 NAjviciID/MegQaq4SaMd26AmnjuUGKogo5vlIaXK0SemX9it+ytW7eLAXuVY+gW
 EvO3Nxk0Y5oZKJF8qRw6oLSmw1bwn2dD26OgfXfCiI30QBookRyWIoXRedUOZmJq
 D0+Tipd7muO4PbjlxS8aY/wd/alfnM5+TJ6HpGDo+Y1BDauXfiXMf3ktDFE5QvJB
 KgtICmC0stWwbDT35dHvz8sETsrCMA2Q/IMrnyxG+nj9BxVQU7rbNrxfCXesJy7Q
 4EsQbcTyJwu+ECildBezfoei99XbFZyWk2vKSkTCFKzgwXpftGFaffgZ3DIzBAHH
 IjecDqIFENC8ymrjyAgrGjeFG+2WB/DBgoSS3Baiz6xwQqC4wFMnI3jPECtJjb/U
 6e13f+onXu5lF1YFKAiRjGmqa/G1ZMr+uKZFsembuGqsZdAPkzzUHyAE9g4JVO8p
 t3gc3/M3T7oLSHuw4xi1/Ow5VGb2UvbslFrp7OpuFZ7CJAvhKlHL5rPe385utsFE
 7++5WHXHAegeJCDNAKY2
 =iJOY
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - Kexec support for arm64

 - Kprobes support

 - Expose MIDR_EL1 and REVIDR_EL1 CPU identification registers to sysfs

 - Trapping of user space cache maintenance operations and emulation in
   the kernel (CPU errata workaround)

 - Clean-up of the early page tables creation (kernel linear mapping,
   EFI run-time maps) to avoid splitting larger blocks (e.g.  pmds) into
   smaller ones (e.g.  ptes)

 - VDSO support for CLOCK_MONOTONIC_RAW in clock_gettime()

 - ARCH_HAS_KCOV enabled for arm64

 - Optimise IP checksum helpers

 - SWIOTLB optimisation to only allocate/initialise the buffer if the
   available RAM is beyond the 32-bit mask

 - Properly handle the "nosmp" command line argument

 - Fix for the initialisation of the CPU debug state during early boot

 - vdso-offsets.h build dependency workaround

 - Build fix when RANDOMIZE_BASE is enabled with MODULES off

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits)
  arm64: arm: Fix-up the removal of the arm64 regs_query_register_name() prototype
  arm64: Only select ARM64_MODULE_PLTS if MODULES=y
  arm64: mm: run pgtable_page_ctor() on non-swapper translation table pages
  arm64: mm: make create_mapping_late() non-allocating
  arm64: Honor nosmp kernel command line option
  arm64: Fix incorrect per-cpu usage for boot CPU
  arm64: kprobes: Add KASAN instrumentation around stack accesses
  arm64: kprobes: Cleanup jprobe_return
  arm64: kprobes: Fix overflow when saving stack
  arm64: kprobes: WARN if attempting to step with PSTATE.D=1
  arm64: debug: remove unused local_dbg_{enable, disable} macros
  arm64: debug: remove redundant spsr manipulation
  arm64: debug: unmask PSTATE.D earlier
  arm64: localise Image objcopy flags
  arm64: ptrace: remove extra define for CPSR's E bit
  kprobes: Add arm64 case in kprobe example module
  arm64: Add kernel return probes support (kretprobes)
  arm64: Add trampoline code for kretprobes
  arm64: kprobes instruction simulation support
  arm64: Treat all entry code as non-kprobe-able
  ...
2016-07-27 11:16:05 -07:00
Linus Torvalds
0e06f5c0de Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits

 - ocfs2

 - most(?) of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
  thp: fix comments of __pmd_trans_huge_lock()
  cgroup: remove unnecessary 0 check from css_from_id()
  cgroup: fix idr leak for the first cgroup root
  mm: memcontrol: fix documentation for compound parameter
  mm: memcontrol: remove BUG_ON in uncharge_list
  mm: fix build warnings in <linux/compaction.h>
  mm, thp: convert from optimistic swapin collapsing to conservative
  mm, thp: fix comment inconsistency for swapin readahead functions
  thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
  shmem: split huge pages beyond i_size under memory pressure
  thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
  khugepaged: add support of collapse for tmpfs/shmem pages
  shmem: make shmem_inode_info::lock irq-safe
  khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
  thp: extract khugepaged from mm/huge_memory.c
  shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
  shmem: add huge pages support
  shmem: get_unmapped_area align huge page
  shmem: prepare huge= mount option and sysfs knob
  mm, rmap: account shmem thp pages
  ...
2016-07-26 19:55:54 -07:00
Kirill A. Shutemov
dcddffd41d mm: do not pass mm_struct into handle_mm_fault
We always have vma->vm_mm around.

Link: http://lkml.kernel.org/r/1466021202-61880-8-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Ard Biesheuvel
1378dc3d4b arm64: mm: run pgtable_page_ctor() on non-swapper translation table pages
The kernel page table creation routines are accessible to other subsystems
(e.g., EFI) via the create_pgd_mapping() entry point, which allows mappings
to be created that are not covered by init_mm.

Since generic code such as apply_to_page_range() may expect translation
table pages that are not associated with init_mm to be covered by fully
constructed struct pages, add a call to pgtable_page_ctor() in the alloc
function used by create_pgd_mapping. Since it is no longer used by
create_mapping_late(), also update the name of this function to better
reflect its purpose.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Laura Abbott <labbott@redhat.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-25 17:43:36 +01:00
Ard Biesheuvel
258a1605c7 arm64: mm: make create_mapping_late() non-allocating
The only purpose served by create_mapping_late() is to remap the already
mapped .text and .rodata kernel segments with read-only permissions. Since
we no longer allow block mappings to be split or merged,
create_mapping_late() should not pass an allocation function pointer into
__create_pgd_mapping(). So pass NULL instead.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Laura Abbott <labbott@redhat.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-25 17:42:53 +01:00
Rafael J. Wysocki
d85f4eb699 Merge branch 'acpi-numa'
* acpi-numa:
  ACPI / NUMA: Enable ACPI based NUMA on ARM64
  arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT
  ACPI / processor: Add acpi_map_madt_entry()
  ACPI / NUMA: Improve SRAT error detection and add messages
  ACPI / NUMA: Move acpi_numa_memory_affinity_init() to drivers/acpi/numa.c
  ACPI / NUMA: remove unneeded acpi_numa=1
  ACPI / NUMA: move bad_srat() and srat_disabled() to drivers/acpi/numa.c
  x86 / ACPI / NUMA: cleanup acpi_numa_processor_affinity_init()
  arm64, NUMA: Cleanup NUMA disabled messages
  arm64, NUMA: rework numa_add_memblk()
  ACPI / NUMA: move acpi_numa_slit_init() to drivers/acpi/numa.c
  ACPI / NUMA: Move acpi_numa_arch_fixup() to ia64 only
  ACPI / NUMA: remove duplicate NULL check
  ACPI / NUMA: Replace ACPI_DEBUG_PRINT() with pr_debug()
  ACPI / NUMA: Use pr_fmt() instead of printk
2016-07-25 13:40:39 +02:00
Catalin Marinas
a95b0644b3 Merge branch 'for-next/kprobes' into for-next/core
* kprobes:
  arm64: kprobes: Add KASAN instrumentation around stack accesses
  arm64: kprobes: Cleanup jprobe_return
  arm64: kprobes: Fix overflow when saving stack
  arm64: kprobes: WARN if attempting to step with PSTATE.D=1
  kprobes: Add arm64 case in kprobe example module
  arm64: Add kernel return probes support (kretprobes)
  arm64: Add trampoline code for kretprobes
  arm64: kprobes instruction simulation support
  arm64: Treat all entry code as non-kprobe-able
  arm64: Blacklist non-kprobe-able symbol
  arm64: Kprobes with single stepping support
  arm64: add conditional instruction simulation support
  arm64: Add more test functions to insn.c
  arm64: Add HAVE_REGS_AND_STACK_ACCESS_API feature
2016-07-21 18:20:41 +01:00
Will Deacon
2ce39ad151 arm64: debug: unmask PSTATE.D earlier
Clearing PSTATE.D is one of the requirements for generating a debug
exception. The arm64 booting protocol requires that PSTATE.D is set,
since many of the debug registers (for example, the hw_breakpoint
registers) are UNKNOWN out of reset and could potentially generate
spurious, fatal debug exceptions in early boot code if PSTATE.D was
clear. Once the debug registers have been safely initialised, PSTATE.D
is cleared, however this is currently broken for two reasons:

(1) The boot CPU clears PSTATE.D in a postcore_initcall and secondary
    CPUs clear PSTATE.D in secondary_start_kernel. Since the initcall
    runs after SMP (and the scheduler) have been initialised, there is
    no guarantee that it is actually running on the boot CPU. In this
    case, the boot CPU is left with PSTATE.D set and is not capable of
    generating debug exceptions.

(2) In a preemptible kernel, we may explicitly schedule on the IRQ
    return path to EL1. If an IRQ occurs with PSTATE.D set in the idle
    thread, then we may schedule the kthread_init thread, run the
    postcore_initcall to clear PSTATE.D and then context switch back
    to the idle thread before returning from the IRQ. The exception
    return path will then restore PSTATE.D from the stack, and set it
    again.

This patch fixes the problem by moving the clearing of PSTATE.D earlier
to proc.S. This has the desirable effect of clearing it in one place for
all CPUs, long before we have to worry about the scheduler or any
exception handling. We ensure that the previous reset of MDSCR_EL1 has
completed before unmasking the exception, so that any spurious
exceptions resulting from UNKNOWN debug registers are not generated.

Without this patch applied, the kprobes selftests have been seen to fail
under KVM, where we end up attempting to step the OOL instruction buffer
with PSTATE.D set and therefore fail to complete the step.

Cc: <stable@vger.kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-19 16:56:46 +01:00
Sandeepa Prabhu
2dd0e8d2d2 arm64: Kprobes with single stepping support
Add support for basic kernel probes(kprobes) and jump probes
(jprobes) for ARM64.

Kprobes utilizes software breakpoint and single step debug
exceptions supported on ARM v8.

A software breakpoint is placed at the probe address to trap the
kernel execution into the kprobe handler.

ARM v8 supports enabling single stepping before the break exception
return (ERET), with next PC in exception return address (ELR_EL1). The
kprobe handler prepares an executable memory slot for out-of-line
execution with a copy of the original instruction being probed, and
enables single stepping. The PC is set to the out-of-line slot address
before the ERET. With this scheme, the instruction is executed with the
exact same register context except for the PC (and DAIF) registers.

Debug mask (PSTATE.D) is enabled only when single stepping a recursive
kprobe, e.g.: during kprobes reenter so that probed instruction can be
single stepped within the kprobe handler -exception- context.
The recursion depth of kprobe is always 2, i.e. upon probe re-entry,
any further re-entry is prevented by not calling handlers and the case
counted as a missed kprobe).

Single stepping from the x-o-l slot has a drawback for PC-relative accesses
like branching and symbolic literals access as the offset from the new PC
(slot address) may not be ensured to fit in the immediate value of
the opcode. Such instructions need simulation, so reject
probing them.

Instructions generating exceptions or cpu mode change are rejected
for probing.

Exclusive load/store instructions are rejected too.  Additionally, the
code is checked to see if it is inside an exclusive load/store sequence
(code from Pratyush).

System instructions are mostly enabled for stepping, except MSR/MRS
accesses to "DAIF" flags in PSTATE, which are not safe for
probing.

This also changes arch/arm64/include/asm/ptrace.h to use
include/asm-generic/ptrace.h.

Thanks to Steve Capper and Pratyush Anand for several suggested
Changes.

Signed-off-by: Sandeepa Prabhu <sandeepa.s.prabhu@gmail.com>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Pratyush Anand <panand@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-19 15:03:20 +01:00
Lorenzo Pieralisi
16c11325cc arm64: mm: change IOMMU notifier action to attach DMA ops
Current bus notifier in ARM64 (__iommu_attach_notifier)
attempts to attach dma_ops to a device on BUS_NOTIFY_ADD_DEVICE
action notification.

This will cause issues on ACPI based systems, where PCI devices
can be added before the IOMMUs the devices are attached to
had a chance to be probed, causing failures on attempts to
attach dma_ops in that the domain for the respective IOMMU
may not be set-up yet by the time the bus notifier is run.

Devices dma_ops do not require to be set-up till the matching
device drivers are probed. This means that instead of running
the notifier attaching dma_ops to devices (__iommu_attach_notifier)
on BUS_NOTIFY_ADD_DEVICE action, it can be run just before the
device driver is bound to the device in question (on action
BUS_NOTIFY_BIND_DRIVER) so that it is certain that its IOMMU
group and domain are set-up accordingly at the time the
notifier is triggered.

This patch changes the notifier action upon which dma_ops
are attached to devices and defer it to driver binding time,
so that IOMMU devices have a chance to be probed and to register
their bus notifiers before the dma_ops attach sequence for a
device is actually carried out.

As a result we also no longer need worry about racing with
iommu_bus_notifier(), or about retrying the queue in case devices
were added too early on DT-based systems, so clean up the notifier
itself plus the additional workaround from 722ec35f7f ("arm64:
dma-mapping: fix handling of devices registered before arch_initcall")

Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
[rm: get rid of other now-redundant bits]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-08 18:06:04 +01:00
James Morse
e19a6ee246 arm64: kernel: Save and restore UAO and addr_limit on exception entry
If we take an exception while at EL1, the exception handler inherits
the original context's addr_limit and PSTATE.UAO values. To be consistent
always reset addr_limit and PSTATE.UAO on (re-)entry to EL1. This
prevents accidental re-use of the original context's addr_limit.

Based on a similar patch for arm from Russell King.

Cc: <stable@vger.kernel.org> # 4.6-
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-07-07 15:55:37 +01:00
Ard Biesheuvel
40f87d3114 arm64: mm: fold init_pgd() into __create_pgd_mapping()
The routine __create_pgd_mapping() does nothing except calling init_pgd(),
which has no other callers. So fold the latter into the former. Also, drop
a comment that has gone stale.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-01 11:56:27 +01:00
Catalin Marinas
4133af6c04 arm64: mm: Remove split_p*d() functions
Since the efi_create_mapping() no longer generates block mappings
and being the last user of the split_p*d code, remove these functions
and the corresponding TLBI.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[ardb: replace 'overlapping regions' with 'block mappings' in commit log]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-01 11:56:27 +01:00
Ard Biesheuvel
53e1b32910 arm64: mm: add param to force create_pgd_mapping() to use page mappings
Add a bool parameter 'allow_block_mappings' to create_pgd_mapping() and
the various helper functions that it descends into, to give the caller
control over whether block entries may be used to create the mapping.

The UEFI runtime mapping routines will use this to avoid creating block
entries that would need to split up into page entries when applying the
permissions listed in the Memory Attributes firmware table.

This also replaces the block_mappings_allowed() helper function that was
added for DEBUG_PAGEALLOC functionality, but the resulting code is
functionally equivalent (given that debug_page_alloc does not operate on
EFI page table entries anyway)

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-01 11:56:26 +01:00
Andre Przywara
290622efc7 arm64: fix "dc cvau" cache operation on errata-affected core
The ARM errata 819472, 826319, 827319 and 824069 for affected
Cortex-A53 cores demand to promote "dc cvau" instructions to
"dc civac" as well.
Attribute the usage of the instruction in __flush_cache_user_range
to also be covered by our alternative patching efforts.
For that we introduce an assembly macro which both deals with
alternatives while still tagging the instructions as USER.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-01 11:26:20 +01:00
Kefeng Wang
6c5269f33e arm64: mm: remove unnecessary BUG_ON
The memblock_alloc() and memblock_alloc_base() will panic on their own
if no free memory, remove pointless BUG_ON.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-06-30 17:55:04 +01:00
Ard Biesheuvel
9fdc14c55c arm64: mm: fix location of _etext
As Kees Cook notes in the ARM counterpart of this patch [0]:

  The _etext position is defined to be the end of the kernel text code,
  and should not include any part of the data segments. This interferes
  with things that might check memory ranges and expect executable code
  up to _etext.

In particular, Kees is referring to the HARDENED_USERCOPY patch set [1],
which rejects attempts to call copy_to_user() on kernel ranges containing
executable code, but does allow access to the .rodata segment. Regardless
of whether one may or may not agree with the distinction, it makes sense
for _etext to have the same meaning across architectures.

So let's put _etext where it belongs, between .text and .rodata, and fix
up existing references to use __init_begin instead, which unlike _end_rodata
includes the exception and notes sections as well.

The _etext references in kaslr.c are left untouched, since its references
to [_stext, _etext) are meant to capture potential jump instruction targets,
and so disregarding .rodata is actually an improvement here.

[0] http://article.gmane.org/gmane.linux.kernel/2245084
[1] http://thread.gmane.org/gmane.linux.kernel.hardened.devel/2502

Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-06-27 18:21:27 +01:00
Mark Rutland
ea2cbee3bc arm64: mm: simplify memblock numa node extraction
We currently open-code extracting the NUMA node of a memblock region,
which requires an ifdef to cater for !CONFIG_NUMA builds where the
memblock_region::nid field does not exist.

The generic memblock_get_region_node helper is intended to cater for
this. For CONFIG_HAVE_MEMBLOCK_NODE_MAP, builds this returns reg->nid,
and for for !CONFIG_HAVE_MEMBLOCK_NODE_MAP builds this is a static
inline that returns 0. Note that for arm64,
CONFIG_HAVE_MEMBLOCK_NODE_MAP is selected iff CONFIG_NUMA is.

This patch makes use of memblock_get_region_node to simplify the arm64
code. At the same time, we can move the nid variable definition into the
loop, as this is the only place it is used.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-06-27 18:05:39 +01:00
Shaokun Zhang
20c27a4270 arm64: mm: remove page_mapping check in __sync_icache_dcache
__sync_icache_dcache unconditionally skips the cache maintenance for
anonymous pages, under the assumption that flushing is only required in
the presence of D-side aliases [see 7249b79f6b ("arm64: Do not flush
the D-cache for anonymous pages")].

Unfortunately, this breaks migration of anonymous pages holding
self-modifying code, where userspace cannot be reasonably expected to
reissue maintenance instructions in response to a migration.

This patch fixes the problem by removing the broken page_mapping(page)
check from the cache syncing code, otherwise we may end up fetching and
executing stale instructions from the PoU.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-21 20:10:18 +01:00
Jean-Philippe Brucker
f7e0efc9b5 arm64: update ASID limit
During a rollover, we mark the active ASID on each CPU as reserved, before
allocating a new ID for the task that caused the rollover. This means that
with N CPUs, we can only guarantee the new task to obtain a valid ASID if
we have at least N+1 ASIDs. Update this limit in the initcall check.

Note that this restriction was introduced by commit 8e648066 on the
arch/arm side, which disallow re-using the previously active ASID on the
local CPU, as it would introduce a TLB race.

In addition, we only dispose of NUM_USER_ASIDS-1, since ASID 0 is
reserved. Add this restriction as well.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-21 20:10:18 +01:00
Mark Rutland
541ec870ef arm64: kill ESR_LNX_EXEC
Currently we treat ESR_EL1 bit 24 as software-defined for distinguishing
instruction aborts from data aborts, but this bit is architecturally
RES0 for instruction aborts, and could be allocated for an arbitrary
purpose in future. Additionally, we hard-code the value in entry.S
without the mnemonic, making the code difficult to understand.

Instead, remove ESR_LNX_EXEC, and distinguish aborts based on the esr,
which we already pass to the sole use of ESR_LNX_EXEC. A new helper,
is_el0_instruction_abort() is added to make the logic clear. Any
instruction aborts taken from EL1 will already have been handled by
bad_mode, so we need not handle that case in the helper.

For consistency, the existing permission_fault helper is renamed to
is_permission_fault, and the return type is changed to bool. There
should be no functional changes as the return value was a boolean
expression, and the result is only used in another boolean expression.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Dave P Martin <dave.martin@arm.com>
Cc: Huang Shijie <shijie.huang@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-06-21 17:07:48 +01:00
Mark Rutland
275f344bec arm64: add macro to extract ESR_ELx.EC
Several places open-code extraction of the EC field from an ESR_ELx
value, in subtly different ways. This is unfortunate duplication and
variation, and the precise logic used to extract the field is a
distraction.

This patch adds a new macro, ESR_ELx_EC(), to extract the EC field from
an ESR_ELx value in a consistent fashion.

Existing open-coded extractions in core arm64 code are moved over to the
new helper. KVM code is left as-is for the moment.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Huang Shijie <shijie.huang@arm.com>
Cc: Dave P Martin <dave.martin@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-06-21 17:07:09 +01:00
Jisheng Zhang
b67a8b29df arm64: mm: only initialize swiotlb when necessary
we only initialize swiotlb when swiotlb_force is true or not all system
memory is DMA-able, this trivial optimization saves us 64MB when
swiotlb is not necessary.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-06-21 16:54:53 +01:00
Mark Rutland
4674fdb9f1 arm64: mm: dump: make page table dumping reusable
For debugging purposes, it would be nice if we could export page tables
other than the swapper_pg_dir to userspace. To enable this, this patch
refactors the arm64 page table dumping code such that multiple tables
may be registered with the framework, and exported under debugfs.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-06-21 15:09:11 +01:00
Mark Rutland
bbb1681ee3 arm64: mm: mark fault_info table const
Unlike the debug_fault_info table, we never intentionally alter the
fault_info table at runtime, and all derived pointers are treated as
const currently.

Make the table const so that it can be placed in .rodata and protected
from unintentional writes, as we do for the syscall tables.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-14 15:02:34 +01:00
Will Deacon
0106d456c4 arm64: mm: always take dirty state from new pte in ptep_set_access_flags
Commit 66dbd6e61a ("arm64: Implement ptep_set_access_flags() for
hardware AF/DBM") ensured that pte flags are updated atomically in the
face of potential concurrent, hardware-assisted updates. However, Alex
reports that:

 | This patch breaks swapping for me.
 | In the broken case, you'll see either systemd cpu time spike (because
 | it's stuck in a page fault loop) or the system hang (because the
 | application owning the screen is stuck in a page fault loop).

It turns out that this is because the 'dirty' argument to
ptep_set_access_flags is always 0 for read faults, and so we can't use
it to set PTE_RDONLY. The failing sequence is:

  1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
  2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
  3. A read faults due to the missing access flag
  4. ptep_set_access_flags is called with dirty = 0, due to the read fault
  5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
  6. A write faults, but pte_write is true so we get stuck

The solution is to check the new page table entry (as would be done by
the generic, non-atomic definition of ptep_set_access_flags that just
calls set_pte_at) to establish the dirty state.

Cc: <stable@vger.kernel.org> # 4.3+
Fixes: 66dbd6e61a ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Alexander Graf <agraf@suse.de>
Tested-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-08 10:23:44 +01:00
Mark Rutland
48dd73c55d arm64: mm: dump: log span level
The page table dump code logs spans of entries at the same level
(pgd/pud/pmd/pte) which have the same attributes. While we log the
(decoded) attributes, we don't log the level, which leaves the output
ambiguous and/or confusing in some cases.

For example:

0xffff800800000000-0xffff800980000000           6G       RW NX SHD AF        BLK UXN MEM/NORMAL

If using 4K pages, this may describe a span of 6 1G block entries at the
PGD/PUD level, or 3072 2M block entries at the PMD level.

This patch adds the page table level to each output line, removing this
ambiguity. For the example above, this will produce:

0xffffffc800000000-0xffffffc980000000           6G PUD       RW NX SHD AF        BLK UXN MEM/NORMAL

When 3 level tables are in use, and we use the asm-generic/nopud.h
definitions, the dump code treats each entry in the PGD as a 1 element
table at the PUD level, and logs spans as being PUDs, which can be
confusing. To counteract this, the "PUD" mnemonic is replaced with "PGD"
when CONFIG_PGTABLE_LEVELS <= 3. Likewise for "PMD" when
CONFIG_PGTABLE_LEVELS <= 2.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Huang Shijie <shijie.huang@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-03 10:16:22 +01:00
Will Deacon
ab2e1b8923 Revert "arm64: hugetlb: partial revert of 66b3923a1a0f"
This reverts commit ff7925848b.

Now that the contiguous-hint hugetlb regression has been debugged and
fixed upstream by 66ee95d16a ("mm: exclude HugeTLB pages from THP
page_mapped() logic"), we can revert the previous partial revert of this
feature.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-05-31 11:00:09 +01:00
Hanjun Guo
d8b47fca8c arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT
Introduce a new file to hold ACPI based NUMA information parsing from
SRAT and SLIT.

SRAT includes the CPU ACPI ID to Proximity Domain mappings and memory
ranges to Proximity Domain mapping.  SLIT has the information of inter
node distances(relative number for access latency).

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
[rrichter@cavium.com Reworked for numa v10 series ]
Signed-off-by: Robert Richter <rrichter@cavium.com>
[david.daney@cavium.com reorderd and combinded with other patches in
Hanjun Guo's original set, removed get_mpidr_in_madt() and use
acpi_map_madt_entry() instead.]
Signed-off-by: David Daney <david.daney@cavium.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Dennis Chen <dennis.chen@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-30 14:27:09 +02:00
David Daney
34c3337052 arm64, NUMA: Cleanup NUMA disabled messages
As noted by Dennis Chen, we don't want to print "No NUMA configuration
found" if NUMA was forced off from the command line.

Change the type of numa_off to bool, and clean up printing code.
Print "NUMA disabled" if forced off on command line and "No NUMA
configuration found" if there was no firmware NUMA information.

Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dennis Chen <dennis.chen@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-30 14:27:08 +02:00
Hanjun Guo
8ccbbdaa2b arm64, NUMA: rework numa_add_memblk()
Rework numa_add_memblk() to update the parameter "u64 size" to "u64
end", this will make it consistent with x86 and simplifies the arm64
ACPI NUMA code to be added later.

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-30 14:27:07 +02:00
Linus Torvalds
a05a70db34 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - fsnotify fix

 - poll() timeout fix

 - a few scripts/ tweaks

 - debugobjects updates

 - the (small) ocfs2 queue

 - Minor fixes to kernel/padata.c

 - Maybe half of the MM queue

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
  mm, page_alloc: restore the original nodemask if the fast path allocation failed
  mm, page_alloc: uninline the bad page part of check_new_page()
  mm, page_alloc: don't duplicate code in free_pcp_prepare
  mm, page_alloc: defer debugging checks of pages allocated from the PCP
  mm, page_alloc: defer debugging checks of freed pages until a PCP drain
  cpuset: use static key better and convert to new API
  mm, page_alloc: inline pageblock lookup in page free fast paths
  mm, page_alloc: remove unnecessary variable from free_pcppages_bulk
  mm, page_alloc: pull out side effects from free_pages_check
  mm, page_alloc: un-inline the bad part of free_pages_check
  mm, page_alloc: check multiple page fields with a single branch
  mm, page_alloc: remove field from alloc_context
  mm, page_alloc: avoid looking up the first zone in a zonelist twice
  mm, page_alloc: shortcut watermark checks for order-0 pages
  mm, page_alloc: reduce cost of fair zone allocation policy retry
  mm, page_alloc: shorten the page allocator fast path
  mm, page_alloc: check once if a zone has isolated pageblocks
  mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath
  mm, page_alloc: simplify last cpupid reset
  mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask()
  ...
2016-05-19 20:00:06 -07:00
Vaishali Thakkar
d77e20cea7 arm64: mm: use hugetlb_bad_size()
Update setup_hugepagesz() to call hugetlb_bad_size() when unsupported
hugepage size is found.

Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Linus Torvalds
e0fb1b3639 IOMMU Updates for Linux v4.7
The updates include:
 
 	* Rate limiting for the VT-d fault handler
 
 	* Remove statistics code from the AMD IOMMU driver. It is unused
 	  and should be replaced by something more generic if needed
 
 	* Per-domain pagesize-bitmaps in IOMMU core code to support
 	  systems with different types of IOMMUs
 
 	* Support for ACPI devices in the AMD IOMMU driver
 
 	* 4GB mode support for Mediatek IOMMU driver
 
 	* ARM-SMMU updates from Will Deacon:
 
 		- Support for 64k pages with SMMUv1 implementations
 		  (e.g MMU-401)
 
 		- Remove open-coded 64-bit MMIO accessors
 
 		- Initial support for 16-bit VMIDs, as supported by some
 		  ThunderX SMMU implementations
 
 		- A couple of errata workarounds for silicon in the
 		  field
 
 	* Various fixes here and there
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJXPeM1AAoJECvwRC2XARrjA2QP/2Cz+pVkpQCuvhAse57eN4rB
 wWXKTjqSFZ4PcA3Vu5yvX6XMv15g46xXFJAhf2spE5//8+xgFfYBgkBRpnqu1brw
 SL6f8A912MnfMRgWqcdKkJNeHbiN0kOvcIQv1J8GNfciqMiyYFhiLP6fFiRmWR/F
 XDBjUeFZ5+Uwf1BAGqw0cVPexeakEbsLHUGqxFsh5g2T4i43aHzO2HJT3IdwWHDt
 F2ivs8gNFGBeJEyzhW8TD0rOEEyHAnM3N18qPEU9+dD0UmjnTQPymEZSbsGW5d4j
 Cn40QYlA+Zmbwgx6LaDVChzQyRJu6O3uvFThyRviiYKCri/Nc9cUT4vHsFGU4MXb
 1d3bqrgzaw7vw31BN7S1Py3MV+WpVnEYjFm2O+hW28OjtSpm6ZvbI8wc0rF4UT/I
 KgL0gSeA8tp25uVISM+ktpIrObYsAcoCz8nvurpDv2AGkKRzhyoSze0Jg43rusD8
 BH7iFWu1LRPlulTGlrHMtNmbZeEApUPbObcQAOcrBOj9vjuFaZ8qduZmB+hwS2iV
 p9atn+54LmGO0LuzqsGrhApIeXTeTZSrGyjlbUADWBJlTw8Xyk/CR39Wf3m/Xmpr
 DiJ/5oa8SKQtNbwvbScn1+sInNWP/pH/JgnRO3Yvqth8HWF/DlpzNj5XxAB8czwr
 qjk9WjpEXun50ocPFQeS
 =jpPD
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "The updates include:

   - rate limiting for the VT-d fault handler

   - remove statistics code from the AMD IOMMU driver.  It is unused and
     should be replaced by something more generic if needed

   - per-domain pagesize-bitmaps in IOMMU core code to support systems
     with different types of IOMMUs

   - support for ACPI devices in the AMD IOMMU driver

   - 4GB mode support for Mediatek IOMMU driver

   - ARM-SMMU updates from Will Deacon:
      - support for 64k pages with SMMUv1 implementations (e.g MMU-401)
      - remove open-coded 64-bit MMIO accessors
      - initial support for 16-bit VMIDs, as supported by some ThunderX
        SMMU implementations
      - a couple of errata workarounds for silicon in the field

   - various fixes here and there"

* tag 'iommu-updates-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (44 commits)
  iommu/arm-smmu: Use per-domain page sizes.
  iommu/amd: Remove statistics code
  iommu/dma: Finish optimising higher-order allocations
  iommu: Allow selecting page sizes per domain
  iommu: of: enforce const-ness of struct iommu_ops
  iommu: remove unused priv field from struct iommu_ops
  iommu/dma: Implement scatterlist segment merging
  iommu/arm-smmu: Clear cache lock bit of ACR
  iommu/arm-smmu: Support SMMUv1 64KB supplement
  iommu/arm-smmu: Decouple context format from kernel config
  iommu/arm-smmu: Tidy up 64-bit/atomic I/O accesses
  io-64-nonatomic: Add relaxed accessor variants
  iommu/arm-smmu: Work around MMU-500 prefetch errata
  iommu/arm-smmu: Convert ThunderX workaround to new method
  iommu/arm-smmu: Differentiate specific implementations
  iommu/arm-smmu: Workaround for ThunderX erratum #27704
  iommu/arm-smmu: Add support for 16 bit VMID
  iommu/amd: Move get_device_id() and friends to beginning of file
  iommu/amd: Don't use IS_ERR_VALUE to check integer values
  iommu/amd: Signedness bug in acpihid_device_group()
  ...
2016-05-19 17:07:04 -07:00
Robin Murphy
3b6b7e19e3 iommu/dma: Finish optimising higher-order allocations
Now that we know exactly which page sizes our caller wants to use in the
given domain, we can restrict higher-order allocation attempts to just
those sizes, if any, and avoid wasting any time or effort on other sizes
which offer no benefit. In the same vein, this also lets us accommodate
a minimum order greater than 0 for special cases.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-05-09 15:33:29 +02:00
Robin Murphy
53c92d7933 iommu: of: enforce const-ness of struct iommu_ops
As a set of driver-provided callbacks and static data, there is no
compelling reason for struct iommu_ops to be mutable in core code, so
enforce const-ness throughout.

Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-05-09 15:33:29 +02:00
Yang Shi
394bf2f248 arm64: mm: remove unnecessary EXPORT_SYMBOL_GPL
arch_pick_mmap_layout is only called by fs/exec.c which is always built into
kernel, it looks the EXPORT_SYMBOL_GPL is pointless and no architectures export
it other than ARM64.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-05-05 09:49:38 +01:00
James Morse
cabe1c81ea arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va
By enabling the MMU early in cpu_resume(), the sleep_save_sp and stack can
be accessed by VA, which avoids the need to convert-addresses and clean to
PoC on the suspend path.

MMU setup is shared with the boot path, meaning the swapper_pg_dir is
restored directly: ttbr1_el1 is no longer saved/restored.

struct sleep_save_sp is removed, replacing it with a single array of
pointers.

cpu_do_{suspend,resume} could be further reduced to not restore: cpacr_el1,
mdscr_el1, tcr_el1, vbar_el1 and sctlr_el1, all of which are set by
__cpu_setup(). However these values all contain res0 bits that may be used
to enable future features.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-28 12:05:46 +01:00
Geoff Levand
7b7293ae3d arm64: Fold proc-macros.S into assembler.h
To allow the assembler macros defined in arch/arm64/mm/proc-macros.S to
be used outside the mm code move the contents of proc-macros.S to
asm/assembler.h.  Also, delete proc-macros.S, and fix up all references
to proc-macros.S.

Signed-off-by: Geoff Levand <geoff@infradead.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
[rebased, included dcache_by_line_op]
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-28 12:05:45 +01:00
Ard Biesheuvel
d8fc68a04d arm64: ptdump: add region marker for kasan shadow region
Annotate the KASAN shadow region with boundary markers, so that its
mappings stand out in the page table dumper output.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-25 12:05:21 +01:00
Ard Biesheuvel
c8f8cca483 arm64: ptdump: use static initializers for vmemmap region boundaries
There is no need to initialize the vmemmap region boundaries dynamically,
since they are compile time constants. So just add these constants to the
global struct initializer, and drop the dynamic assignment and related code.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-25 12:04:39 +01:00
Robin Murphy
921b1f52c9 arm64/dma-mapping: Remove default domain workaround
With the IOMMU core now taking care of default domains for groups
regardless of bus type, we can gleefully rip out this stop-gap, as
slight recompense for having to expand the other one.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-21 17:34:34 +01:00
Robin Murphy
226d89cbb2 arm64/dma-mapping: Extend DMA ops workaround to PCI devices
PCI devices now suffer the same hiccup as platform devices, in that they
get their DMA ops configured before they have been added to their bus,
and thus before we know whether they have successfully registered with
an IOMMU or not. Until the necessary driver core changes to reorder
calls during device creation have been worked out, extend our delayed
notifier trick onto the PCI bus so as to avoid broken DMA ops once
IOMMUs get plugged into the PCI code.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-21 17:34:34 +01:00
Kefeng Wang
9974723e31 arm64: mm: Show bss segment in kernel memory layout
Show the bss segment information as with text and data in Virtual
memory kernel layout.

Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-19 17:03:31 +01:00
Kefeng Wang
d32351c824 arm64: mm: make pr_cont() per line in Virtual kernel memory layout
Each line with single pr_cont() in Virtual kernel memory layout,
or the dump of the kernel memory layout in dmesg is not aligned
when PRINTK_TIME enabled, due to the missing time stamps.

Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-19 17:01:34 +01:00
Huang Shijie
3a72db703c arm64: mm: remove the redundant code
We already re-enable interrupts where necessary in the entry code, so
there is no need to do it again in do_page fault. This patch removes
the redundant code.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Huang Shijie <shijie.huang@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-19 09:52:51 +01:00
Catalin Marinas
66dbd6e61a arm64: Implement ptep_set_access_flags() for hardware AF/DBM
When hardware updates of the access and dirty states are enabled, the
default ptep_set_access_flags() implementation based on calling
set_pte_at() directly is potentially racy. This triggers the "racy dirty
state clearing" warning in set_pte_at() because an existing writable PTE
is overridden with a clean entry.

There are two main scenarios for this situation:

1. The CPU getting an access fault does not support hardware updates of
   the access/dirty flags. However, a different agent in the system
   (e.g. SMMU) can do this, therefore overriding a writable entry with a
   clean one could potentially lose the automatically updated dirty
   status

2. A more complex situation is possible when all CPUs support hardware
   AF/DBM:

   a) Initial state: shareable + writable vma and pte_none(pte)
   b) Read fault taken by two threads of the same process on different
      CPUs
   c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
      eventually reaches do_set_pte() which sets a writable + clean pte.
      CPU0 releases the mmap_sem
   d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
      pte entry it reads is present, writable and clean and it continues
      to pte_mkyoung()
   e) CPU1 calls ptep_set_access_flags()

   If between (d) and (e) the hardware (another CPU) updates the dirty
   state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
   marking the entry clean again.

This patch implements an arm64-specific ptep_set_access_flags() function
to perform an atomic update of the PTE flags.

Fixes: 2f4b829c62 ("arm64: Add support for hardware updates of the access and dirty pte bits")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # 4.3+
[will: reworded comment]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-15 18:06:09 +01:00
Ganapatrao Kulkarni
1a2db30034 arm64, numa: Add NUMA support for arm64 platforms.
Attempt to get the memory and CPU NUMA node via of_numa.  If that
fails, default the dummy NUMA node and map all memory and CPUs to node
0.

Tested-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-15 18:06:09 +01:00
David Daney
3194ac6e66 arm64: Move unflatten_device_tree() call earlier.
In order to extract NUMA information from the device tree, we need to
have the tree in its unflattened form.

Move the call to bootmem_init() in the tail of paging_init() into
setup_arch, and adjust header files so that its declaration is
visible.

Move the unflatten_device_tree() call between the calls to
paging_init() and bootmem_init().  Follow on patches add NUMA handling
to bootmem_init().

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-15 18:06:08 +01:00
Suzuki K Poulose
17eebd1a43 arm64: Add cpu_panic_kernel helper
During the activation of a secondary CPU, we could report serious
configuration issues and hence request to crash the kernel. We do
this for CPU ASID bit check now. We will need it also for handling
mismatched exception levels for the CPUs with VHE. Hence, add a
helper to do the same for reusability.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-15 18:06:06 +01:00