VGIC implementation.
- s390: support for trapping software breakpoints, nested virtualization
(vSIE), the STHYI opcode, initial extensions for CPU model support.
- MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups,
preliminary to this and the upcoming support for hardware virtualization
extensions.
- x86: support for execute-only mappings in nested EPT; reduced vmexit
latency for TSC deadline timer (by about 30%) on Intel hosts; support for
more than 255 vCPUs.
- PPC: bugfixes.
The ugly bit is the conflicts. A couple of them are simple conflicts due
to 4.7 fixes, but most of them are with other trees. There was definitely
too much reliance on Acked-by here. Some conflicts are for KVM patches
where _I_ gave my Acked-by, but the worst are for this pull request's
patches that touch files outside arch/*/kvm. KVM submaintainers should
probably learn to synchronize better with arch maintainers, with the
latter providing topic branches whenever possible instead of Acked-by.
This is what we do with arch/x86. And I should learn to refuse pull
requests when linux-next sends scary signals, even if that means that
submaintainers have to rebase their branches.
Anyhow, here's the list:
- arch/x86/kvm/vmx.c: handle_pcommit and EXIT_REASON_PCOMMIT was removed
by the nvdimm tree. This tree adds handle_preemption_timer and
EXIT_REASON_PREEMPTION_TIMER at the same place. In general all mentions
of pcommit have to go.
There is also a conflict between a stable fix and this patch, where the
stable fix removed the vmx_create_pml_buffer function and its call.
- virt/kvm/kvm_main.c: kvm_cpu_notifier was removed by the hotplug tree.
This tree adds kvm_io_bus_get_dev at the same place.
- virt/kvm/arm/vgic.c: a few final bugfixes went into 4.7 before the
file was completely removed for 4.8.
- include/linux/irqchip/arm-gic-v3.h: this one is entirely our fault;
this is a change that should have gone in through the irqchip tree and
pulled by kvm-arm. I think I would have rejected this kvm-arm pull
request. The KVM version is the right one, except that it lacks
GITS_BASER_PAGES_SHIFT.
- arch/powerpc: what a mess. For the idle_book3s.S conflict, the KVM
tree is the right one; everything else is trivial. In this case I am
not quite sure what went wrong. The commit that is causing the mess
(fd7bacbca4, "KVM: PPC: Book3S HV: Fix TB corruption in guest exit
path on HMI interrupt", 2016-05-15) touches both arch/powerpc/kernel/
and arch/powerpc/kvm/. It's large, but at 396 insertions/5 deletions
I guessed that it wasn't really possible to split it and that the 5
deletions wouldn't conflict. That wasn't the case.
- arch/s390: also messy. First is hypfs_diag.c where the KVM tree
moved some code and the s390 tree patched it. You have to reapply the
relevant part of commits 6c22c98637, plus all of e030c1125e, to
arch/s390/kernel/diag.c. Or pick the linux-next conflict
resolution from http://marc.info/?l=kvm&m=146717549531603&w=2.
Second, there is a conflict in gmap.c between a stable fix and 4.8.
The KVM version here is the correct one.
I have pushed my resolution at refs/heads/merge-20160802 (commit
3d1f53419842) at git://git.kernel.org/pub/scm/virt/kvm/kvm.git.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJXoGm7AAoJEL/70l94x66DugQIAIj703ePAFepB/fCrKHkZZia
SGrsBdvAtNsOhr7FQ5qvvjLxiv/cv7CymeuJivX8H+4kuUHUllDzey+RPHYHD9X7
U6n1PdCH9F15a3IXc8tDjlDdOMNIKJixYuq1UyNZMU6NFwl00+TZf9JF8A2US65b
x/41W98ilL6nNBAsoDVmCLtPNWAqQ3lajaZELGfcqRQ9ZGKcAYOaLFXHv2YHf2XC
qIDMf+slBGSQ66UoATnYV2gAopNlWbZ7n0vO6tE2KyvhHZ1m399aBX1+k8la/0JI
69r+Tz7ZHUSFtmlmyByi5IAB87myy2WQHyAPwj+4vwJkDGPcl0TrupzbG7+T05Y=
=42ti
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
- ARM: GICv3 ITS emulation and various fixes. Removal of the
old VGIC implementation.
- s390: support for trapping software breakpoints, nested
virtualization (vSIE), the STHYI opcode, initial extensions
for CPU model support.
- MIPS: support for MIPS64 hosts (32-bit guests only) and lots
of cleanups, preliminary to this and the upcoming support for
hardware virtualization extensions.
- x86: support for execute-only mappings in nested EPT; reduced
vmexit latency for TSC deadline timer (by about 30%) on Intel
hosts; support for more than 255 vCPUs.
- PPC: bugfixes.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
KVM: PPC: Introduce KVM_CAP_PPC_HTM
MIPS: Select HAVE_KVM for MIPS64_R{2,6}
MIPS: KVM: Reset CP0_PageMask during host TLB flush
MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
MIPS: KVM: Sign extend MFC0/RDHWR results
MIPS: KVM: Fix 64-bit big endian dynamic translation
MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
MIPS: KVM: Use 64-bit CP0_EBase when appropriate
MIPS: KVM: Set CP0_Status.KX on MIPS64
MIPS: KVM: Make entry code MIPS64 friendly
MIPS: KVM: Use kmap instead of CKSEG0ADDR()
MIPS: KVM: Use virt_to_phys() to get commpage PFN
MIPS: Fix definition of KSEGX() for 64-bit
KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
kvm: x86: nVMX: maintain internal copy of current VMCS
KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
KVM: arm64: vgic-its: Simplify MAPI error handling
KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
...
It is seldom used in the kernel code and can be easily replaced by
either RELOCATABLE or PPC32. So there is no reason to keep a separate
kernel option for this.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
idle_power7.S handles idle entry/exit for POWER7, POWER8 and in next
patch for POWER9. Rename the file to a non-hardware specific
name.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When a guest is assigned to a core it converts the host Timebase (TB)
into guest TB by adding guest timebase offset before entering into
guest. During guest exit it restores the guest TB to host TB. This means
under certain conditions (Guest migration) host TB and guest TB can differ.
When we get an HMI for TB related issues the opal HMI handler would
try fixing errors and restore the correct host TB value. With no guest
running, we don't have any issues. But with guest running on the core
we run into TB corruption issues.
If we get an HMI while in the guest, the current HMI handler invokes opal
hmi handler before forcing guest to exit. The guest exit path subtracts
the guest TB offset from the current TB value which may have already
been restored with host value by opal hmi handler. This leads to incorrect
host and guest TB values.
With split-core, things become more complex. With split-core, TB also gets
split and each subcore gets its own TB register. When a hmi handler fixes
a TB error and restores the TB value, it affects all the TB values of
sibling subcores on the same core. On TB errors all the thread in the core
gets HMI. With existing code, the individual threads call opal hmi handle
independently which can easily throw TB out of sync if we have guest
running on subcores. Hence we will need to co-ordinate with all the
threads before making opal hmi handler call followed by TB resync.
This patch introduces a sibling subcore state structure (shared by all
threads in the core) in paca which holds information about whether sibling
subcores are in Guest mode or host mode. An array in_guest[] of size
MAX_SUBCORE_PER_CORE=4 is used to maintain the state of each subcore.
The subcore id is used as index into in_guest[] array. Only primary
thread entering/exiting the guest is responsible to set/unset its
designated array element.
On TB error, we get HMI interrupt on every thread on the core. Upon HMI,
this patch will now force guest to vacate the core/subcore. Primary
thread from each subcore will then turn off its respective bit
from the above bitmap during the guest exit path just after the
guest->host partition switch is complete.
All other threads that have just exited the guest OR were already in host
will wait until all other subcores clears their respective bit.
Once all the subcores turn off their respective bit, all threads will
will make call to opal hmi handler.
It is not necessary that opal hmi handler would resync the TB value for
every HMI interrupts. It would do so only for the HMI caused due to
TB errors. For rest, it would not touch TB value. Hence to make things
simpler, primary thread would call TB resync explicitly once for each
core immediately after opal hmi handler instead of subtracting guest
offset from TB. TB resync call will restore the TB with host value.
Thus we can be sure about the TB state.
One of the primary threads exiting the guest will take up the
responsibility of calling TB resync. It will use one of the top bits
(bit 63) from subcore state flags bitmap to make the decision. The first
primary thread (among the subcores) that is able to set the bit will
have to call the TB resync. Rest all other threads will wait until TB
resync is complete. Once TB resync is complete all threads will then
proceed.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Rather than open-coding -pg whereever we want to disable ftrace, use the
existing $(CC_FLAGS_FTRACE) variable.
This has the advantage that it will work in future when we use a
different set of flags to enable ftrace.
Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This hooks up UBSAN support for PowerPC.
So far it's found some interesting cases where we don't properly sanitise
input to shifts, including one in our futex handling. Nothing critical,
but interesting and worth fixing.
[valentinrothberg@gmail.com: arch/powerpc/Kconfig: fix typo in select statement]
Signed-off-by: Daniel Axtens <dja@axtens.net>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new powerpc-specific trace clock using the timebase register,
similar to x86-tsc. This gives us
- a fast, monotonic, hardware clock source for trace entries, and
- a clock that can be used to correlate events across cpus as well as across
hypervisor and guests.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
A whole lot of bug fixes. Nothing stands out here except the ability to
enable CONFIG_OF on every architecture, and an import of a newer version
of dtc.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVlAkwAAoJEMWQL496c2LNNYMP/23EdDPyRneoaIynd0nNk9SO
UfhOSJdSo7vMmT9Rea2eBHdn3leJrx9m9JXvIrBwGdcDxMNsS4mS1k9Bj63aqEVn
kK+IrI1Jbx7F6/AlBh3u4nHixIjoTc3IWlFdxUTBKQ2ATYKmCVhVCsf6UyfSxAj+
xPL6bmALegEZ2kJzK+qhk6K0j7GeQDnk1SAS3xMvTpJH76Ac2F+Gi9u7J68GqXAS
d7WBCAjijkqskfAdeP13XasvSdU7ZCOnDjClwJd83ZQGmtp77T8PWF0lzLlnC8Ho
sMwDhoWHnCtFP0U1hnhUF1pXhhn8W9NlxymtYbxR1tJcku0fSiYlibZ6jnzTRc2m
TsqzaWDR3U/VX4t5wH5FtXM1Cum/eAfV6HX9fGXeYYP7Einl7Kg6yXYjIY+b7HG9
R3znQ2TKoYPsUr/WWXrZK52ZTesTe+LG98WYH1YhNbZ5riev9fLZxI2zMl/h83/Z
LrF0g0MLQobHuBCUSIXSUot6RTQgLzFWHtnSrNOUycMwlRNZHYOY3DSvzLYLw+hJ
XwV9p2k3DV/l/XnQJPy3y/MA+7jEudzlq7HukmtYVhh9rOy3y+Sq3GMGAiUFjAqj
YDxBrrIpoPWNp/OJJX2yhnTvnNaV/BjhCB1CiJooFCjHz78I5daqBXO155hn9msY
7To1PHvyEngabBpdN/MZ
=tm5y
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux
Pull devicetree updates from Grant Likely:
"A whole lot of bug fixes.
Nothing stands out here except the ability to enable CONFIG_OF on
every architecture, and an import of a newer version of dtc"
* tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux: (22 commits)
of/irq: Rename "intc_desc" to "of_intc_desc" to fix OF on sh
of/irq: Fix pSeries boot failure
Documentation: DT: Fix a typo in the filename "lantiq,<chip>-pinumx.txt"
of: define of_find_node_by_phandle for !CONFIG_OF
of/address: use atomic allocation in pci_register_io_range()
of: Add vendor prefix for Zodiac Inflight Innovations
dt/fdt: add empty versions of early_init_dt_*_memory_arch
of: clean-up unnecessary libfdt include paths
of: make unittest select OF_EARLY_FLATTREE instead of depend on it
of: make CONFIG_OF user selectable
MIPS: prepare for user enabling of CONFIG_OF
of/fdt: fix argument name and add comments of unflatten_dt_node()
of: return NUMA_NO_NODE from fallback of_node_to_nid()
tps6507x.txt: Remove executable permission
of/overlay: Grammar s/an negative/a negative/
of/fdt: Make fdt blob input parameters of unflatten functions const
of: add helper function to retrive match data
of: Grammar s/property exist/property exists/
of: Move OF flags to be visible even when !CONFIG_OF
scripts/dtc: Update to upstream version 9d3649bd3be245c9
...
With the libfdt include fixups to use "" instead of <> in the
latest dtc import in commit 4760597 (scripts/dtc: Update to upstream
version 9d3649bd3be245c9), it is no longer necessary to add explicit
include paths to use libfdt. Remove these across the kernel.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Grant Likely <grant.likely@linaro.org>
Cc: linux-mips@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
The only little endian configuration we support is ppc64le. As such if
we're building little endian we don't need a 32-bit VDSO, because there
is no 32-bit userspace.
This patch is a fairly ugly mess of #ifdefs, but is the minimal logic
required to disable the 32-bit VDSO. We can hopefully clean up the
result in future with some further refactoring.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have set CONFIG_PPC_OF to always 'y' in commit 0a498d96a3
("powerpc: set CONFIG_PPC_OF=y always for ARCH=powerpc") nine years
ago. And the arch/ppc also has gone away for many years. The OF
functionality was also moved to a common place and be used by many
archs. So it does make no sense to keep such a option in the current
kernel. Just kill it.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Simplify things considerably by moving all the ppc32 specific
symbol exports into its own file.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
__attribute__ ((unused))
WSP is the last user of CONFIG_PPC_A2, so we remove that as well.
Although CONFIG_PPC_ICSWX still exists, it's no longer selectable for
any Book3E platform, so we can remove the code in mmu-book3e.h that
depended on it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
With libfdt support, we can take advantage of helper accessors in libfdt
for accessing the FDT header data. This makes the code more readable and
makes the FDT blob structure more opaque to the kernel. This also
prepares for removing struct boot_param_header completely.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Tested-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Stephen Chivers <schivers@csc.com>
the setup before the change was
- arch/powerpc/Kconfig had the PPC_CLOCK option, off by default
- depending on the PPC_CLOCK option the arch/powerpc/kernel/clock.c file
was built, which implements the clk.h API but always returns -ENOSYS
unless a platform registers specific callbacks
- the MPC52xx platform selected PPC_CLOCK but did not register any
callbacks, thus all clk.h API calls keep resulting in -ENOSYS errors
(which is OK, all peripheral drivers deal with the situation)
- the MPC512x platform selected PPC_CLOCK and registered specific
callbacks implemented in arch/powerpc/platforms/512x/clock.c, thus
provided real support for the clock API
- no other powerpc platform did select PPC_CLOCK
the situation after the change is
- the MPC512x platform implements the COMMON_CLK interface, and thus the
PPC_CLOCK approach in arch/powerpc/platforms/512x/clock.c has become
obsolete
- the MPC52xx platform still lacks genuine support for the clk.h API
while this is not a change against the previous situation (the error
code returned from COMMON_CLK stubs differs but every call still
results in an error)
- with all references gone, the arch/powerpc/kernel/clock.c wrapper and
the PPC_CLOCK option have become obsolete, as did the clk_interface.h
header file
the switch from PPC_CLOCK to COMMON_CLK is done for all platforms within
the same commit such that multiplatform kernels (the combination of 512x
and 52xx within one executable) keep working
Cc: Mike Turquette <mturquette@linaro.org>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Gerhard Sittig <gsi@denx.de>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Now that we handle machine check in linux, the MCE decoding should also
take place in linux host. This info is crucial to log before we go down
in case we can not handle the machine check errors. This patch decodes
and populates a machine check event which contain high level meaning full
MCE information.
We do this in real mode C code with ME bit on. The MCE information is still
available on emergency stack (in pt_regs structure format). Even if we take
another exception at this point the MCE early handler will allocate a new
stack frame on top of current one. So when we return back here we still have
our MCE information safe on current stack.
We use per cpu buffer to save high level MCE information. Each per cpu buffer
is an array of machine check event structure indexed by per cpu counter
mce_nest_count. The mce_nest_count is incremented every time we enter
machine check early handler in real mode to get the current free slot
(index = mce_nest_count - 1). The mce_nest_count is decremented once the
MCE info is consumed by virtual mode machine exception handler.
This patch provides save_mce_event(), get_mce_event() and release_mce_event()
generic routines that can be used by machine check handlers to populate and
retrieve the event. The routine release_mce_event() will free the event slot so
that it can be reused. Caller can invoke get_mce_event() with a release flag
either to release the event slot immediately OR keep it so that it can be
fetched again. The event slot can be also released anytime by invoking
release_mce_event().
This patch also updates kvm code to invoke get_mce_event to retrieve generic
mce event rather than paca->opal_mce_evt.
The KVM code always calls get_mce_event() with release flags set to false so
that event is available for linus host machine
If machine check occurs while we are in guest, KVM tries to handle the error.
If KVM is able to handle MC error successfully, it enters the guest and
delivers the machine check to guest. If KVM is not able to handle MC error, it
exists the guest and passes the control to linux host machine check handler
which then logs MC event and decides how to handle it in linux host. In failure
case, KVM needs to make sure that the MC event is available for linux host to
consume. Hence KVM always calls get_mce_event() with release flags set to false
and later it invokes release_mce_event() only if it succeeds to handle error.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we get a machine check exception due to SLB or TLB errors, then flush
SLBs/TLBs and reload SLBs to recover. We do this in real mode before turning
on MMU. Otherwise we would run into nested machine checks.
If we get a machine check when we are in guest, then just flush the
SLBs and continue. This patch handles errors for power7. The next
patch will handle errors for power8
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This file is entirely pseries specific nowadays, so move it out
of arch/powerpc/kernel where it doesn't belong anymore.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The Kconfig symbol 8XX_MINIMAL_FPEMU was removed in commit 968219fa33
("powerpc/8xx: Remove 8xx specific "minimal FPU emulation""). But that
commit didn't remove all code depending on that symbol. Do so now.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Remove the generic PPC_INDIRECT_IO and ensure we only add overhead
to the right accessors. IE. If only CONFIG_PPC_INDIRECT_PIO is set,
we don't add overhead to all MMIO accessors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch moves the common part of EEH core into arch/powerpc/kernel
directory so that we needn't PPC_PSERIES while compiling POWERNV
platform:
* Move the EEH common part into arch/powerpc/kernel
* Move the functions for PCI hotplug from pSeries platform to
arch/powerpc/kernel/pci-hotplug.c
* Move CONFIG_EEH from arch/powerpc/platforms/pseries/Kconfig to
arch/powerpc/platforms/Kconfig
* Adjust makefile accordingly
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Here we add the helper functions to be used when context switching. These
allow us to fully reclaim and recheckpoint a transaction.
We introduce a new paca field called tm_scratch to help us store away register
values when doing the low level tm reclaim register save.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Finally remove the two level TOC and build with -mcmodel=medium.
Unfortunately we can't build modules with -mcmodel=medium due to
the tricks the kernel module loader plays with percpu data:
# -mcmodel=medium breaks modules because it uses 32bit offsets from
# the TOC pointer to create pointers where possible. Pointers into the
# percpu data area are created by this method.
#
# The kernel module loader relocates the percpu data section from the
# original location (starting with 0xd...) to somewhere in the base
# kernel percpu data space (starting with 0xc...). We need a full
# 64bit relocation for this to work, hence -mcmodel=large.
On older kernels we fall back to the two level TOC (-mminimal-toc)
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The ppc64 kernel can get loaded at any address which means
our very early init code in prom_init.c must be relocatable. We do
this with a pretty nasty RELOC() macro that we wrap accesses of
variables with. It is very fragile and sometimes we forget to add a
RELOC() to an uncommon path or sometimes a compiler change breaks it.
32bit has a much more elegant solution where we build prom_init.c
with -mrelocatable and then process the relocations manually.
Unfortunately we can't do the equivalent on 64bit and we would
have to build the entire kernel relocatable (-pie), resulting in a
large increase in kernel footprint (megabytes of relocation data).
The relocation data will be marked __initdata but it still creates
more pressure on our already tight memory layout at boot.
Alan Modra pointed out that the 64bit ABI is relocatable even
if we don't build with -pie, we just need to relocate the TOC.
This patch implements that idea and relocates the TOC entries of
prom_init.c. An added bonus is there are very few relocations to
process which helps keep boot times on simulators down.
gcc does not put 64bit integer constants into the TOC but to be
safe we may want a build time script which passes through the
prom_init.c TOC entries to make sure everything looks reasonable.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move the rule to build doorbell support out of the Makefile and into a
new Kconfig boolean that platforms can select.
We will add doorbell support to pseries as well in the next patch.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We are going to reuse this in POWER8 so make the name generic.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is the port of uprobes to powerpc. Usage is similar to x86.
[root@xxxx ~]# ./bin/perf probe -x /lib64/libc.so.6 malloc
Added new event:
probe_libc:malloc (on 0xb4860)
You can now use it in all perf tools, such as:
perf record -e probe_libc:malloc -aR sleep 1
[root@xxxx ~]# ./bin/perf record -e probe_libc:malloc -aR sleep 20
[ perf record: Woken up 22 times to write data ]
[ perf record: Captured and wrote 5.843 MB perf.data (~255302 samples) ]
[root@xxxx ~]# ./bin/perf report --stdio
...
69.05% tar libc-2.12.so [.] malloc
28.57% rm libc-2.12.so [.] malloc
1.32% avahi-daemon libc-2.12.so [.] malloc
0.58% bash libc-2.12.so [.] malloc
0.28% sshd libc-2.12.so [.] malloc
0.08% irqbalance libc-2.12.so [.] malloc
0.05% bzip2 libc-2.12.so [.] malloc
0.04% sleep libc-2.12.so [.] malloc
0.03% multipathd libc-2.12.so [.] malloc
0.01% sendmail libc-2.12.so [.] malloc
0.01% automount libc-2.12.so [.] malloc
The trap_nr addition patch is a prereq.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
epapr paravirtualization support is now a Kconfig
selectable option
Signed-off-by: Liu Yu <yu.liu@freescale.com>
[stuart.yoder@freescale.com: misc minor fixes, description update]
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The perf code has grown a lot since it started, and is big enough to
warrant its own subdirectory. For reference it's ~60% bigger than the
oprofile code. It declutters the kernel directory, makes it simpler to
grep for "just perf stuff", and allows us to shorten some filenames.
While we're at it, make it more obvious that we have two implementations
of the core perf logic. One for (roughly) Book3S CPUs, which was the
original implementation, and the other for Freescale embedded CPUs.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reserve the memory during early boot to preserve CPU state data, HPTE region
and RMA (real mode area) region data in case of kernel crash. At the time of
crash, powerpc firmware will store CPU state data, HPTE region data and move
RMA region data to the reserved memory area.
If the firmware-assisted dump fails to reserve the memory, then fallback
to existing kexec-based kdump.
Most of the code implementation to reserve memory has been
adapted from phyp assisted dump implementation written by Linas Vepstas
and Manish Ahuja
This patch also introduces a config option CONFIG_FA_DUMP for firmware
assisted dump feature on Powerpc (ppc64) architecture.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The following patch implements the dynamic relocation processing for
PPC32 kernel. relocate() accepts the target virtual address and relocates
the kernel image to the same.
Currently the following relocation types are handled :
R_PPC_RELATIVE
R_PPC_ADDR16_LO
R_PPC_ADDR16_HI
R_PPC_ADDR16_HA
The last 3 relocations in the above list depends on value of Symbol indexed
whose index is encoded in the Relocation entry. Hence we need the Symbol
Table for processing such relocations.
Note: The GNU ld for ppc32 produces buggy relocations for relocation types
that depend on symbols. The value of the symbols with STB_LOCAL scope
should be assumed to be zero. - Alan Modra
Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alan Modra <amodra@au1.ibm.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
This patch adds support for the new "jump label" feature.
Unlike x86 and sparc we just merrily patch the code with no locks etc,
as far as I know this is safe, but I'm not really sure what the x86/sparc
code is protecting against so maybe it's not.
I also don't see any reason for us to implement the poke_early() routine,
even though sparc does.
[BenH: Updated the patch to upstream generic changes]
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch implements the raw syscall tracepoints on PowerPC and exports
them for ftrace syscalls to use.
To minimise reworking existing code, I slightly re-ordered the thread
info flags such that the new TIF_SYSCALL_TRACEPOINT bit would still fit
within the 16 bits of the andi. instruction's UI field. The instructions
in question are in /arch/powerpc/kernel/entry_{32,64}.S to and the
_TIF_SYSCALL_T_OR_A with the thread flags to see if system call tracing
is enabled.
In the case of 64bit PowerPC, arch_syscall_addr and
arch_syscall_match_sym_name are overridden to allow ftrace syscalls to
work given the unusual system call table structure and symbol names that
start with a period.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch below removes an unused config variable found by using a kernel
cleanup script.
Note: I did try to cross compile these but hit erros while doing so..
(gcc is not setup to cross compile) and am unsure if anymore needs to be done.
Please have a look if/when anybody has free time.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add the cputable entry, regs and setup & restore entries for
the PowerPC A2 core.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Wakeup comes from the system reset handler with a potential loss of
the non-hypervisor CPU state. We save the non-volatile state on the
stack and a pointer to it in the PACA, which the system reset handler
uses to restore things
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This bit indicates that we are operating in hypervisor mode on a CPU
compliant to architecture 2.06 or later (currently server only).
We set it on POWER7 and have a boot-time CPU setup function that
clears it if MSR:HV isn't set (booting under a hypervisor).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
No need to have three of them.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits)
KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages
KVM: Fix signature of kvm_iommu_map_pages stub
KVM: MCE: Send SRAR SIGBUS directly
KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED
KVM: fix typo in copyright notice
KVM: Disable interrupts around get_kernel_ns()
KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address
KVM: MMU: move access code parsing to FNAME(walk_addr) function
KVM: MMU: audit: check whether have unsync sps after root sync
KVM: MMU: audit: introduce audit_printk to cleanup audit code
KVM: MMU: audit: unregister audit tracepoints before module unloaded
KVM: MMU: audit: fix vcpu's spte walking
KVM: MMU: set access bit for direct mapping
KVM: MMU: cleanup for error mask set while walk guest page table
KVM: MMU: update 'root_hpa' out of loop in PAE shadow path
KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn()
KVM: x86: Fix constant type in kvm_get_time_scale
KVM: VMX: Add AX to list of registers clobbered by guest switch
KVM guest: Move a printk that's using the clock before it's ready
KVM: x86: TSC catchup mode
...
We have all the hypervisor pieces in place now, but the guest parts are still
missing.
This patch implements basic awareness of KVM when running Linux as guest. It
doesn't do anything with it yet though.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
To communicate with KVM directly we need to plumb some sort of interface
between the guest and KVM. Usually those interfaces use hypercalls.
This hypercall implementation is described in the last patch of the series
in a special documentation file. Please read that for further information.
This patch implements stubs to handle KVM PPC hypercalls on the host and
guest side alike.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The new e5500 core is similar to the e500mc core but adds 64-bit
support. We support running it in 32-bit mode as it is identical to the
e500mc.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6: (63 commits)
of/platform: Register of_platform_drivers with an "of:" prefix
of/address: Clean up function declarations
of/spi: call of_register_spi_devices() from spi core code
of: Provide default of_node_to_nid() implementation.
of/device: Make of_device_make_bus_id() usable by other code.
of/irq: Fix endian issues in parsing interrupt specifiers
of: Fix phandle endian issues
of/flattree: fix of_flat_dt_is_compatible() to match the full compatible string
of: remove of_default_bus_ids
of: make of_find_device_by_node generic
microblaze: remove references to of_device and to_of_device
sparc: remove references to of_device and to_of_device
powerpc: remove references to of_device and to_of_device
of/device: Replace of_device with platform_device in includes and core code
of/device: Protect against binding of_platform_drivers to non-OF devices
of: remove asm/of_device.h
of: remove asm/of_platform.h
of/platform: remove all of_bus_type and of_platform_bus_type references
of: Merge of_platform_bus_type with platform_bus_type
drivercore/of: Add OF style matching to platform bus
...
Fix up trivial conflicts in arch/microblaze/kernel/Makefile due to just
some obj-y removals by the devicetree branch, while the microblaze
updates added a new file.
We use a similar technique to ppc32: We set a thread local flag
to indicate that we are about to enter or have entered the stop
state, and have fixup code in the async interrupt entry code that
reacts to this flag to make us return to a different location
(sets NIP to LINK in our case).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
--
v2. Fix lockdep bug
Re-mask interrupts when coming back from idle
Note that critical doorbells are an unimplemented stub just like
other critical or machine check handlers, since we haven't done
support for "levelled" exceptions yet.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch merges the common routines of_device_alloc() and
of_device_make_bus_id() from powerpc and microblaze.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
CC: Michal Simek <monstr@monstr.eu>
CC: Grant Likely <grant.likely@secretlab.ca>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: microblaze-uclinux@itee.uq.edu.au
CC: linuxppc-dev@ozlabs.org
CC: devicetree-discuss@lists.ozlabs.org
Implement perf-events based hw-breakpoint interfaces for PowerPC
64-bit server (Book III S) processors. This allows access to a
given location to be used as an event that can be counted or
profiled by the perf_events subsystem.
This is done using the DABR (data breakpoint register), which can
also be used for process debugging via ptrace. When perf_event
hw_breakpoint support is configured in, the perf_event subsystem
manages the DABR and arbitrates access to it, and ptrace then
creates a perf_event when it is requested to set a data breakpoint.
[Adopted suggestions from Paul Mackerras <paulus@samba.org> to
- emulate_step() all system-wide breakpoints and single-step only the
per-task breakpoints
- perform arch-specific cleanup before unregistration through
arch_unregister_hw_breakpoint()
]
Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This is started as swsusp_32.S modifications, but the amount of #ifdefs
made the whole file horribly unreadable, so let's put the support into
its own separate file.
The code should be relatively easy to modify to support 44x BookEs as
well, but since I don't have any 44x to test, let's confine the code to
FSL BookE. (The only FSL-specific part so far is 'flush_dcache_L1'.)
Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>