Try to improve the predictability of ITS save/restores by failing
commands that would lead to failed saves. More specifically, fail any
command that adds an entry into an ITS table that is not in guest
memory, which would otherwise lead to a failed ITS save ioctl. There
are already checks for collection and device entries, but not for
ITEs. Add the corresponding check for the ITT when adding ITEs.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220510001633.552496-2-ricarkol@google.com
Moving kvm_pmu_events into the vcpu (and refering to it) broke the
somewhat unusual case where the kernel has no support for a PMU
at all.
In order to solve this, move things around a bit so that we can
easily avoid refering to the pmu structure outside of PMU-aware
code. As a bonus, pmu.c isn't compiled in when HW_PERF_EVENTS
isn't selected.
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/202205161814.KQHpOzsJ-lkp@intel.com
There are cases where a context synchronization event is necessary
between an IRQ being raised and being handled, and there are races such
that we cannot rely upon the exception entry being subsequent to the
interrupt being raised. To fix this, we place an ISB between a read of
IAR and the subsequent invocation of an IRQ handler.
When EOI mode 1 is in use, we need to EOI an interrupt prior to invoking
its handler, and we have a write to EOIR for this. As this write to EOIR
requires an ISB, and this is provided by the gic_write_eoir() helper, we
omit the usual ISB in this case, with the logic being:
| if (static_branch_likely(&supports_deactivate_key))
| gic_write_eoir(irqnr);
| else
| isb();
This is somewhat opaque, and it would be a little clearer if there were
an unconditional ISB, with only the write to EOIR being conditional,
e.g.
| if (static_branch_likely(&supports_deactivate_key))
| write_gicreg(irqnr, ICC_EOIR1_EL1);
|
| isb();
This patch rewrites the code that way, with this logic factored into a
new helper function with comments explaining what the ISB is for, as
were originally laid out in commit:
39a06b67c2 ("irqchip/gic: Ensure we have an ISB between ack and ->handle_irq")
Note that since then, we removed the IAR polling in commit:
342677d70a ("irqchip/gic-v3: Remove acknowledge loop")
... which removed one of the two race conditions.
For consistency, other portions of the driver are made to manipulate
EOIR using write_gicreg() and explcit ISBs, and the gic_write_eoir()
helper function is removed.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220513133038.226182-3-mark.rutland@arm.com
Will reported the following splat when running with Protected KVM
enabled:
[ 2.427181] ------------[ cut here ]------------
[ 2.427668] WARNING: CPU: 3 PID: 1 at arch/arm64/kvm/mmu.c:489 __create_hyp_private_mapping+0x118/0x1ac
[ 2.428424] Modules linked in:
[ 2.429040] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc2-00084-g8635adc4efc7 #1
[ 2.429589] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ 2.430286] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2.430734] pc : __create_hyp_private_mapping+0x118/0x1ac
[ 2.431091] lr : create_hyp_exec_mappings+0x40/0x80
[ 2.431377] sp : ffff80000803baf0
[ 2.431597] x29: ffff80000803bb00 x28: 0000000000000000 x27: 0000000000000000
[ 2.432156] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[ 2.432561] x23: ffffcd96c343b000 x22: 0000000000000000 x21: ffff80000803bb40
[ 2.433004] x20: 0000000000000004 x19: 0000000000001800 x18: 0000000000000000
[ 2.433343] x17: 0003e68cf7efdd70 x16: 0000000000000004 x15: fffffc81f602a2c8
[ 2.434053] x14: ffffdf8380000000 x13: ffffcd9573200000 x12: ffffcd96c343b000
[ 2.434401] x11: 0000000000000004 x10: ffffcd96c1738000 x9 : 0000000000000004
[ 2.434812] x8 : ffff80000803bb40 x7 : 7f7f7f7f7f7f7f7f x6 : 544f422effff306b
[ 2.435136] x5 : 000000008020001e x4 : ffff207d80a88c00 x3 : 0000000000000005
[ 2.435480] x2 : 0000000000001800 x1 : 000000014f4ab800 x0 : 000000000badca11
[ 2.436149] Call trace:
[ 2.436600] __create_hyp_private_mapping+0x118/0x1ac
[ 2.437576] create_hyp_exec_mappings+0x40/0x80
[ 2.438180] kvm_init_vector_slots+0x180/0x194
[ 2.458941] kvm_arch_init+0x80/0x274
[ 2.459220] kvm_init+0x48/0x354
[ 2.459416] arm_init+0x20/0x2c
[ 2.459601] do_one_initcall+0xbc/0x238
[ 2.459809] do_initcall_level+0x94/0xb4
[ 2.460043] do_initcalls+0x54/0x94
[ 2.460228] do_basic_setup+0x1c/0x28
[ 2.460407] kernel_init_freeable+0x110/0x178
[ 2.460610] kernel_init+0x20/0x1a0
[ 2.460817] ret_from_fork+0x10/0x20
[ 2.461274] ---[ end trace 0000000000000000 ]---
Indeed, the Protected KVM mode promotes __create_hyp_private_mapping()
to a hypercall as EL1 no longer has access to the hypervisor's stage-1
page-table. However, the call from kvm_init_vector_slots() happens after
pKVM has been initialized on the primary CPU, but before it has been
initialized on secondaries. As such, if the KVM initcall procedure is
migrated from one CPU to another in this window, the hypercall may end up
running on a CPU for which EL2 has not been initialized.
Fortunately, the pKVM hypervisor doesn't rely on the host to re-map the
vectors in the private range, so the hypercall in question is in fact
superfluous. Skip it when pKVM is enabled.
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
[maz: simplified the checks slightly]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220513092607.35233-1-qperret@google.com
When adding support for the slightly wonky Apple M1, we had to
populate ID_AA64PFR0_EL1.GIC==1 to present something to the guest,
as the HW itself doesn't advertise the feature.
However, we gated this on the in-kernel irqchip being created.
This causes some trouble for QEMU, which snapshots the state of
the registers before creating a virtual GIC, and then tries to
restore these registers once the GIC has been created. Obviously,
between the two stages, ID_AA64PFR0_EL1.GIC has changed value,
and the write fails.
The fix is to actually emulate the HW, and always populate the
field if the HW is capable of it.
Fixes: 562e530fd7 ("KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3")
Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Oliver Upton <oupton@google.com>
Link: https://lore.kernel.org/r/20220503211424.3375263-1-maz@kernel.org
These constants will change over time, and userspace has no
business knowing about them. Hide them behind __KERNEL__.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Now that the pmu code does not access hyp data, reenable it in
protected mode.
Once fully supported, protected VMs will not have pmu support,
since that could leak information. However, non-protected VMs in
protected mode should have pmu support if available.
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220510095710.148178-5-tabba@google.com
Instead of the host accessing hyp data directly, pass the pmu
events of the current cpu to hyp via the vcpu.
This adds 64 bits (in two fields) to the vcpu that need to be
synced before every vcpu run in nvhe and protected modes.
However, it isolates the hypervisor from the host, which allows
us to use pmu in protected mode in a subsequent patch.
No visible side effects in behavior intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220510095710.148178-4-tabba@google.com
Unsusprisingly, Apple M1 Pro/Max have the exact same defect as the
original M1 and generate random SErrors in the host when a guest
tickles the GICv3 CPU interface the wrong way.
Add the part numbers for both the CPU types found in these two
new implementations, and add them to the hall of shame. This also
applies to the Ultra version, as it is composed of 2 Max SoCs.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220514102524.3188730-1-maz@kernel.org
Between the header and the definitions, there's no line gap, and in a
couple of places a double line gap for no semantic reason, which makes
the output look a little odd.
Fix this so blocks are consistently separated with a single line gap:
* Add a newline after the "Generated file" comment line, so this is
clearly split from whatever the first definition in the file is.
* At the start of a SysregFields block there's no need for a newline as
we haven't output any sysreg encoding details prior to this.
* At the end of a Sysreg block there's no need for a newline if we
have no RES0 or RES1 fields, as there will be a line gap after the
previous element (e.g. a Fields line).
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220513174118.266966-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently for registers without fields we create a comment pointing at
the common definitions, e.g.
| #define REG_TTBR0_EL1 S3_0_C2_C0_0
| #define SYS_TTBR0_EL1 sys_reg(3, 0, 2, 0, 0)
| #define SYS_TTBR0_EL1_Op0 3
| #define SYS_TTBR0_EL1_Op1 0
| #define SYS_TTBR0_EL1_CRn 2
| #define SYS_TTBR0_EL1_CRm 0
| #define SYS_TTBR0_EL1_Op2 0
|
| /* See TTBRx_EL1 */
It would be slightly nicer if the comment said what we should be looking
for, e.g.
| #define REG_TTBR0_EL1 S3_0_C2_C0_0
| #define SYS_TTBR0_EL1 sys_reg(3, 0, 2, 0, 0)
| #define SYS_TTBR0_EL1_Op0 3
| #define SYS_TTBR0_EL1_Op1 0
| #define SYS_TTBR0_EL1_CRn 2
| #define SYS_TTBR0_EL1_CRm 0
| #define SYS_TTBR0_EL1_Op2 0
|
| /* For TTBR0_EL1 fields see TTBRx_EL1 */
Update the comment generation accordingly.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220513174118.266966-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull misc fixes from Andrew Morton:
"Seven MM fixes, three of which address issues added in the most recent
merge window, four of which are cc:stable.
Three non-MM fixes, none very serious"
[ And yes, that's a real pull request from Andrew, not me creating a
branch from emailed patches. Woo-hoo! ]
* tag 'mm-hotfixes-stable-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
MAINTAINERS: add a mailing list for DAMON development
selftests: vm: Makefile: rename TARGETS to VMTARGETS
mm/kfence: reset PG_slab and memcg_data before freeing __kfence_pool
mailmap: add entry for martyna.szapar-mudlaw@intel.com
arm[64]/memremap: don't abuse pfn_valid() to ensure presence of linear map
procfs: prevent unprivileged processes accessing fdinfo dir
mm: mremap: fix sign for EFAULT error return value
mm/hwpoison: use pr_err() instead of dump_page() in get_any_page()
mm/huge_memory: do not overkill when splitting huge_zero_page
Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()"
The MT7622 device tree never bothered to specify the number of virtual DMA
channels for the HSDMA controller, always falling back to the default value of
3. Make this value explicit, in order to avoid the following dmesg notification:
mtk_hsdma 1b007000.dma-controller: Using 3 as missing dma-requests property
Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Link: https://lore.kernel.org/r/20220429084225.298213-1-rsalvaterra@gmail.com
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Visconti device tree updates for 5.19
- Update the clock providers for PCIe host controller
- Update the clock providers for ethernet device
- Update the clock providers for SPI
- Update the clock providers for watchdog timer
- Update the clock providers for I2C
- Update the clock providers for UART
- Add clock controller support for TMPV7708
* tag 'visconti-arm-dt-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iwamatsu/linux-visconti:
arm64: dts: visconti: Update the clock providers for PCIe host controller
arm64: dts: visconti: Update the clock providers for ethernet device
arm64: dts: visconti: Update the clock providers for SPI
arm64: dts: visconti: Update the clock providers for watchdog timer
arm64: dts: visconti: Update the clock providers for I2C
arm64: dts: visconti: Update the clock providers for UART
arm64: dts: visconti: Add clock controller support for TMPV7708
Link: https://lore.kernel.org/r/TYWPR01MB94201E842A2F8E5E9EBF740D92C99@TYWPR01MB9420.jpnprd01.prod.outlook.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
New peripherals supported on rk356x: sfc, usb3, sata and
the video-decoder on rk3328.
RK3399 received some improvements and nodes for the memory controller.
Additional peripherals for PineNote, Gru and BananaPi-R2-Pro.
New boards are the Firefly Station M2, Pine64 SoQuartz SOM and
Quartz64 model B as well as the Radxa Rock3 model A.
* tag 'v5.19-rockchip-dts64-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: (32 commits)
arm64: dts: rockchip: enable otg/drd operation of usb_host0_xhci in rk356x
arm64: dts: rockchip: rename HDMI ref clock to 'ref' on rk3399
arm64: dts: rockchip: add dts for Firefly Station M2 rk3566
arm64: dts: rockchip: add SoQuartz CM4IO dts
arm64: dts: rockchip: add Pine64 Quartz64-B device tree
dt-bindings: arm: rockchip: Add Firefly Station M2
dt-bindings: arm: rockchip: Add Pine64 SoQuartz SoM
dt-bindings: arm: rockchip: Add Pine64 Quartz64 Model B
arm64: dts: rockchip: enable usb hub on the radxa rock3 model a
arm64: dts: rockchip: add usb3 support to the radxa rock3 model a
arm64: dts: rockchip: add rk356x sfc support
arm64: dts: rockchip: Add USB and TCPC to rk3566-pinenote
arm64: dts: rockchip: Add accelerometer to rk3566-pinenote
arm64: dts: rockchip: add an input enable pinconf to rk3399
arm64: dts: rockchip: Add vdec support for RK3328
arm64: dts: rockchip: Rename vdec_mmu node for RK3328
arm64: dts: rockchip: Enable dmc and dfi nodes on gru
arm64: dts: rockchip: Add dfi and dmc nodes to rk3399
arm64: dts: rockchip: add clocks property to cru nodes rk3399
arm64: dts: rockchip: use generic node name for pmucru on rk3399
...
Link: https://lore.kernel.org/r/7748558.DvuYhMxLoT@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes for the mass-production version of BananaPi R2-Pro.
The mass market version received some changes compared to
preproduction versions and especially the io-domain setting
could affect the lifespan of the board if the wrong dt
gets booted on it.
* tag 'v5.18-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: Add gmac1 and change network settings of bpi-r2-pro
arm64: dts: rockchip: Change io-domains of bpi-r2-pro
Link: https://lore.kernel.org/r/2300256.NG923GbCHz@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Add KRYO4XX gold/big cores to the list of CPUs that need the
repeat TLBI workaround. Apply this to the affected
KRYO4XX cores (rcpe to rfpe).
The variant and revision bits are implementation defined and are
different from the their Cortex CPU counterparts on which they are
based on, i.e., (r0p0 to r3p0) is equivalent to (rcpe to rfpe).
Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>
Reviewed-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
Link: https://lore.kernel.org/r/20220512110134.12179-1-quic_shrekk@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
The ID register table should have one entry per ID register but
currently has two entries for ID_AA64ISAR2_EL1. Only one entry has an
override, and get_arm64_ftr_reg() can end up choosing the other, causing
the override to be ignored. Fix this by removing the duplicate entry.
While here, also make the check in sort_ftr_regs() more strict so that
duplicate entries can't be added in the future.
Fixes: def8c222f0 ("arm64: Add support of PAuth QARMA3 architected algorithm")
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220511162030.1403386-1-kristina.martsenko@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
swiotlb-xen uses very different ways to allocate coherent memory on x86
vs arm. On the former it allocates memory from the page allocator, while
on the later it reuses the dma-direct allocator the handles the
complexities of non-coherent DMA on arm platforms.
Unfortunately the complexities of trying to deal with the two cases in
the swiotlb-xen.c code lead to a bug in the handling of
DMA_ATTR_NO_KERNEL_MAPPING on arm. With the DMA_ATTR_NO_KERNEL_MAPPING
flag the coherent memory allocator does not actually allocate coherent
memory, but just a DMA handle for some memory that is DMA addressable
by the device, but which does not have to have a kernel mapping. Thus
dereferencing the return value will lead to kernel crashed and memory
corruption.
Fix this by using the dma-direct allocator directly for arm, which works
perfectly fine because on arm swiotlb-xen is only used when the domain is
1:1 mapped, and then simplifying the remaining code to only cater for the
x86 case with DMA coherent device.
Reported-by: Rahul Singh <Rahul.Singh@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Rahul Singh <rahul.singh@arm.com>
Many architectures have similar install.sh scripts.
The first half is really generic; it verifies that the kernel image
and System.map exist, then executes ~/bin/${INSTALLKERNEL} or
/sbin/${INSTALLKERNEL} if available.
The second half is kind of arch-specific; it copies the kernel image
and System.map to the destination, but the code is slightly different.
Factor out the generic part into scripts/install.sh.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
The pVM-specific FP/SIMD trap handler just calls straight into the
generic trap handler. Avoid the indirection and just call the hyp
handler directly.
Note that the BUILD_BUG_ON() pattern is repeated in
pvm_init_traps_aa64pfr0(), which is likely a better home for it.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220509162559.2387784-2-oupton@google.com