The zpci_remove_device() function removes the device from the PCI common
code core which is an operation dealing primarily with the zbus and PCI
bus code. With that and to match an upcoming refactoring of the
symmetric scanning part move it to the bus code.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
A zPCI event with PEC 0x0301 for an existing zPCI device goes through
the same actions as enable_slot(). Similarly a zPCI event with PEC
0x0303 does the same steps as disable_slot().
We can thus unify both actions as zpci_configure_device() respectively
zpci_deconfigure_device().
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The current zdev->state mixes the configuration states supported by CLP
with an additional Online state which is used inconsistently to include
enabled zPCI functions which are not yet visible to the common PCI
subsytem. In preparation for a clean separation between architected
configuration states and fine grained function states remove the Online
function state.
Where we previously checked for Online it is more accurate to check if
the function is enabled to avoid an edge case where a disabled device
was still treated as Online. This also simplifies checks whether
a function is configured as this is now directly reflected by its
function state.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
In commit 05bc1be6db ("s390/pci: create zPCI bus") we removed the
pci_dev_put() call matching the earlier pci_get_slot() done as part of
__zpci_event_availability(). This was based on the wrong understanding
that the device_put() done as part of pci_destroy_device() would counter
the pci_get_slot() when it only counters the initial reference. This
same understanding and existing bad example also lead to not doing
a pci_dev_put() in zpci_remove_device().
Since releasing the PCI devices, unlike releasing the PCI slot, does not
print any debug message for testing I added one in pci_release_dev().
This revealed that we are indeed leaking the PCI device on PCI
hotunplug. Further testing also revealed another missing pci_dev_put() in
disable_slot().
Fix this by adding the missing pci_dev_put() in disable_slot() and fix
zpci_remove_device() with the correct pci_dev_put() calls. Also instead
of calling pci_get_slot() in __zpci_event_availability() to determine if
a PCI device is registered and then doing the same again in
zpci_remove_device() do this once in zpci_remove_device() which makes
sure that the pdev in __zpci_event_availability() is only used for the
result of pci_scan_single_device() which does not need a reference count
decremnt as its ownership goes to the PCI bus.
Also move the check if zdev->zbus->bus is set into zpci_remove_device()
since it may be that we're removing a device with devfn != 0 which never
had a PCI bus. So we can still set the pdev->error_state to indicate
that the device is not usable anymore, add a flag to set the error state.
Fixes: 05bc1be6db ("s390/pci: create zPCI bus")
Cc: <stable@vger.kernel.org> # 5.8+: e1bff843cd s390/pci: remove superfluous zdev->zbus check
Cc: <stable@vger.kernel.org> # 5.8+: ba764dd703 s390/pci: refactor zpci_create_device()
Cc: <stable@vger.kernel.org> # 5.8+
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
When we intercept a DIAG_9C from the guest we verify that the
target real CPU associated with the virtual CPU designated by
the guest is running and if not we forward the DIAG_9C to the
target real CPU.
To avoid a diag9c storm we allow a maximal rate of diag9c forwarding.
The rate is calculated as a count per second defined as a new
parameter of the s390 kvm module: diag9c_forwarding_hz .
The default value of 0 is to not forward diag9c.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Link: https://lore.kernel.org/r/1613997661-22525-2-git-send-email-pmorel@linux.ibm.com
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Get rid of unsigned long long, and use unsigned long instead
everywhere. The usage of unsigned long long is a leftover from
31 bit kernel support.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
- Fix physical vs virtual confusion in some basic mm macros and
routines. Caused by __pa == __va on s390 currently.
- Get rid of on-stack cpu masks.
- Add support for complete CPU counter set extraction.
- Add arch_irq_work_raise implementation.
- virtio-ccw revision and opcode fixes.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmA5UHMACgkQjYWKoQLX
FBh/Vgf/ezaC/qx8cPAJKemWTST5tK/cc3g63L5SlvIsiKTTcO08+PYpGEo5ajU8
0DDpDZdP+DYeDgLatrFFj+MlQyIL4wd602uKiRqD1LwjTR1oA+6HDDQtE41v2Z2a
t2Kuv32dGVT5361RythnerTdMx18XG6k77JprP6b7zCFa2qCpc8DaKk7FvcqnQt+
gfebknC/hdL15IJhtpPrIoqqerDXxB+HU+dSouipQwiamtENGFOuJ5csgl9XIJuR
ILzq3Ad/6u8yKf77c+q9WRvu3DTIsXXSY8xwl3zJjNqXcAnN2sa4apc7+noNJ0YZ
UkbokSshQomfOYMhIxyLs9AKEEcltg==
=YVkj
-----END PGP SIGNATURE-----
Merge tag 's390-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
- Fix physical vs virtual confusion in some basic mm macros and
routines. Caused by __pa == __va on s390 currently.
- Get rid of on-stack cpu masks.
- Add support for complete CPU counter set extraction.
- Add arch_irq_work_raise implementation.
- virtio-ccw revision and opcode fixes.
* tag 's390-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpumf: Add support for complete counter set extraction
virtio/s390: implement virtio-ccw revision 2 correctly
s390/smp: implement arch_irq_work_raise()
s390/topology: move cpumasks away from stack
s390/smp: smp_emergency_stop() - move cpumask away from stack
s390/smp: __smp_rescan_cpus() - move cpumask away from stack
s390/smp: consolidate locking for smp_rescan()
s390/mm: fix phys vs virt confusion in vmem_*() functions family
s390/mm: fix phys vs virt confusion in pgtable allocation routines
s390/mm: fix invalid __pa() usage in pfn_pXd() macros
s390/mm: make pXd_deref() macros return a pointer
s390/opcodes: rename selhhhr to selfhr
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmA2xiQUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vzRDA/9GCyEskI9DMtyT9UeoTMzpHcUZpaU
eCbLa2BSPjOKlrHLnPY7IwE0nT7ihe4OOcm8uOYOWtulE46XJNCHfxlUYP3SbI0Y
JlG0FBCh4ldzCzzKsftwkSvVhk+gn+ms9ucJ8q2iBSOXVhG/41IbX7++8IfbQM4v
VHjdYUmTCCiOSRDtBVi82p4+GAHxH8IhaB0gDNb1Q7myj+qJKL5nKjK/nukgO0fO
UpCnSxyua48Ij+c59Y1QAIhGeORq5Gg5Q4ussY3FxS9ovhZODEGQwCFniTfilqRw
wEB9Fb8tiPY60ljEyDPnERMkiW69zutTJqOY4LfwmoRM9IEbxD6VPIqF5gin8sB7
pHhX4KUU+eB1hQdK9SGKjkwyehquNKzTdxsu2jccltOKwBm5jcXYeOvu2bJTzZn+
rrZPYJoA1dQig3bEuOzsBxvW4Jaj7IsVfVcao4OzXyh8Y7tLr9kVDXxr7JC/EkPM
zRK24yglERD2J1JXgNMvOuJQj6JmRHhEbV/faZci8x8ZEaz1FawRAUZqHf/gGmnW
2CllarHbRnchPyD8btv03Mp84WG6fCfKy7zG2D8HxOsiStDO/5ICehHtGcvYg7IL
RuE4Tj8OKdcbw/8cO4C3842FqiSj34+jooNIHSLyBqcpJam6VsN4XqNIZCL+DeG5
Q2JXruAaahTWOZg=
=GXL5
-----END PGP SIGNATURE-----
Merge tag 'pci-v5.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Remove unnecessary locking around _OSC (Bjorn Helgaas)
- Clarify message about _OSC failure (Bjorn Helgaas)
- Remove notification of PCIe bandwidth changes (Bjorn Helgaas)
- Tidy checking of syscall user config accessors (Heiner Kallweit)
Resource management:
- Decline to resize resources if boot config must be preserved (Ard
Biesheuvel)
- Fix pci_register_io_range() memory leak (Geert Uytterhoeven)
Error handling (Keith Busch):
- Clear error status from the correct device
- Retain error recovery status so drivers can use it after reset
- Log the type of Port (Root or Switch Downstream) that we reset
- Always request a reset for Downstream Ports in frozen state
Endpoint framework and NTB (Kishon Vijay Abraham I):
- Make *_get_first_free_bar() take into account 64 bit BAR
- Add helper API to get the 'next' unreserved BAR
- Make *_free_bar() return error codes on failure
- Remove unused pci_epf_match_device()
- Add support to associate secondary EPC with EPF
- Add support in configfs to associate two EPCs with EPF
- Add pci_epc_ops to map MSI IRQ
- Add pci_epf_ops to expose function-specific attrs
- Allow user to create sub-directory of 'EPF Device' directory
- Implement ->msi_map_irq() ops for cadence
- Configure LM_EP_FUNC_CFG based on epc->function_num_map for cadence
- Add EP function driver to provide NTB functionality
- Add support for EPF PCI Non-Transparent Bridge
- Add specification for PCI NTB function device
- Add PCI endpoint NTB function user guide
- Add configfs binding documentation for pci-ntb endpoint function
Broadcom STB PCIe controller driver:
- Add support for BCM4908 and external PERST# signal controller
(Rafał Miłecki)
Cadence PCIe controller driver:
- Retrain Link to work around Gen2 training defect (Nadeem Athani)
- Fix merge botch in cdns_pcie_host_map_dma_ranges() (Krzysztof
Wilczyński)
Freescale Layerscape PCIe controller driver:
- Add LX2160A rev2 EP mode support (Hou Zhiqiang)
- Convert to builtin_platform_driver() (Michael Walle)
MediaTek PCIe controller driver:
- Fix OF node reference leak (Krzysztof Wilczyński)
Microchip PolarFlare PCIe controller driver:
- Add Microchip PolarFire PCIe controller driver (Daire McNamara)
Qualcomm PCIe controller driver:
- Use PHY_REFCLK_USE_PAD only for ipq8064 (Ansuel Smith)
- Add support for ddrss_sf_tbu clock for sm8250 (Dmitry Baryshkov)
Renesas R-Car PCIe controller driver:
- Drop PCIE_RCAR config option (Lad Prabhakar)
- Always allocate MSI addresses in 32bit space (Marek Vasut)
Rockchip PCIe controller driver:
- Add FriendlyARM NanoPi M4B DT binding (Chen-Yu Tsai)
- Make 'ep-gpios' DT property optional (Chen-Yu Tsai)
Synopsys DesignWare PCIe controller driver:
- Work around ECRC configuration hardware defect (Vidya Sagar)
- Drop support for config space in DT 'ranges' (Rob Herring)
- Change size to u64 for EP outbound iATU (Shradha Todi)
- Add upper limit address for outbound iATU (Shradha Todi)
- Make dw_pcie ops optional (Jisheng Zhang)
- Remove unnecessary dw_pcie_ops from al driver (Jisheng Zhang)
Xilinx Versal CPM PCIe controller driver:
- Fix OF node reference leak (Pan Bian)
Miscellaneous:
- Remove tango host controller driver (Arnd Bergmann)
- Remove IRQ handler & data together (altera-msi, brcmstb, dwc)
(Martin Kaiser)
- Fix xgene-msi race in installing chained IRQ handler (Martin
Kaiser)
- Apply CONFIG_PCI_DEBUG to entire drivers/pci hierarchy (Junhao He)
- Fix pci-bridge-emul array overruns (Russell King)
- Remove obsolete uses of WARN_ON(in_interrupt()) (Sebastian Andrzej
Siewior)"
* tag 'pci-v5.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (69 commits)
PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064
PCI: qcom: Add support for ddrss_sf_tbu clock
dt-bindings: PCI: qcom: Document ddrss_sf_tbu clock for sm8250
PCI: al: Remove useless dw_pcie_ops
PCI: dwc: Don't assume the ops in dw_pcie always exist
PCI: dwc: Add upper limit address for outbound iATU
PCI: dwc: Change size to u64 for EP outbound iATU
PCI: dwc: Drop support for config space in 'ranges'
PCI: layerscape: Convert to builtin_platform_driver()
PCI: layerscape: Add LX2160A rev2 EP mode support
dt-bindings: PCI: layerscape: Add LX2160A rev2 compatible strings
PCI: dwc: Work around ECRC configuration issue
PCI/portdrv: Report reset for frozen channel
PCI/AER: Specify the type of Port that was reset
PCI/ERR: Retain status from error notification
PCI/AER: Clear AER status from Root Port when resetting Downstream Port
PCI/ERR: Clear status of the reporting device
dt-bindings: arm: rockchip: Add FriendlyARM NanoPi M4B
PCI: rockchip: Make 'ep-gpios' DT property optional
Documentation: PCI: Add PCI endpoint NTB function user guide
...
The irq stack switching was moved out of the ASM entry code in course of
the entry code consolidation. It ended up being suboptimal in various
ways.
- Make the stack switching inline so the stackpointer manipulation is not
longer at an easy to find place.
- Get rid of the unnecessary indirect call.
- Avoid the double stack switching in interrupt return and reuse the
interrupt stack for softirq handling.
- A objtool fix for CONFIG_FRAME_POINTER=y builds where it got confused
about the stack pointer manipulation.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmA21OcTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoaX0D/9S0ud6oqbsIvI8LwhvYub63a2cjKP9
liHAJ7xwMYYVwzf0skwsPb/QE6+onCzdq0upJkgG/gEYm2KbiaMWZ4GgHdj0O7ER
qXKJONDd36AGxSEdaVzLY5kPuD/mkomGk5QdaZaTmjruthkNzg4y/N2wXUBIMZR0
FdpSpp5fGspSZCn/DXDx6FjClwpLI53VclvDs6DcZ2DIBA0K+F/cSLb1UQoDLE1U
hxGeuNa+GhKeeZ5C+q5giho1+ukbwtjMW9WnKHAVNiStjm0uzdqq7ERGi/REvkcB
LY62u5uOSW1zIBMmzUjDDQEqvypB0iFxFCpN8g9sieZjA0zkaUioRTQyR+YIQ8Cp
l8LLir0dVQivR1bHghHDKQJUpdw/4zvDj4mMH10XHqbcOtIxJDOJHC5D00ridsAz
OK0RlbAJBl9FTdLNfdVReBCoehYAO8oefeyMAG12nZeSh5XVUWl238rvzmzIYNhG
cEtkSx2wIUNEA+uSuI+xvfmwpxL7voTGvqmiRDCAFxyO7Bl/GBu9OEBFA1eOvHB+
+wTmPDMswRetQNh4QCRXzk1JzP1Wk5CobUL9iinCWFoTJmnsPPSOWlosN6ewaNXt
kYFpRLy5xt9EP7dlfgBSjiRlthDhTdMrFjD5bsy1vdm1w7HKUo82lHa4O8Hq3PHS
tinKICUqRsbjig==
=Sqr1
-----END PGP SIGNATURE-----
Merge tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 irq entry updates from Thomas Gleixner:
"The irq stack switching was moved out of the ASM entry code in course
of the entry code consolidation. It ended up being suboptimal in
various ways.
This reworks the X86 irq stack handling:
- Make the stack switching inline so the stackpointer manipulation is
not longer at an easy to find place.
- Get rid of the unnecessary indirect call.
- Avoid the double stack switching in interrupt return and reuse the
interrupt stack for softirq handling.
- A objtool fix for CONFIG_FRAME_POINTER=y builds where it got
confused about the stack pointer manipulation"
* tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix stack-swizzle for FRAME_POINTER=y
um: Enforce the usage of asm-generic/softirq_stack.h
x86/softirq/64: Inline do_softirq_own_stack()
softirq: Move do_softirq_own_stack() to generic asm header
softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig
x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
x86/softirq: Remove indirection in do_softirq_own_stack()
x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall
x86/entry: Convert device interrupts to inline stack switching
x86/entry: Convert system vectors to irq stack macro
x86/irq: Provide macro for inlining irq stack switching
x86/apic: Split out spurious handling code
x86/irq/64: Adjust the per CPU irq stack pointer by 8
x86/irq: Sanitize irq stack tracking
x86/entry: Fix instrumentation annotation
Add support to the CPU Measurement counter facility device driver
to extract complete counter sets per CPU and per counter set from user
space. This includes a new device named /dev/hwctr and support
for the device driver functions open, close and ioctl. Other
functions are not supported.
The ioctl command supports 3 subcommands:
S390_HWCTR_START: enables counter sets on a list of CPUs.
S390_HWCTR_STOP: disables counter sets on a list of CPUs.
S390_HWCTR_READ: reads counter sets on a list of CPUs.
The ioctl(..., S390_HWCTR_READ, ...) is the only subcommand which
returns data. It requires member data_bytes to be positive and
indicates the maximum amount of data available to store counter set
data. The other ioctl() subcommands do not use this member and it
should be set to zero.
The S390_HWCTR_READ subcommand returns the following data:
The cpuset data is flattened using the following scheme, stored in member
data:
0x0 0x8 0xc 0x10 0x10 0x18 0x20 0x28 0xU-1
+---------+-----+---------+-----+---------+-----+-----+------+------+
| no_cpus | cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n |
+---------+-----+---------+-----+---------+-----+-----+------+------+
0xU 0xU+4 0xU+8 0xU+10 0xV-1
+-----+---------+-----+-----+------+------+
| set | no_cnts | cv1 | cv2 | .... | cv_n |
+-----+---------+-----+-----+------+------+
0xV 0xV+4 0xV+8 0xV+c
+-----+---------+-----+---------+-----+-----+------+------+
| cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n |
+-----+---------+-----+---------+-----+-----+------+------+
U and V denote arbitrary hexadezimal addresses.
The first integer represents the number of CPUs data was extracted
from. This is followed by CPU number and number of counter sets extracted.
Both are two integer values. This is followed by the set identifer
and number of counters extracted. Both are two integer values. This is
followed by the counter values, each element is eight bytes in size.
The S390_HWCTR_READ ioctl subcommand is also limited to one call per
minute. This ensures that an application does not read out the
counter sets too often and reduces the overall CPU performance.
The complete counter set extraction is an expensive operation.
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The immediate need to have this is to have bpf_send_signal() send the
signal ASAP instead of during the next hrtimer interrupt. However, it
should also improve irq_work_queue() latencies in general, as well as
get s390 out of the lame architectures list [1].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq_work.c?h=v5.11#n45
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The physical address of page tables is passed around and
used as virtual address in various locations.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
There is little sense in applying __pa() to a physical
address, but that what pfn_pXd() macros do.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This update fixes semantics of pXd_deref macros which
are expected to return a CPU-addressable pointer.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
- Convert to using the generic entry infrastructure.
- Add vdso time namespace support.
- Switch s390 and alpha to 64-bit ino_t. As discussed here
lkml.kernel.org/r/YCV7QiyoweJwvN+m@osiris
- Get rid of expensive stck (store clock) usages where possible. Utilize
cpu alternatives to patch stckf when supported.
- Make tod_clock usage less error prone by converting it to a union and
rework code which is using it.
- Machine check handler fixes and cleanups.
- Drop couple of minor inline asm optimizations to fix clang build.
- Default configs changes notably to make libvirt happy.
- Various changes to rework and improve qdio code.
- Other small various fixes and improvements all over the code.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmAyzcwACgkQjYWKoQLX
FBjjMwgAmeY3oMkj93bnUF/OnbYTJQ0ZHmlyeboKt7SnFyvNpOVGyRfl7+fPHsNu
+t9QZQk0f7fSxbcC04gz0ZMw1YbTjWihgZJsN6s+qtrRsv/kVqKr7kvhFrcs8uSZ
rLiwIRWGVAbprnJZWCNqaGpKkOM0wPYZ5W3Mtnoxe4nTM2LwSu2RWI8ibTGYLQPy
FybKos2hYOFBTGQdrxmg1zAvpE8DJg4qQNLhYvnmHd8Bw/FNBmoyhx8rS8z06NmS
dWMk7pfvQaslIIaFC3Yo7/sJVa/JJH33FlBonc+MSO8OZz5O6vG4bk9ZHq6DfHUH
V1I38xiBdYdSXDq8QqT3N9d+CtjeMQ==
=Lt/v
-----END PGP SIGNATURE-----
Merge tag 's390-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Convert to using the generic entry infrastructure.
- Add vdso time namespace support.
- Switch s390 and alpha to 64-bit ino_t. As discussed at
https://lore.kernel.org/linux-mm/YCV7QiyoweJwvN+m@osiris/
- Get rid of expensive stck (store clock) usages where possible.
Utilize cpu alternatives to patch stckf when supported.
- Make tod_clock usage less error prone by converting it to a union and
rework code which is using it.
- Machine check handler fixes and cleanups.
- Drop couple of minor inline asm optimizations to fix clang build.
- Default configs changes notably to make libvirt happy.
- Various changes to rework and improve qdio code.
- Other small various fixes and improvements all over the code.
* tag 's390-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (68 commits)
s390/qdio: remove 'merge_pending' mechanism
s390/qdio: improve handling of PENDING buffers for QEBSM devices
s390/qdio: rework q->qdio_error indication
s390/qdio: inline qdio_kick_handler()
s390/time: remove get_tod_clock_ext()
s390/crypto: use store_tod_clock_ext()
s390/hypfs: use store_tod_clock_ext()
s390/debug: use union tod_clock
s390/kvm: use union tod_clock
s390/vdso: use union tod_clock
s390/time: convert tod_clock_base to union
s390/time: introduce new store_tod_clock_ext()
s390/time: rename store_tod_clock_ext() and use union tod_clock
s390/time: introduce union tod_clock
s390,alpha: switch to 64-bit ino_t
s390: split cleanup_sie
s390: use r13 in cleanup_sie as temp register
s390: fix kernel asce loading when sie is interrupted
s390: add stack for machine check handler
s390: use WRITE_ONCE when re-allocating async stack
...
- Support for userspace to emulate Xen hypercalls
- Raise the maximum number of user memslots
- Scalability improvements for the new MMU. Instead of the complex
"fast page fault" logic that is used in mmu.c, tdp_mmu.c uses an
rwlock so that page faults are concurrent, but the code that can run
against page faults is limited. Right now only page faults take the
lock for reading; in the future this will be extended to some
cases of page table destruction. I hope to switch the default MMU
around 5.12-rc3 (some testing was delayed due to Chinese New Year).
- Cleanups for MAXPHYADDR checks
- Use static calls for vendor-specific callbacks
- On AMD, use VMLOAD/VMSAVE to save and restore host state
- Stop using deprecated jump label APIs
- Workaround for AMD erratum that made nested virtualization unreliable
- Support for LBR emulation in the guest
- Support for communicating bus lock vmexits to userspace
- Add support for SEV attestation command
- Miscellaneous cleanups
PPC:
- Support for second data watchpoint on POWER10
- Remove some complex workarounds for buggy early versions of POWER9
- Guest entry/exit fixes
ARM64
- Make the nVHE EL2 object relocatable
- Cleanups for concurrent translation faults hitting the same page
- Support for the standard TRNG hypervisor call
- A bunch of small PMU/Debug fixes
- Simplification of the early init hypercall handling
Non-KVM changes (with acks):
- Detection of contended rwlocks (implemented only for qrwlocks,
because KVM only needs it for x86)
- Allow __DISABLE_EXPORTS from assembly code
- Provide a saner follow_pfn replacements for modules
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmApSRgUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOc7wf9FnlinKoTFaSk7oeuuhF/CoCVwSFs
Z9+A2sNI99tWHQxFR6dyDkEFeQoXnqSxfLHtUVIdH/JnTg0FkEvFz3NK+0PzY1PF
PnGNbSoyhP58mSBG4gbBAxdF3ZJZMB8GBgYPeR62PvMX2dYbcHqVBNhlf6W4MQK4
5mAUuAnbf19O5N267sND+sIg3wwJYwOZpRZB7PlwvfKAGKf18gdBz5dQ/6Ej+apf
P7GODZITjqM5Iho7SDm/sYJlZprFZT81KqffwJQHWFMEcxFgwzrnYPx7J3gFwRTR
eeh9E61eCBDyCTPpHROLuNTVBqrAioCqXLdKOtO5gKvZI3zmomvAsZ8uXQ==
=uFZU
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"x86:
- Support for userspace to emulate Xen hypercalls
- Raise the maximum number of user memslots
- Scalability improvements for the new MMU.
Instead of the complex "fast page fault" logic that is used in
mmu.c, tdp_mmu.c uses an rwlock so that page faults are concurrent,
but the code that can run against page faults is limited. Right now
only page faults take the lock for reading; in the future this will
be extended to some cases of page table destruction. I hope to
switch the default MMU around 5.12-rc3 (some testing was delayed
due to Chinese New Year).
- Cleanups for MAXPHYADDR checks
- Use static calls for vendor-specific callbacks
- On AMD, use VMLOAD/VMSAVE to save and restore host state
- Stop using deprecated jump label APIs
- Workaround for AMD erratum that made nested virtualization
unreliable
- Support for LBR emulation in the guest
- Support for communicating bus lock vmexits to userspace
- Add support for SEV attestation command
- Miscellaneous cleanups
PPC:
- Support for second data watchpoint on POWER10
- Remove some complex workarounds for buggy early versions of POWER9
- Guest entry/exit fixes
ARM64:
- Make the nVHE EL2 object relocatable
- Cleanups for concurrent translation faults hitting the same page
- Support for the standard TRNG hypervisor call
- A bunch of small PMU/Debug fixes
- Simplification of the early init hypercall handling
Non-KVM changes (with acks):
- Detection of contended rwlocks (implemented only for qrwlocks,
because KVM only needs it for x86)
- Allow __DISABLE_EXPORTS from assembly code
- Provide a saner follow_pfn replacements for modules"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (192 commits)
KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes
KVM: selftests: Don't bother mapping GVA for Xen shinfo test
KVM: selftests: Fix hex vs. decimal snafu in Xen test
KVM: selftests: Fix size of memslots created by Xen tests
KVM: selftests: Ignore recently added Xen tests' build output
KVM: selftests: Add missing header file needed by xAPIC IPI tests
KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c
KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static
locking/arch: Move qrwlock.h include after qspinlock.h
KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests
KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries
KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2
KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path
KVM: PPC: remove unneeded semicolon
KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB
KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest
KVM: PPC: Book3S HV: Fix radix guest SLB side channel
KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support
KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR
KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR
...
For non-QEBSM devices, get_buf_states() merges PENDING and EMPTY buffers
into a single group of finished buffers. To allow the upper-layer driver
to differentiate between the two states, qdio_check_pending() looks at
each buffer's state again and sets the sbal_state flag to
QDIO_OUTBUF_STATE_FLAG_PENDING accordingly.
So effectively we're spending overhead on _every_ Output Queue
inspection, just to avoid some additional TX completion calls in case
a group of buffers has completed with mixed EMPTY / PENDING state.
Given that PENDING buffers should rarely occur, this is a bad trade-off.
In particular so as the additional checks in get_buf_states() affect
_all_ device types (even those that don't use the PENDING state).
Rip it all out, and just report the PENDING completions separately as
we already do for QEBSM devices.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
For QEBSM devices the 'merge_pending' mechanism in get_buf_states()
doesn't apply, and we can actually get SLSB_P_OUTPUT_PENDING returned.
So for this case propagating the PENDING state to the driver via the
queue's sbal_state doesn't make sense and creates unnecessary overhead.
Instead introduce a new QDIO_ERROR_* flag that gets passed to the
driver, and triggers the same processing as if the buffers were flagged
as QDIO_OUTBUF_STATE_FLAG_PENDING.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Remove get_tod_clock_ext() and the STORE_CLOCK_EXT_SIZE define. This
enforces all users of the existing low level functions to use a union
tod_clock.
This way there is now a compile time check for the correct time and
therefore also if the size of the argument matches what will be
written to by the STORE CLOCK EXTENDED instruction.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Convert tod_clock_base to union tod_clock. This simplifies quite a bit
of code and also fixes a bug in read_persistent_clock64();
void read_persistent_clock64(struct timespec64 *ts)
{
__u64 delta;
delta = initial_leap_seconds + TOD_UNIX_EPOCH;
get_tod_clock_ext(clk);
*(__u64 *) &clk[1] -= delta;
if (*(__u64 *) &clk[1] > delta)
clk[0]--;
ext_to_timespec64(clk, ts);
}
Assume &clk[1] == 3 and delta == 2; then after the substraction the if
condition becomes true and the epoch part of the clock is decremented
by one because of an assumed overflow, even though there is none.
Fix this by using 128 bit arithmetics and let the compiler do the
right thing:
void read_persistent_clock64(struct timespec64 *ts)
{
union tod_clock clk;
u64 delta;
delta = initial_leap_seconds + TOD_UNIX_EPOCH;
store_tod_clock_ext(&clk);
clk.eitod -= delta;
ext_to_timespec64(&clk, ts);
}
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Introduce new store_tod_clock_ext() function, which is the same like
store_tod_clock_ext_cc() except that it doesn't return a condition
code.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Rename store_tod_clock_ext() to store_tod_clock_ext_cc() to reflect
that it returns a condition code and also use union tod_clock as
parameter.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Introduce union tod_clock which is supposed to be used to decode and
access various fields of the result of STORE CLOCK EXTENDED.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The previous code used the normal kernel stack for machine checks.
This is problematic when a machine check interrupts a system call
or interrupt handler right at the beginning where registers are set up.
Assume system_call is interrupted at the first instruction and a machine
check is triggered. The machine check handler is called, checks the PSW
to see whether it is coming from user space, notices that it is already
in kernel mode but %r15 still contains the user space stack. This would
lead to a kernel crash.
There are basically two ways of fixing that: Either using the 'critical
cleanup' approach which compares the address in the PSW to see whether
it is already at a point where the stack has been set up, or use an extra
stack for the machine check handler.
For simplicity, we will go with the second approach and allocate an extra
stack. This adds some memory overhead for large systems, but usually large
system have plenty of memory so this isn't really a concern. But it keeps
the mchk stack setup simple and less error prone.
Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Cc: <stable@kernel.org> # v5.8+
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Merge in the recent paravirt changes to resolve conflicts caused
by objtool annotations.
Conflicts:
arch/x86/xen/xen-asm.S
Signed-off-by: Ingo Molnar <mingo@kernel.org>
To prepare for inlining do_softirq_own_stack() replace
__ARCH_HAS_DO_SOFTIRQ with a Kconfig switch and select it in the affected
architectures.
This allows in the next step to move the function prototype and the inline
stub into a seperate asm-generic header file which is required to avoid
include recursion.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210210002513.181713427@linutronix.de
Add support for alternative inline assemblies with input and output
arguments. This is consistent to x86.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Use STORE CLOCK EXTENDED instead of STORE CLOCK in early tod clock
setup. This is just to remove another usage of stck, trying to remove
all usages of STORE CLOCK. This doesn't fix anything.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Implement generic vdso time namespace support which also enables time
namespaces for s390. This is quite similar to what arm64 has.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Use the passed in vdso_data pointer instead of calculating it again.
This is also required as a prerequisite for vdso time namespaces: if a
process is part of a time namespace __arch_get_vdso_data() will return
a pointer to the time namespace data page instead of the vdso data
page, which is not what __arch_get_hw_counter() expects.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Convert vdso_init() to arch_initcall like it is on all other architectures.
This requires to remove the vdso_getcpu_init() call from vdso_init()
since it must be called before smp is enabled.
vdso_getcpu_init() is now an early_initcall like on powerpc.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Add missing forward declaration for task_struct.
The warning appears when the -Werror C compiler flag is being used.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Fix the following coccicheck warnings:
./arch/s390/include/asm/scsw.h:528:48-50: WARNING !A || A && B is
equivalent to !A || B.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Remove a superfluous semicolon after function definition.
Signed-off-by: Chengyang Fan <cy.fan@huawei.com>
Message-Id: <20210125095839.1720265-1-cy.fan@huawei.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Currently zpci_create_device() is only called in clp_add_pci_device()
which allocates the memory for the struct zpci_dev being created. There
is little separation of concerns as only both functions together can
create a zpci_dev and the only CLP specific code in
clp_add_pci_device() is a call to clp_query_pci_fn().
Improve this by removing clp_add_pci_device() and refactor
zpci_create_device() such that it alone creates and initializes the
zpci_dev given the FID and Function Handle. For this we need to make
clp_query_pci_fn() non-static. While at it remove the function handle
parameter since we can just take that from the zpci_dev. Also move
adding to the zpci_list to after the zdev has been fully created which
eliminates a window where a partially initialized zdev can be found by
get_zdev_by_fid().
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Both qeth and zfcp have fully moved to the polling-driven flow for
Input Queues with commit 0a6e634535 ("s390/qdio: extend polling
support to multiple queues") and commit 0b524abc2d ("scsi: zfcp: Lift
Input Queue tasklet from qdio").
So remove the tasklet code for Input Queues, streamline the IRQ handlers
and push the tasklet struct into struct qdio_output_q.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Current KVM_USER_MEM_SLOTS limits are arch specific (512 on Power, 509 on x86,
32 on s390, 16 on MIPS) but they don't really need to be. Memory slots are
allocated dynamically in KVM when added so the only real limitation is
'id_to_index' array which is 'short'. We don't have any other
KVM_MEM_SLOTS_NUM/KVM_USER_MEM_SLOTS-sized statically defined structures.
Low KVM_USER_MEM_SLOTS can be a limiting factor for some configurations.
In particular, when QEMU tries to start a Windows guest with Hyper-V SynIC
enabled and e.g. 256 vCPUs the limit is hit as SynIC requires two pages per
vCPU and the guest is free to pick any GFN for each of them, this fragments
memslots as QEMU wants to have a separate memslot for each of these pages
(which are supposed to act as 'overlay' pages).
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210127175731.2020089-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The number reported by the query is N-1 and I think people reading the
sysfs file would expect N instead. For users creating VMs there's no
actual difference because KVM's limit is currently below the UV's
limit.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: a0f60f8431 ("s390/protvirt: Add sysfs firmware interface for Ultravisor information")
Cc: stable@vger.kernel.org
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Instead of fetching all registers from struct pt_regs and passing
them to the syscall wrappers, let the system call wrappers only
fetch the values really required.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
On s390 asmlinkage is a nop, so remove it.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This patch converts s390 to use the generic entry infrastructure from
kernel/entry/*.
There are a few special things on s390:
- PIF_PER_TRAP is moved to TIF_PER_TRAP as the generic code doesn't
know about our PIF flags in exit_to_user_mode_loop().
- The old code had several ways to restart syscalls:
a) PIF_SYSCALL_RESTART, which was only set during execve to force a
restart after upgrading a process (usually qemu-kvm) to pgste page
table extensions.
b) PIF_SYSCALL, which is set by do_signal() to indicate that the
current syscall should be restarted. This is changed so that
do_signal() now also uses PIF_SYSCALL_RESTART. Continuing to use
PIF_SYSCALL doesn't work with the generic code, and changing it
to PIF_SYSCALL_RESTART makes PIF_SYSCALL and PIF_SYSCALL_RESTART
more unique.
- On s390 calling sys_sigreturn or sys_rt_sigreturn is implemented by
executing a svc instruction on the process stack which causes a fault.
While handling that fault the fault code sets PIF_SYSCALL to hand over
processing to the syscall code on exit to usermode.
The patch introduces PIF_SYSCALL_RET_SET, which is set if ptrace sets
a return value for a syscall. The s390x ptrace ABI uses r2 both for the
syscall number and return value, so ptrace cannot set the syscall number +
return value at the same time. The flag makes handling that a bit easier.
do_syscall() will just skip executing the syscall if PIF_SYSCALL_RET_SET
is set.
CONFIG_DEBUG_ASCE was removd in favour of the generic CONFIG_DEBUG_ENTRY.
CR1/7/13 will be checked both on kernel entry and exit to contain the
correct asces.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
clang does not know about the 'b1' construct used in bitops inline
assembly. Since the plan is to use compiler atomic builtins anyway
there is no point in requesting clang support for this. Especially if
one considers that the kernel seems to be the only user of this.
With removing this small optimization it is possible to compile the
kernel also with -march=zEC12 and higher using clang.
Build error:
In file included from ./include/linux/bitops.h:32:
./arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi $0,${1:b}'
"oi %0,%b1\n"
^
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
With commit f0cbd3b83e ("s390/atomic: circumvent gcc 10 build
regression") there was an attempt to workaroud a gcc build bug,
however with the workaround a similar problem with clang appeared.
It was recommended to use a workaround which would fail again with
gcc. Therefore simply remove the optimization. It is just not worth
the effort.
Besides that all of this will be changed to use compiler atomic
builtins instead anyway.
See https://reviews.llvm.org/D90231
and https://reviews.llvm.org/D91786
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
On s390 cleared_pXs flags in struct mmu_gather are set by
corresponding pXd_free_tlb functions. Such approach is
inconsistent with how the generic code interprets these
flags, e.g pte_free_tlb() frees a PTE table - or a PMD
level entity, and so on.
This update does not bring any functional change, since
s390 does not use the flags at the moment.
Fixes: 9de7d833e3 ("s390/tlb: Convert to generic mmu_gather")
Link: https://lore.kernel.org/lkml/fbb00ac0-9104-8d25-f225-7b3d1b17a01f@huawei.com/
Reported-by: Zhenyu Ye <yezhenyu2@huawei.com>
Suggested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Make <asm-generic/local64.h> mandatory in include/asm-generic/Kbuild and
remove all arch/*/include/asm/local64.h arch-specific files since they
only #include <asm-generic/local64.h>.
This fixes build errors on arch/c6x/ and arch/nios2/ for
block/blk-iocost.c.
Build-tested on 21 of 25 arch-es. (tools problems on the others)
Yes, we could even rename <asm-generic/local64.h> to
<linux/local64.h> and change all #includes to use
<linux/local64.h> instead.
Link: https://lkml.kernel.org/r/20201227024446.17018-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* PSCI relay at EL2 when "protected KVM" is enabled
* New exception injection code
* Simplification of AArch32 system register handling
* Fix PMU accesses when no PMU is enabled
* Expose CSV3 on non-Meltdown hosts
* Cache hierarchy discovery fixes
* PV steal-time cleanups
* Allow function pointers at EL2
* Various host EL2 entry cleanups
* Simplification of the EL2 vector allocation
s390:
* memcg accouting for s390 specific parts of kvm and gmap
* selftest for diag318
* new kvm_stat for when async_pf falls back to sync
x86:
* Tracepoints for the new pagetable code from 5.10
* Catch VFIO and KVM irqfd events before userspace
* Reporting dirty pages to userspace with a ring buffer
* SEV-ES host support
* Nested VMX support for wait-for-SIPI activity state
* New feature flag (AVX512 FP16)
* New system ioctl to report Hyper-V-compatible paravirtualization features
Generic:
* Selftest improvements
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl/bdL4UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNgQQgAnTH6rhXa++Zd5F0EM2NwXwz3iEGb
lOq1DZSGjs6Eekjn8AnrWbmVQr+CBCuGU9MrxpSSzNDK/awryo3NwepOWAZw9eqk
BBCVwGBbJQx5YrdgkGC0pDq2sNzcpW/VVB3vFsmOxd9eHblnuKSIxEsCCXTtyqIt
XrLpQ1UhvI4yu102fDNhuFw2EfpzXm+K0Lc0x6idSkdM/p7SyeOxiv8hD4aMr6+G
bGUQuMl4edKZFOWFigzr8NovQAvDHZGrwfihu2cLRYKLhV97QuWVmafv/yYfXcz2
drr+wQCDNzDOXyANnssmviazrhOX0QmTAhbIXGGX/kTxYKcfPi83ZLoI3A==
=ISud
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Much x86 work was pushed out to 5.12, but ARM more than made up for it.
ARM:
- PSCI relay at EL2 when "protected KVM" is enabled
- New exception injection code
- Simplification of AArch32 system register handling
- Fix PMU accesses when no PMU is enabled
- Expose CSV3 on non-Meltdown hosts
- Cache hierarchy discovery fixes
- PV steal-time cleanups
- Allow function pointers at EL2
- Various host EL2 entry cleanups
- Simplification of the EL2 vector allocation
s390:
- memcg accouting for s390 specific parts of kvm and gmap
- selftest for diag318
- new kvm_stat for when async_pf falls back to sync
x86:
- Tracepoints for the new pagetable code from 5.10
- Catch VFIO and KVM irqfd events before userspace
- Reporting dirty pages to userspace with a ring buffer
- SEV-ES host support
- Nested VMX support for wait-for-SIPI activity state
- New feature flag (AVX512 FP16)
- New system ioctl to report Hyper-V-compatible paravirtualization features
Generic:
- Selftest improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
KVM: SVM: fix 32-bit compilation
KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting
KVM: SVM: Provide support to launch and run an SEV-ES guest
KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
KVM: SVM: Provide support for SEV-ES vCPU loading
KVM: SVM: Provide support for SEV-ES vCPU creation/loading
KVM: SVM: Update ASID allocation to support SEV-ES guests
KVM: SVM: Set the encryption mask for the SVM host save area
KVM: SVM: Add NMI support for an SEV-ES guest
KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
KVM: SVM: Do not report support for SMM for an SEV-ES guest
KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
KVM: SVM: Add support for EFER write traps for an SEV-ES guest
KVM: SVM: Support string IO operations for an SEV-ES guest
KVM: SVM: Support MMIO for an SEV-ES guest
KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
KVM: SVM: Create trace events for VMGEXIT processing
...
that unwinding works properly.
- Fix stack unwinder test case to avoid rare interrupt stack corruption.
- Simplify udelay() and just let it busy loop instead of implementing a
complex logic.
- arch_cpu_idle() cleanup.
- Some other minor improvements.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAl/c800ACgkQIg7DeRsp
bsJBTRAAxJz7J4X1CqyBf+exDcWhjc+FXUEgwDCNbmkPRezvOrivKSymXDoVbvVo
D2ptGGQtpnUsrFqHZ6o0DwEWfcxrDSXlyJV16ABkPDcARuV2bDaor7HzaHJfyuor
nUD0rb/0dWbzzFMlNo+WAG8htrhmS5mA4f1p5XSOohf9zP8Sm6NTVq0A7pK4oJuw
AU6723chxE326yoB2DcyFHaNqByn7jNyVLxfZgH1tyCTRGvqi6ERT+kKLb58eSi8
t1YYEEuwanUUZSjSDHqZeHA2evfJl/ilWAkUdAJWwJL7hoYnCBhqcjexseeinQ7n
09GEGTVVdv09YPZYgDCU+YpJ853gS5zAHYh2ItC3kluCcXV0XNrNyCDT11OxQ4I4
s1uoMhx6S2RvEXKuJZTatmEhNpKd5UXTUoiM0NDYgwdpcxKcyE0cA4FH3Ik+KE/1
np4CsskOYU/XuFxOlu29gB7jJ7R/x2AXyJQdSELU+QXKUuaIF8uINnbzUyCc9mcY
pG9+NKWycRzTXT/1nbKOTBFEhjQi20XcoWRLqX5T0o9D9wLnq4Q+wVhLTt/e5DMb
pw94JDK9HNX2QTULd6YDR4gXxPrypiX4IBli8CHvZcwNnm6N5vdz9nMvxX+v4s/B
lbdo4JHnmIpTsTJf8YdFZPggYlJsxuV4ITNRu4BfFwtdCrZhfc8=
=1l0g
-----END PGP SIGNATURE-----
Merge tag 's390-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
"This is mainly to decouple udelay() and arch_cpu_idle() and simplify
both of them.
Summary:
- Always initialize kernel stack backchain when entering the kernel,
so that unwinding works properly.
- Fix stack unwinder test case to avoid rare interrupt stack
corruption.
- Simplify udelay() and just let it busy loop instead of implementing
a complex logic.
- arch_cpu_idle() cleanup.
- Some other minor improvements"
* tag 's390-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: convert comma to semicolon
s390/idle: allow arch_cpu_idle() to be kprobed
s390/idle: remove raw_local_irq_save()/restore() from arch_cpu_idle()
s390/idle: merge enabled_wait() and arch_cpu_idle()
s390/delay: remove udelay_simple()
s390/irq: select HAVE_IRQ_EXIT_ON_IRQ_STACK
s390/delay: simplify udelay
s390/test_unwind: use timer instead of udelay
s390/test_unwind: fix CALL_ON_STACK tests
s390: make calls to TRACE_IRQS_OFF/TRACE_IRQS_ON balanced
s390: always clear kernel stack backchain before calling functions
The major update to this release is that there's a new arch config option called:
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS. Currently, only x86_64 enables it.
All the ftrace callbacks now take a struct ftrace_regs instead of a struct
pt_regs. If the architecture has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then
the ftrace_regs will have enough information to read the arguments of the
function being traced, as well as access to the stack pointer. This way, if
a user (like live kernel patching) only cares about the arguments, then it
can avoid using the heavier weight "regs" callback, that puts in enough
information in the struct ftrace_regs to simulate a breakpoint exception
(needed for kprobes).
New config option that audits the timestamps of the ftrace ring buffer at
most every event recorded. The "check_buffer()" calls will conflict with
mainline, because I purposely added the check without including the fix that
it caught, which is in mainline. Running a kernel built from the commit of
the added check will trigger it.
Ftrace recursion protection has been cleaned up to move the protection to
the callback itself (this saves on an extra function call for those
callbacks).
Perf now handles its own RCU protection and does not depend on ftrace to do
it for it (saving on that extra function call).
New debug option to add "recursed_functions" file to tracefs that lists all
the places that triggered the recursion protection of the function tracer.
This will show where things need to be fixed as recursion slows down the
function tracer.
The eval enum mapping updates done at boot up are now offloaded to a work
queue, as it caused a noticeable pause on slow embedded boards.
Various clean ups and last minute fixes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX9uq8xQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qtrwAQCHevqWMjKc1Q76bnCgwB0AbFKB6vqy
5b6g/co5+ihv8wD/eJPWlZMAt97zTVW7bdp5qj/GTiCDbAsODMZ597LsxA0=
=rZEz
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The major update to this release is that there's a new arch config
option called CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS.
Currently, only x86_64 enables it. All the ftrace callbacks now take a
struct ftrace_regs instead of a struct pt_regs. If the architecture
has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then the ftrace_regs will
have enough information to read the arguments of the function being
traced, as well as access to the stack pointer.
This way, if a user (like live kernel patching) only cares about the
arguments, then it can avoid using the heavier weight "regs" callback,
that puts in enough information in the struct ftrace_regs to simulate
a breakpoint exception (needed for kprobes).
A new config option that audits the timestamps of the ftrace ring
buffer at most every event recorded.
Ftrace recursion protection has been cleaned up to move the protection
to the callback itself (this saves on an extra function call for those
callbacks).
Perf now handles its own RCU protection and does not depend on ftrace
to do it for it (saving on that extra function call).
New debug option to add "recursed_functions" file to tracefs that
lists all the places that triggered the recursion protection of the
function tracer. This will show where things need to be fixed as
recursion slows down the function tracer.
The eval enum mapping updates done at boot up are now offloaded to a
work queue, as it caused a noticeable pause on slow embedded boards.
Various clean ups and last minute fixes"
* tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
tracing: Offload eval map updates to a work queue
Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"
ring-buffer: Add rb_check_bpage in __rb_allocate_pages
ring-buffer: Fix two typos in comments
tracing: Drop unneeded assignment in ring_buffer_resize()
tracing: Disable ftrace selftests when any tracer is running
seq_buf: Avoid type mismatch for seq_buf_init
ring-buffer: Fix a typo in function description
ring-buffer: Remove obsolete rb_event_is_commit()
ring-buffer: Add test to validate the time stamp deltas
ftrace/documentation: Fix RST C code blocks
tracing: Clean up after filter logic rewriting
tracing: Remove the useless value assignment in test_create_synth_event()
livepatch: Use the default ftrace_ops instead of REGS when ARGS is available
ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default
ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
MAINTAINERS: assign ./fs/tracefs to TRACING
tracing: Fix some typos in comments
ftrace: Remove unused varible 'ret'
ring-buffer: Add recording of ring buffer recursion into recursed_functions
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/XgdYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjTBD/4me2TNvGOogbcL0b1leAotndJ7spI/IcFM
NUMNy3pOGuRBcRjwle85xq44puAjlNkZE2LLatem5sT7ZvS+8lPNnOIoTYgfaCjt
PhKx2sKlLumVm3BwymYAPcPtke4fikGG15Mwu5nX1oOehmyGrjObGAr3Lo6gexCT
tQoCOczVqaTsV+iTXrLlmgEgs07J9Tm93uh2cNR8Jgroxb8ivuWeUq4YgbV4kWk+
Y8XvOyVE/yba0vQf5/hHtWuVoC6RdELnqZ6NCkcP/EicdBecwk1GMJAej1S3zPS1
0BT7GSFTpm3YUHcygD6LRmRg4I/BmWDTDtMi84+jLat6VvSG1HwIm//qHiCJh3ku
SlvFZENIWAv5LP92x2vlR5Lt7uE3GK2V/5Pxt2fekyzCth6mzu+hLH4CBPQ3xgyd
E1JqIQ/ilbXstp+EYoivV5x8yltZQnKEZRopws0EOqj1LsmDPj9XT1wzE9RnB0o+
PWu/DNhQFhhcmP7Z8uLgPiKIVpyGs+vjxiJLlTtGDFTCy6M5JbcgzGkEkSmnybxH
7lSanjpLt1dWj85FBMc6fNtJkv2rBPfb4+j0d1kZ45Dzcr4umirGIh7wtCHcgc83
brmXSt29hlKHseSHMMuNWK8haXcgAE7gq9tD8GZ/kzM7+vkmLLxHJa22Qhq5rp4w
URPeaBaQJw==
=ayp2
-----END PGP SIGNATURE-----
Merge tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
"Nothing major in here:
- NVMe pull request from Christoph:
- nvmet passthrough improvements (Chaitanya Kulkarni)
- fcloop error injection support (James Smart)
- read-only support for zoned namespaces without Zone Append
(Javier González)
- improve some error message (Minwoo Im)
- reject I/O to offline fabrics namespaces (Victor Gladkov)
- PCI queue allocation cleanups (Niklas Schnelle)
- remove an unused allocation in nvmet (Amit Engel)
- a Kconfig spelling fix (Colin Ian King)
- nvme_req_qid simplication (Baolin Wang)
- MD pull request from Song:
- Fix race condition in md_ioctl() (Dae R. Jeong)
- Initialize read_slot properly for raid10 (Kevin Vigor)
- Code cleanup (Pankaj Gupta)
- md-cluster resync/reshape fix (Zhao Heming)
- Move null_blk into its own directory (Damien Le Moal)
- null_blk zone and discard improvements (Damien Le Moal)
- bcache race fix (Dongsheng Yang)
- Set of rnbd fixes/improvements (Gioh Kim, Guoqing Jiang, Jack Wang,
Lutz Pogrell, Md Haris Iqbal)
- lightnvm NULL pointer deref fix (tangzhenhao)
- sr in_interrupt() removal (Sebastian Andrzej Siewior)
- FC endpoint security support for s390/dasd (Jan Höppner, Sebastian
Ott, Vineeth Vijayan). From the s390 arch guys, arch bits included
as it made it easier for them to funnel the feature through the
block driver tree.
- Follow up fixes (Colin Ian King)"
* tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block: (64 commits)
block: drop dead assignments in loop_init()
sr: Remove in_interrupt() usage in sr_init_command().
sr: Switch the sector size back to 2048 if sr_read_sector() changed it.
cdrom: Reset sector_size back it is not 2048.
drivers/lightnvm: fix a null-ptr-deref bug in pblk-core.c
null_blk: Move driver into its own directory
null_blk: Allow controlling max_hw_sectors limit
null_blk: discard zones on reset
null_blk: cleanup discard handling
null_blk: Improve implicit zone close
null_blk: improve zone locking
block: Align max_hw_sectors to logical blocksize
null_blk: Fail zone append to conventional zones
null_blk: Fix zone size initialization
bcache: fix race between setting bdev state to none and new write request direct to backing
block/rnbd: fix a null pointer dereference on dev->blk_symlink_name
block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name
block/rnbd: call kobject_put in the failure path
Documentation/ABI/rnbd-srv: add document for force_close
block/rnbd-srv: close a mapped device from server side.
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/YJxsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjpyEACBdW+YjenjTbkUPeEXzQgkBkTZUYw3g007
DPcUT1g8PQZXYXlQvBKCvGhhIr7/KVcjepKoowiNQfBNGcIPJTVopW58nzpqAfTQ
goI2WYGn5EKFFKBPvtH04cJD/Wo8muXdxynKtqyZbnGGgZjQxPrE259b8dpHjBSR
6L7HHkk0D1oU/5b6h6Ocpg9mc/0iIUCZylySAYY3eGO0JaVPJaXgZSJZYgHxCHll
Lb+/y/fXdtm/0PmQ3ko0ev54g3yEWqZIX0NsZW1asrButIy+KLzQ2Mz1xFLFDMag
prtIfwb8tzgc4dFPY090C/azjCh5CPpxqYS6FkRwS0p86n6OhkyXrqfily5Hs4/B
NC7CBPBSH/j+NKUK7CYZcpTzTpxPjUr9p0anUdlvMJz8FhTb/3YEEZ1UTeWOeHmk
Yo5SxnFghLeZZeZ1ok6rdymnVa7WEX12SCLGQX31BB2mld0tNbKb4b+FsBF6OUMk
IUaX6OjwDFVRaysC88BQ4hjcIP1HxsViG4/VZDX15gjAAH2Pvb+7tev+lcDcOhjz
TCD4GNFspTFzRhh9nT7oxQ679qCh9G9zHbzuIRewnrS6iqvo5SJQB3dR2yrWZRRH
ySkQFiHpYOlnLJYv0jg9COlGwo2FUdcvKhCvkjQKKBz48rzW/IC0LwKdRQWZDFk3
FKGzP/NBig==
=cadT
-----END PGP SIGNATURE-----
Merge tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block
Pull TIF_NOTIFY_SIGNAL updates from Jens Axboe:
"This sits on top of of the core entry/exit and x86 entry branch from
the tip tree, which contains the generic and x86 parts of this work.
Here we convert the rest of the archs to support TIF_NOTIFY_SIGNAL.
With that done, we can get rid of JOBCTL_TASK_WORK from task_work and
signal.c, and also remove a deadlock work-around in io_uring around
knowing that signal based task_work waking is invoked with the sighand
wait queue head lock.
The motivation for this work is to decouple signal notify based
task_work, of which io_uring is a heavy user of, from sighand. The
sighand lock becomes a huge contention point, particularly for
threaded workloads where it's shared between threads. Even outside of
threaded applications it's slower than it needs to be.
Roman Gershman <romger@amazon.com> reported that his networked
workload dropped from 1.6M QPS at 80% CPU to 1.0M QPS at 100% CPU
after io_uring was changed to use TIF_NOTIFY_SIGNAL. The time was all
spent hammering on the sighand lock, showing 57% of the CPU time there
[1].
There are further cleanups possible on top of this. One example is
TIF_PATCH_PENDING, where a patch already exists to use
TIF_NOTIFY_SIGNAL instead. Hopefully this will also lead to more
consolidation, but the work stands on its own as well"
[1] https://github.com/axboe/liburing/issues/215
* tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block: (28 commits)
io_uring: remove 'twa_signal_ok' deadlock work-around
kernel: remove checking for TIF_NOTIFY_SIGNAL
signal: kill JOBCTL_TASK_WORK
io_uring: JOBCTL_TASK_WORK is no longer used by task_work
task_work: remove legacy TWA_SIGNAL path
sparc: add support for TIF_NOTIFY_SIGNAL
riscv: add support for TIF_NOTIFY_SIGNAL
nds32: add support for TIF_NOTIFY_SIGNAL
ia64: add support for TIF_NOTIFY_SIGNAL
h8300: add support for TIF_NOTIFY_SIGNAL
c6x: add support for TIF_NOTIFY_SIGNAL
alpha: add support for TIF_NOTIFY_SIGNAL
xtensa: add support for TIF_NOTIFY_SIGNAL
arm: add support for TIF_NOTIFY_SIGNAL
microblaze: add support for TIF_NOTIFY_SIGNAL
hexagon: add support for TIF_NOTIFY_SIGNAL
csky: add support for TIF_NOTIFY_SIGNAL
openrisc: add support for TIF_NOTIFY_SIGNAL
sh: add support for TIF_NOTIFY_SIGNAL
um: add support for TIF_NOTIFY_SIGNAL
...
The only caller of enabled_wait() besides arch_cpu_idle() was
udelay(). Since that call doesn't exist anymore, merge enabled_wait()
and arch_cpu_idle().
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
udelay_simple() callers can make use of the now simplified udelay()
implementation. No need to keep it.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
udelay is implemented by using quite subtle details to make it
possible to load an idle psw and waiting for an interrupt even in irq
context or when interrupts are disabled. Also handling (or better: no
handling) of softirqs is taken into account.
All this is done to optimize for something which should in normal
circumstances never happen: calling udelay to busy wait. Therefore get
rid of the whole complexity and just busy loop like other
architectures are doing it also.
It could have been possible to use diag 0x44 instead of cpu_relax() in
the busy loop, however we have seen too many bad things happen with
diag 0x44 that it seems to be better to simply busy loop.
Also note that with this new implementation kernel preemption does
work when within the udelay loop. This did not work before.
To get a feeling what the former code optimizes for: IPL'ing a kernel
with 'defconfig' and afterwards compiling a kernel ends with a total
of zero udelay calls.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
This is a cleanup series from Nicholas Piggin, preparing for
later changes. The asm/mmu_context.h header are generalized
and common code moved to asm-gneneric/mmu_context.h.
This saves a bit of code and makes it easier to change in
the future.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAl/Y1LsACgkQmmx57+YA
GNm6kBAAq4/n6nuNnh6b9LhjXaZRG75gEyW7JvHl8KE5wmZHwDHqbwiQgU1b3lUs
JJGbfKqi5ASKxNg6MpfYodmCOqeTUUYG0FUCb6lMhcxxMdfLTLYBvkNd6Y143M+T
boi5b/iz+OUQdNPzlVeSsUEVsD59FIXmP/GhscWZN9VAyf/aLV2MDBIOhrDSJlPo
ObexnP0Iw1E1NRQYDQ6L2dKTHa6XmHyUtw40ABPmd/6MSd1S+D+j3FGg+CYmvnzG
k9g8FbNby8xtUfc0pZV4W/322WN8cDFF9bc04eTDZiAv1bk9lmfvWJ2bWjs3s2qt
RO/suiZEOAta/WUX9vVLgYn2td00ef+AyjNUgffiUfvQfl++fiCDFTGl+MoCLjbh
xQUPcRuRdED7bMKNrC0CcDOSwWEBWVXvkU/szBLDeE1sPjXzGQ80q1Y72k9y961I
mqg7FrHqjZsxT9luXMAzClHNhXAtvehkJZBIdHlFok83EFoTQp48Da4jaDuOOhlq
p/lkPJWOHegIQMWtGwRyGmG1qzil7b/QBNAPLgu9pF4TA+ySRBEB2BOr2jRSkj6N
mNTHQbSYxBoktdt+VhtrSsxR+i8lwlegx+RNRFmKK3VH5da2nfiBaOY7zBQQHxCK
yxQvXvsljSVpfkFKLc/S2nLQL1zTkRfFKV1Xmd3+3owR+EoqM60=
=NpMX
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-mmu-context-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic mmu-context cleanup from Arnd Bergmann:
"This is a cleanup series from Nicholas Piggin, preparing for later
changes. The asm/mmu_context.h header are generalized and common code
moved to asm-gneneric/mmu_context.h.
This saves a bit of code and makes it easier to change in the future"
* tag 'asm-generic-mmu-context-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (25 commits)
h8300: Fix generic mmu_context build
m68k: mmu_context: Fix Sun-3 build
xtensa: use asm-generic/mmu_context.h for no-op implementations
x86: use asm-generic/mmu_context.h for no-op implementations
um: use asm-generic/mmu_context.h for no-op implementations
sparc: use asm-generic/mmu_context.h for no-op implementations
sh: use asm-generic/mmu_context.h for no-op implementations
s390: use asm-generic/mmu_context.h for no-op implementations
riscv: use asm-generic/mmu_context.h for no-op implementations
powerpc: use asm-generic/mmu_context.h for no-op implementations
parisc: use asm-generic/mmu_context.h for no-op implementations
openrisc: use asm-generic/mmu_context.h for no-op implementations
nios2: use asm-generic/mmu_context.h for no-op implementations
nds32: use asm-generic/mmu_context.h for no-op implementations
mips: use asm-generic/mmu_context.h for no-op implementations
microblaze: use asm-generic/mmu_context.h for no-op implementations
m68k: use asm-generic/mmu_context.h for no-op implementations
ia64: use asm-generic/mmu_context.h for no-op implementations
hexagon: use asm-generic/mmu_context.h for no-op implementations
csky: use asm-generic/mmu_context.h for no-op implementations
...
Core:
- Consolidation and robustness changes for irq time accounting
- Cleanup and consolidation of irq stats
- Remove the fasteoi IPI flow which has been proved useless
- Provide an interface for converting legacy interrupt mechanism into
irqdomains
Drivers:
The rare event of not having completely new chip driver code, just new
DT bindings and extensions of existing drivers to accomodate new
variants!
- Preliminary support for managed interrupts on platform devices
- Correctly identify allocation of MSIs proxyied by another device
- Generalise the Ocelot support to new SoCs
- Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation
- Work around spurious interrupts on Qualcomm PDC
- Random fixes and cleanups
Thanks,
tglx
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/YwZgTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoW4CD/90rTi1OQrMe3nb5okVjUZmktz/K3BN
Cl5+evFiXiNoH+yJSMIVP+8eMAtBH6RgoaD0EUtSYmgzb9h/JRRQYwtPxobXcMb2
2xcWyLPJkVJL431JKNM8BBRYjLA2VnQ6Ia+Kx3BxqpgKXn5+cEMh1dwIy27Ll2rj
+2NHAQe1sHL7o/KcCDhYqbVIDjw5K/d7YPwjEuPeEoNv1DOxrOCdCEfgFN0jBtRE
CoaRTBskeAaHIzHNp47Mxyz43g4tA/D8kB68X0OjpEykVkPUbgNK1FHSwaPbIsFT
FTSPU3zg8Q6DZ+RGyjNJykIFgUbirlJxARk2c6Ct8Kc3DN6K1jQt4EsU7CXRCc98
BTBjUNeFeNj3irZ4GHhyMKOQJCA1Z5nCRfBUGiW6gK8183us3BLfH5DM1zEsAYUh
DCp+UKsLuXhbB80EWq7kl82/2mNGZ8En8EerE6XJA7Z3JN8FplOHEuLezYYzwzbb
RIes971Vc50J2u2Wf/M2c3PDz3D/4FzfwUeA4LJfTnmOL09RYZ8CsqSckpx4ku/F
XiBnjwtGEpDXWJ8z13DC7yONrxFGByV19+sqHTBlub5DmIs0gXjhC0dKAPAruUIS
iCC+Vx6xLgOpTDu8shFsjibbi9Hb6vuZrF2Te+WR5Rf7d80C0J4b5K5PS4daUjr6
IuD2tz+3CtPjHw==
=iytv
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Generic interrupt and irqchips subsystem updates. Unusually, there is
not a single completely new irq chip driver, just new DT bindings and
extensions of existing drivers to accomodate new variants!
Core:
- Consolidation and robustness changes for irq time accounting
- Cleanup and consolidation of irq stats
- Remove the fasteoi IPI flow which has been proved useless
- Provide an interface for converting legacy interrupt mechanism into
irqdomains
Drivers:
- Preliminary support for managed interrupts on platform devices
- Correctly identify allocation of MSIs proxyied by another device
- Generalise the Ocelot support to new SoCs
- Improve GICv4.1 vcpu entry, matching the corresponding KVM
optimisation
- Work around spurious interrupts on Qualcomm PDC
- Random fixes and cleanups"
* tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling
driver core: platform: Add devm_platform_get_irqs_affinity()
ACPI: Drop acpi_dev_irqresource_disabled()
resource: Add irqresource_disabled()
genirq/affinity: Add irq_update_affinity_desc()
irqchip/gic-v3-its: Flag device allocation as proxied if behind a PCI bridge
irqchip/gic-v3-its: Tag ITS device as shared if allocating for a proxy device
platform-msi: Track shared domain allocation
irqchip/ti-sci-intr: Fix freeing of irqs
irqchip/ti-sci-inta: Fix printing of inta id on probe success
drivers/irqchip: Remove EZChip NPS interrupt controller
Revert "genirq: Add fasteoi IPI flow"
irqchip/hip04: Make IPIs use handle_percpu_devid_irq()
irqchip/bcm2836: Make IPIs use handle_percpu_devid_irq()
irqchip/armada-370-xp: Make IPIs use handle_percpu_devid_irq()
irqchip/gic, gic-v3: Make SGIs use handle_percpu_devid_irq()
irqchip/ocelot: Add support for Jaguar2 platforms
irqchip/ocelot: Add support for Serval platforms
irqchip/ocelot: Add support for Luton platforms
irqchip/ocelot: prepare to support more SoC
...
- Preliminary support for managed interrupts on platform devices
- Correctly identify allocation of MSIs proxyied by another device
- Remove the fasteoi IPI flow which has been proved useless
- Generalise the Ocelot support to new SoCs
- Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation
- Work around spurious interrupts on Qualcomm PDC
- Random fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl/Uxq8PHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDoW0P/0ZMDvFPxrfnJD46exUgUOPuuFF8jZxAlxD8
7UExqar7u6yX7bbq394jPgtOOxldDagfCx/jCXgb9ja7DK5EHKRcrfjaDT8knHi2
Keg5RaRMRi9TVltvWQTxAkXwSv0Atl881qqsndPeZCez0GNZp+HB34s+rNkZwBOu
MBrWihMQOSv5QE6milsNc7HXLSHM1eLZ7Y2XgumNtKrIGEX9yZI7qwdMofwP8Za3
ayMOvc1WAWaTJI7Mg5ac1yTCVbqLmRHhCtws6c6DMgaRu6SI0itmbpQzkDuJJIe3
k9h4KQPaKAFcQsoo3GV0MKTMm63eq82XT3CAdv+htYRY1z95D2+nzNK+mJtsGptX
gJ2zeJkUb4u+yVtNguL9qjo5ssCXV/6IybJxv6baaEFnSwQMUwqa066NdxmtqfIe
1BOWnc153a7SRbQ34M9/llje+v8YJbueGMS2RFR2LQ6IjjpaHsXh+YCZokfA/kdk
zGbOUD5WWFtFD1T3UoaJ4gFt+pzHjNqym4CcEj4S1Vf5y+POUkNmC+GYK+xfm2Fp
WJMbdIUxJhHFRD9L1ShtfAVUSbp712VOOdILp9rYAkOdqfb51BVUiMUP++s2dGp1
ZIT78qt7kTKT1CxbDdFAjzsi7RoMqdSGYgKmG4sVprELeZnFwq47nBkBr8XEQ1TT
0ccEUOY8
=7Z24
-----END PGP SIGNATURE-----
Merge tag 'irqchip-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates for 5.11 from Marc Zyngier:
- Preliminary support for managed interrupts on platform devices
- Correctly identify allocation of MSIs proxyied by another device
- Remove the fasteoi IPI flow which has been proved useless
- Generalise the Ocelot support to new SoCs
- Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation
- Work around spurious interrupts on Qualcomm PDC
- Random fixes and cleanups
Link: https://lore.kernel.org/r/20201212135626.1479884-1-maz@kernel.org
- Expose tag address bits in siginfo. The original arm64 ABI did not
expose any of the bits 63:56 of a tagged address in siginfo. In the
presence of user ASAN or MTE, this information may be useful. The
implementation is generic to other architectures supporting tags (like
SPARC ADI, subject to wiring up the arch code). The user will have to
opt in via sigaction(SA_EXPOSE_TAGBITS) so that the extra bits, if
available, become visible in si_addr.
- Default to 32-bit wide ZONE_DMA. Previously, ZONE_DMA was set to the
lowest 1GB to cope with the Raspberry Pi 4 limitations, to the
detriment of other platforms. With these changes, the kernel scans the
Device Tree dma-ranges and the ACPI IORT information before deciding
on a smaller ZONE_DMA.
- Strengthen READ_ONCE() to acquire when CONFIG_LTO=y. When building
with LTO, there is an increased risk of the compiler converting an
address dependency headed by a READ_ONCE() invocation into a control
dependency and consequently allowing for harmful reordering by the
CPU.
- Add CPPC FFH support using arm64 AMU counters.
- set_fs() removal on arm64. This renders the User Access Override (UAO)
ARMv8 feature unnecessary.
- Perf updates: PMU driver for the ARM DMC-620 memory controller, sysfs
identifier file for SMMUv3, stop event counters support for i.MX8MP,
enable the perf events-based hard lockup detector.
- Reorganise the kernel VA space slightly so that 52-bit VA
configurations can use more virtual address space.
- Improve the robustness of the arm64 memory offline event notifier.
- Pad the Image header to 64K following the EFI header definition
updated recently to increase the section alignment to 64K.
- Support CONFIG_CMDLINE_EXTEND on arm64.
- Do not use tagged PC in the kernel (TCR_EL1.TBID1==1), freeing up 8
bits for PtrAuth.
- Switch to vmapped shadow call stacks.
- Miscellaneous clean-ups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl/XcSgACgkQa9axLQDI
XvGkwg//SLknimELD/cphf2UzZm5RFuCU0x1UnIXs9XYo5BrOpgVLLA//+XkCrKN
0GLAdtBDfw1axWJudzgMBiHrv6wSGh4p3YWjLIW06u/PJu3m3U8oiiolvvF8d7Yq
UKDseKGQnQkrl97J0SyA+Da/u8D11GEzp52SWL5iRxzt6vInEC27iTOp9n1yoaoP
f3y7qdp9kv831ryUM3rXFYpc8YuMWXk+JpBSNaxqmjlvjMzipA5PhzBLmNzfc657
XcrRX5qsgjEeJW8UUnWUVNB42j7tVzN77yraoUpoVVCzZZeWOQxqq5EscKPfIhRt
AjtSIQNOs95ZVE0SFCTjXnUUb823coUs4dMCdftqlE62JNRwdR+3bkfa+QjPTg1F
O9ohW1AzX0/JB19QBxMaOgbheB8GFXh3DVJ6pizTgxJgyPvQQtFuEhT1kq8Cst0U
Pe+pEWsg9t41bUXNz+/l9tUWKWpeCfFNMTrBXLmXrNlTLeOvDh/0UiF0+2lYJYgf
YAboibQ5eOv2wGCcSDEbNMJ6B2/6GtubDJxH4du680F6Emb6pCSw0ntPwB7mSGLG
5dXz+9FJxDLjmxw7BXxQgc5MoYIrt5JQtaOQ6UxU8dPy53/+py4Ck6tXNkz0+Ap7
gPPaGGy1GqobQFu3qlHtOK1VleQi/sWcrpmPHrpiiFUf6N7EmcY=
=zXFk
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Expose tag address bits in siginfo. The original arm64 ABI did not
expose any of the bits 63:56 of a tagged address in siginfo. In the
presence of user ASAN or MTE, this information may be useful. The
implementation is generic to other architectures supporting tags
(like SPARC ADI, subject to wiring up the arch code). The user will
have to opt in via sigaction(SA_EXPOSE_TAGBITS) so that the extra
bits, if available, become visible in si_addr.
- Default to 32-bit wide ZONE_DMA. Previously, ZONE_DMA was set to the
lowest 1GB to cope with the Raspberry Pi 4 limitations, to the
detriment of other platforms. With these changes, the kernel scans
the Device Tree dma-ranges and the ACPI IORT information before
deciding on a smaller ZONE_DMA.
- Strengthen READ_ONCE() to acquire when CONFIG_LTO=y. When building
with LTO, there is an increased risk of the compiler converting an
address dependency headed by a READ_ONCE() invocation into a control
dependency and consequently allowing for harmful reordering by the
CPU.
- Add CPPC FFH support using arm64 AMU counters.
- set_fs() removal on arm64. This renders the User Access Override
(UAO) ARMv8 feature unnecessary.
- Perf updates: PMU driver for the ARM DMC-620 memory controller, sysfs
identifier file for SMMUv3, stop event counters support for i.MX8MP,
enable the perf events-based hard lockup detector.
- Reorganise the kernel VA space slightly so that 52-bit VA
configurations can use more virtual address space.
- Improve the robustness of the arm64 memory offline event notifier.
- Pad the Image header to 64K following the EFI header definition
updated recently to increase the section alignment to 64K.
- Support CONFIG_CMDLINE_EXTEND on arm64.
- Do not use tagged PC in the kernel (TCR_EL1.TBID1==1), freeing up 8
bits for PtrAuth.
- Switch to vmapped shadow call stacks.
- Miscellaneous clean-ups.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits)
perf/imx_ddr: Add system PMU identifier for userspace
bindings: perf: imx-ddr: add compatible string
arm64: Fix build failure when HARDLOCKUP_DETECTOR_PERF is enabled
arm64: mte: fix prctl(PR_GET_TAGGED_ADDR_CTRL) if TCF0=NONE
arm64: mark __system_matches_cap as __maybe_unused
arm64: uaccess: remove vestigal UAO support
arm64: uaccess: remove redundant PAN toggling
arm64: uaccess: remove addr_limit_user_check()
arm64: uaccess: remove set_fs()
arm64: uaccess cleanup macro naming
arm64: uaccess: split user/kernel routines
arm64: uaccess: refactor __{get,put}_user
arm64: uaccess: simplify __copy_user_flushcache()
arm64: uaccess: rename privileged uaccess routines
arm64: sdei: explicitly simulate PAN/UAO entry
arm64: sdei: move uaccess logic to arch/arm64/
arm64: head.S: always initialize PSTATE
arm64: head.S: cleanup SCTLR_ELx initialization
arm64: head.S: rename el2_setup -> init_kernel_el
arm64: add C wrappers for SET_PSTATE_*()
...
The random longs to be pulled by arch_get_random_long() are
prepared in an 4K buffer which is filled from the NIST 800-90
compliant s390 drbg. By default the random long buffer is refilled
256 times before the drbg itself needs a reseed. The reseed of the
drbg is done with 32 bytes fetched from the high quality (but slow)
trng which is assumed to deliver 100% entropy. So the 32 * 8 = 256
bits of entropy are spread over 256 * 4KB = 1MB serving 131072
arch_get_random_long() invocations before reseeded.
How often the 4K random long buffer is refilled with the drbg
before the drbg is reseeded can be adjusted. There is a module
parameter 's390_arch_rnd_long_drbg_reseed' accessible via
/sys/module/arch_random/parameters/rndlong_drbg_reseed
or as kernel command line parameter
arch_random.rndlong_drbg_reseed=<value>
This parameter tells how often the drbg fills the 4K buffer before
it is re-seeded by fresh entropy from the trng.
A value of 16 results in reseeding the drbg at every 16 * 4 KB = 64
KB with 32 bytes of fresh entropy pulled from the trng. So a value
of 16 would result in 256 bits entropy per 64 KB.
A value of 256 results in 1MB of drbg output before a reseed of the
drbg is done. So this would spread the 256 bits of entropy among 1MB.
Setting this parameter to 0 forces the reseed to take place every
time the 4K buffer is depleted, so the entropy rises to 256 bits
entropy per 4K or 0.5 bit entropy per arch_get_random_long(). With
setting this parameter to negative values all this effort is
disabled, arch_get_random long() returns false and thus indicating
that the arch_get_random_long() feature is disabled at all.
arch_get_random_long() is used by random.c among others to provide
an initial hash value to be mixed with the entropy pool on every
random data pull. For about 64 bytes read from /dev/urandom there
is one call to arch_get_random_long(). So these additional random
long values count for performance of /dev/urandom with measurable
but low penalty.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Right now we do count pfault (pseudo page faults aka async page faults
start and completion events). What we do not count is, if an async page
fault would have been possible by the host, but it was disabled by the
guest (e.g. interrupts off, pfault disabled, secure execution....). Let
us count those as well in the pfault_sync counter.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20201125090658.38463-1-borntraeger@de.ibm.com
Currently only idle_task_exit() explicitly switches (switch_mm) to
init_mm. This causes the kernel asce to be loaded into cr7 and
therefore it would be used for potential user space accesses.
This is currently no problem since idle_task_exit() is nearly the last
thing a CPU executes before it is taken down. However things might
change - and therefore make sure that always the invalid asce is used
for cr7 when active_mm is init_mm.
This makes sure that all potential user space accesses will fail,
instead of accessing kernel address space.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
s390 has its own version of IRQ entry accounting because it doesn't
account the idle time the same way the other architectures do. Only
the actual idle sleep time is accounted as idle time, the rest of the
idle task execution is accounted as system time.
Make the generic IRQ entry accounting aware of architectures that have
their own way of accounting idle time and convert s390 to use it.
This prepares s390 to get involved in further consolidations of IRQ
time accounting.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201202115732.27827-3-frederic@kernel.org
As part of removing broken pm-support from s390 arch, remove
the pm callbacks from ccw-bus driver.The power-management functions
are unused since the 'commit 394216275c ("s390: remove broken
hibernate / power management support")'.
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Most architectures with the exception of alpha, mips, parisc and
sparc use the same values for these flags. Move their definitions into
asm-generic/signal-defs.h and allow the architectures with non-standard
values to override them. Also, document the non-standard flag values
in order to make it easier to add new generic flags in the future.
A consequence of this change is that on powerpc and x86, the constants'
values aside from SA_RESETHAND change signedness from unsigned
to signed. This is not expected to impact realistic use of these
constants. In particular the typical use of the constants where they
are or'ed together and assigned to sa_flags (or another int variable)
would not be affected.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://linux-review.googlesource.com/id/Ia3849f18b8009bf41faca374e701cdca36974528
Link: https://lkml.kernel.org/r/b6d0d1ec34f9ee93e1105f14f288fba5f89d1f24.1605235762.git.pcc@google.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Implement the previously removed getcpu vdso syscall by using the
TOD programmable field to pass the cpu number to user space.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Verify on exit to user space that always
- the primary ASCE (cr1) is set to kernel ASCE
- the secondary ASCE (cr7) is set to user ASCE
If this is not the case: panic since something went terribly wrong.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Create a region 3 page table which contains only invalid entries, and
use that via "s390_invalid_asce" instead of the kernel ASCE whenever
there is either
- no user address space available, e.g. during early startup
- as an intermediate ASCE when address spaces are switched
This makes sure that user space accesses in such situations are
guaranteed to fail.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Remove set_fs support from s390. With doing this rework address space
handling and simplify it. As a result address spaces are now setup
like this:
CPU running in | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE
----------------------------|-----------|-----------|-----------
user space | user | user | kernel
kernel, normal execution | kernel | user | kernel
kernel, kvm guest execution | gmap | user | kernel
To achieve this the getcpu vdso syscall is removed in order to avoid
secondary address mode and a separate vdso address space in for user
space. The getcpu vdso syscall will be implemented differently with a
subsequent patch.
The kernel accesses user space always via secondary address space.
This happens in different ways:
- with mvcos in home space mode and directly read/write to secondary
address space
- with mvcs/mvcp in primary space mode and copy from primary space to
secondary space or vice versa
- with e.g. cs in secondary space mode and access secondary space
Switching translation modes happens with sacf before and after
instructions which access user space, like before.
Lazy handling of control register reloading is removed in the hope to
make everything simpler, but at the cost of making kernel entry and
exit a bit slower. That is: on kernel entry the primary asce is always
changed to contain the kernel asce, and on kernel exit the primary
asce is changed again so it contains the user asce.
In kernel mode there is only one exception to the primary asce: when
kvm guests are executed the primary asce contains the gmap asce (which
describes the guest address space). The primary asce is reset to
kernel asce whenever kvm guest execution is interrupted, so that this
doesn't has to be taken into account for any user space accesses.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Currently the kernel minimal compiler requirement is gcc 4.9 or
clang 10.0.1.
* gcc -mhotpatch option is supported since 4.8.
* A combination of -pg -mrecord-mcount -mnop-mcount -mfentry flags is
supported since gcc 9 and since clang 10.
Drop support for old -pg function prologues. Which leaves binary
compatible -mhotpatch / -mnop-mcount -mfentry prologues in a form:
brcl 0,0
Which are also do not require initial nop optimization / conversion and
presence of _mcount symbol.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Currently we have to consider too many different values which
in the end only affect identity mapping size. These are:
1. max_physmem_end - end of physical memory online or standby.
Always <= end of the last online memory block (get_mem_detect_end()).
2. CONFIG_MAX_PHYSMEM_BITS - the maximum size of physical memory the
kernel is able to support.
3. "mem=" kernel command line option which limits physical memory usage.
4. OLDMEM_BASE which is a kdump memory limit when the kernel is executed as
crash kernel.
5. "hsa" size which is a memory limit when the kernel is executed during
zfcp/nvme dump.
Through out kernel startup and run we juggle all those values at once
but that does not bring any amusement, only confusion and complexity.
Unify all those values to a single one we should really care, that is
our identity mapping size.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
As the number of cpus increases, the sccb response can exceed 4k for
read cpu and read scp info sclp commands. Hence, all cpu info entries
cant be embedded within a sccb response
Solution:
To overcome this limitation, extended sccb facility is provided by sclp.
1. Check if the extended sccb facility is installed.
2. If extended sccb is installed, perform the read scp and read cpu
command considering a max sccb length of three page size. This max
length is based on factors like max cpus, sccb header.
3. If extended sccb is not installed, perform the read scp and read cpu
sclp command considering a max sccb length of one page size.
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Fibre Channel Endpoint-Security event is received as an sei:nt0 type
in the CIO layer. This information needs to be shared with the
CCW device drivers using the path_events callback.
Co-developed-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add an interface in the CIO layer to retrieve the information about the
Endpoint-Security Mode (ESM) of the specified CU. The ESM values are
defined as 0-None, 1-Authenticated or 2, 3-Encrypted.
[vneethv@linux.ibm.com: cleaned-up and modified description]
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS is available, the ftrace call
will be able to set the ip of the calling function. This will improve the
performance of live kernel patching where it does not need all the regs to
be stored just to change the instruction pointer.
If all archs that support live kernel patching also support
HAVE_DYNAMIC_FTRACE_WITH_ARGS, then the architecture specific function
klp_arch_set_pc() could be made generic.
It is possible that an arch can support HAVE_DYNAMIC_FTRACE_WITH_ARGS but
not HAVE_DYNAMIC_FTRACE_WITH_REGS and then have access to live patching.
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: live-patching@vger.kernel.org
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
It is relying on _REGION1_SHIFT / _REGION2_SHIFT values which come from
asm/pgtable.h, so include it.
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Kasan early code is only working on init_mm, remove unneeded pgd
parameter from kasan_copy_shadow and rename it to
kasan_copy_shadow_mapping.
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
We've seen several occurences in the past where the default vmalloc
size of 128GB is not sufficient. Therefore extend the default size.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Compiling the kernel with Kasan disables automatic 3-level vs 4-level
kernel space paging selection, because the shadow memory offset has
to be known at compile time and there is no such offset which would be
acceptable for both 3 and 4-level paging. Instead S390_4_LEVEL_PAGING
option was introduced which allowed to pick how many paging levels to
use under Kasan.
With the introduction of protected virtualization, kernel memory layout
may be affected due to ultravisor secure storage limit. This adds
additional complexity into how memory layout would look like in
combination with Kasan predefined shadow memory offsets. To simplify
this make Kasan 4-level paging default and remove Kasan 3-level paging
support.
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
s390_base_ext_handler_fn haven't been used since its introduction in
commit ab14de6c37 ("[S390] Convert memory detection into C code.").
s390_base_ext_handler itself is currently falsely storing 16 registers
at __LC_SAVE_AREA_ASYNC rewriting several following lowcore values:
cpu_flags, return_psw, return_mcck_psw, sync_enter_timer and
async_enter_timer.
Besides that s390_base_ext_handler itself is only potentially hiding
EXT interrupts which should not have happen in the first place. Any
piece of code which requires EXT interrupts before fully functional
ext_int_handler is enabled has to do it on its own, like this is done
by sclp_early_cmd() which is doing EXT interrupts handling synchronously
in sclp_early_wait_irq().
With s390_base_ext_handler removed unexpected EXT interrupt leads
to disabled wait with the address 0x1b0 (__LC_EXT_NEW_PSW), which is
currently setup in the decompressor.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Currently udelay relies on working EXT interrupts handler, which is not
the case during early startup. In such cases udelay_simple() has to be
used instead.
To avoid mistakes of calling udelay too early, which could happen from
the common code as well - make udelay work for the early code by
introducing static branch and redirecting all udelay calls to
udelay_simple until EXT interrupts handler is fully initialized and
async stack is allocated.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
pmd/pud_deref() assume that they will never operate on large pmd/pud
entries, and therefore only use the non-large _xxx_ENTRY_ORIGIN mask.
With commit 9ec8fa8dc3 ("s390/vmemmap: extend modify_pagetable()
to handle vmemmap"), that assumption is no longer true, at least for
pmd_deref().
In theory, we could end up with wrong addresses because some of the
non-address bits of a large entry would not be masked out.
In practice, this does not (yet) show any impact, because vmemmap_free()
is currently never used for s390.
Fix pmd/pud_deref() to check for the entry type and use the
_xxx_ENTRY_ORIGIN_LARGE mask for large entries.
While at it, also move pmd/pud_pfn() around, in order to avoid code
duplication, because they do the same thing.
Fixes: 9ec8fa8dc3 ("s390/vmemmap: extend modify_pagetable() to handle vmemmap")
Cc: <stable@vger.kernel.org> # 5.9
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Currently s390 build is broken.
SECTCMP .boot.data
error: section .boot.data differs between vmlinux and arch/s390/boot/compressed/vmlinux
make[2]: *** [arch/s390/boot/section_cmp.boot.data] Error 1
SECTCMP .boot.preserved.data
error: section .boot.preserved.data differs between vmlinux and arch/s390/boot/compressed/vmlinux
make[2]: *** [arch/s390/boot/section_cmp.boot.preserved.data] Error 1
make[1]: *** [bzImage] Error 2
Commit 33def8498f ("treewide: Convert macro and uses of __section(foo)
to __section("foo")") converted all __section(foo) to __section("foo").
This is wrong for __bootdata / __bootdata_preserved macros which want
variable names to be a part of intermediate section names .boot.data.<var
name> and .boot.preserved.data.<var name>. Those sections are later
sorted by alignment + name and merged together into final .boot.data
/ .boot.preserved.data sections. Those sections must be identical in
the decompressor and the decompressed kernel (that is checked during
the build).
Fixes: 33def8498f ("treewide: Convert macro and uses of __section(foo) to __section("foo")")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.
Remove the quote operator # from compiler_attributes.h __section macro.
Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.
Conversion done using the script at:
https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- New fsl-mc vfio bus driver supporting userspace drivers of objects
within NXP's DPAA2 architecture (Diana Craciun)
- Support for exposing zPCI information on s390 (Matthew Rosato)
- Fixes for "detached" VFs on s390 (Matthew Rosato)
- Fixes for pin-pages and dma-rw accesses (Yan Zhao)
- Cleanups and optimize vconfig regen (Zenghui Yu)
- Fix duplicate irq-bypass token registration (Alex Williamson)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJfkcCjAAoJECObm247sIsi2XIP/j7NL4glPrWU37mesz9dd5nx
SmZhcmxnOqZSQkOCnu+hNFZ9e+tdQjuX+jATOZaYz5l55bLAFmBlBj1Dv8HWaCVI
mTbJ6xXUwdOvNSxbFH6BIUkJg8otR0iEkefVyJLNlF84FsaDknH4yZxx0vdeczjF
wTkkk3+4VmH+4klvPIa9v0eL7yeKeFmgls9nQViVE5kDWUF4us/z/oHlVm9wR+mL
2r3DEjHyz4L2hwVEkhZk7ytR6szdhuhF2l7NoMmaSEXRXjBzJoO6I3P9Y2W4i+su
MFgTfiQ+OpIfVuiR8GzGev+/SrjWGX0Hvb2sYriKOELjhyedkE2kmxacbqMZ/UE+
SRAhFf64C1rzJ4g1IW//Gg+9ObIPqlkqU52VDbOZdCED0AquwSyVmdwIUAK6qF+I
HLOyZXhMI8EZ+w063cS+aKLJIvQTBbfIdMmPZkopVZhwWB3N3BjdvBKA+rPpPoTx
0DpeUo891+zyeEE4aunUmCB8HFnBPgUa+XZqg2juq9MxjScsqgTzA0WEZg7jV4oj
tORQrqoAKJgSk9oVL3EvAnr+IJix3ScRTqYymESORkz/lRCk2hFX48qdeW+qiSP8
W1DHOnivFb1+JzhuZyaRKFWy1mK0EQQWTsE2b2ymPMKJbFhi+pVxaksmeG5x+4Q9
SAp+Qma8Aj3UtBKcj/S+
=LDPo
-----END PGP SIGNATURE-----
Merge tag 'vfio-v5.10-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- New fsl-mc vfio bus driver supporting userspace drivers of objects
within NXP's DPAA2 architecture (Diana Craciun)
- Support for exposing zPCI information on s390 (Matthew Rosato)
- Fixes for "detached" VFs on s390 (Matthew Rosato)
- Fixes for pin-pages and dma-rw accesses (Yan Zhao)
- Cleanups and optimize vconfig regen (Zenghui Yu)
- Fix duplicate irq-bypass token registration (Alex Williamson)
* tag 'vfio-v5.10-rc1' of git://github.com/awilliam/linux-vfio: (30 commits)
vfio iommu type1: Fix memory leak in vfio_iommu_type1_pin_pages
vfio/pci: Clear token on bypass registration failure
vfio/fsl-mc: fix the return of the uninitialized variable ret
vfio/fsl-mc: Fix the dead code in vfio_fsl_mc_set_irq_trigger
vfio/fsl-mc: Fixed vfio-fsl-mc driver compilation on 32 bit
MAINTAINERS: Add entry for s390 vfio-pci
vfio-pci/zdev: Add zPCI capabilities to VFIO_DEVICE_GET_INFO
vfio/fsl-mc: Add support for device reset
vfio/fsl-mc: Add read/write support for fsl-mc devices
vfio/fsl-mc: trigger an interrupt via eventfd
vfio/fsl-mc: Add irq infrastructure for fsl-mc devices
vfio/fsl-mc: Added lock support in preparation for interrupt handling
vfio/fsl-mc: Allow userspace to MMAP fsl-mc device MMIO regions
vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call
vfio/fsl-mc: Implement VFIO_DEVICE_GET_INFO ioctl
vfio/fsl-mc: Scan DPRC objects on vfio-fsl-mc driver bind
vfio: Introduce capability definitions for VFIO_DEVICE_GET_INFO
s390/pci: track whether util_str is valid in the zpci_dev
s390/pci: stash version in the zpci_dev
vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devices
...
- Remove address space overrides using set_fs().
- Convert to generic vDSO.
- Convert to generic page table dumper.
- Add ARCH_HAS_DEBUG_WX support.
- Add leap seconds handling support.
- Add NVMe firmware-assisted kernel dump support.
- Extend NVMe boot support with memory clearing control and addition of
kernel parameters.
- AP bus and zcrypt api code rework. Add adapter configure/deconfigure
interface. Extend debug features. Add failure injection support.
- Add ECC secure private keys support.
- Add KASan support for running protected virtualization host with
4-level paging.
- Utilize destroy page ultravisor call to speed up secure guests shutdown.
- Implement ioremap_wc() and ioremap_prot() with MIO in PCI code.
- Various checksum improvements.
- Other small various fixes and improvements all over the code.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl+JXIIACgkQjYWKoQLX
FBgIWAf9FKpnIsy/aNI2RpvojfySEhgH3T5zxGDTjghCSUQzAu0hIBPKhQOs/YfV
/apflXxNPneq7FsQPPpNqfdz2DXQrtgDfecK+7GyEVoOawFArgxiwP+tDVy4dmPT
30PNfr+BpGs7GjKuj33fC0c5U33HYvKzUGJn/GQB2Fhw+5tTDxxCubuS1GVR9iuw
/U1cQhG4KN0lwEeF2gO7BWWgqTH9C1t60+WzOQhIAbdvgtBRr1ctGu//F5S94BYL
NBw5Wxb9vUHrMm2mL0n8bi16hSn2MWHmAMQLkxPXI2osBYun3soaHUWFSA3ryFMw
4BGU+g7T66Pv3ZmLP4jH5UGrn8HWmg==
=4zdC
-----END PGP SIGNATURE-----
Merge tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Remove address space overrides using set_fs()
- Convert to generic vDSO
- Convert to generic page table dumper
- Add ARCH_HAS_DEBUG_WX support
- Add leap seconds handling support
- Add NVMe firmware-assisted kernel dump support
- Extend NVMe boot support with memory clearing control and addition of
kernel parameters
- AP bus and zcrypt api code rework. Add adapter configure/deconfigure
interface. Extend debug features. Add failure injection support
- Add ECC secure private keys support
- Add KASan support for running protected virtualization host with
4-level paging
- Utilize destroy page ultravisor call to speed up secure guests
shutdown
- Implement ioremap_wc() and ioremap_prot() with MIO in PCI code
- Various checksum improvements
- Other small various fixes and improvements all over the code
* tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (85 commits)
s390/uaccess: fix indentation
s390/uaccess: add default cases for __put_user_fn()/__get_user_fn()
s390/zcrypt: fix wrong format specifications
s390/kprobes: move insn_page to text segment
s390/sie: fix typo in SIGP code description
s390/lib: fix kernel doc for memcmp()
s390/zcrypt: Introduce Failure Injection feature
s390/zcrypt: move ap_msg param one level up the call chain
s390/ap/zcrypt: revisit ap and zcrypt error handling
s390/ap: Support AP card SCLP config and deconfig operations
s390/sclp: Add support for SCLP AP adapter config/deconfig
s390/ap: add card/queue deconfig state
s390/ap: add error response code field for ap queue devices
s390/ap: split ap queue state machine state from device state
s390/zcrypt: New config switch CONFIG_ZCRYPT_DEBUG
s390/zcrypt: introduce msg tracking in zcrypt functions
s390/startup: correct early pgm check info formatting
s390: remove orphaned extern variables declarations
s390/kasan: make sure int handler always run with DAT on
s390/ipl: add support to control memory clearing for nvme re-IPL
...
Add redirect_neigh() BPF packet redirect helper, allowing to limit stack
traversal in common container configs and improving TCP back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
Expand netlink policy support and improve policy export to user space.
(Ge)netlink core performs request validation according to declared
policies. Expand the expressiveness of those policies (min/max length
and bitmasks). Allow dumping policies for particular commands.
This is used for feature discovery by user space (instead of kernel
version parsing or trial and error).
Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge.
Allow more than 255 IPv4 multicast interfaces.
Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
In Multi-patch TCP (MPTCP) support concurrent transmission of data
on multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
Support SMC-Dv2 version of SMC, which enables multi-subnet deployments.
Allow more calls to same peer in RxRPC.
Support two new Controller Area Network (CAN) protocols -
CAN-FD and ISO 15765-2:2016.
Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
Add TC actions for implementing MPLS L2 VPNs.
Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary notifications
and make it easier to offload nexthops into HW by converting
to a blocking notifier.
Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific
TCP option use.
Reorganize TCP congestion control (CC) initialization to simplify life
of TCP CC implemented in BPF.
Add support for shipping BPF programs with the kernel and loading them
early on boot via the User Mode Driver mechanism, hence reusing all the
user space infra we have.
Support sleepable BPF programs, initially targeting LSM and tracing.
Add bpf_d_path() helper for returning full path for given 'struct path'.
Make bpf_tail_call compatible with bpf-to-bpf calls.
Allow BPF programs to call map_update_elem on sockmaps.
Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
Support listing and getting information about bpf_links via the bpf
syscall.
Enhance kernel interfaces around NIC firmware update. Allow specifying
overwrite mask to control if settings etc. are reset during update;
report expected max time operation may take to users; support firmware
activation without machine reboot incl. limits of how much impact
reset may have (e.g. dropping link or not).
Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
Adopt or extend devlink use for debug, monitoring, fw update
in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw,
mv88e6xxx, dpaa2-eth).
In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
Add XDP support for Intel's igb driver.
Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
Improve performance for packets which don't require much offloads
on recent Mellanox NICs by 20% by making multiple packets share
a descriptor entry.
Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto
subtree to drivers/net. Move MDIO drivers out of the phy directory.
Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.
Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+ItRwACgkQMUZtbf5S
IrtTMg//UxpdR/MirT1DatBU0K/UGAZY82hV7F/UC8tPgjfHZeHvWlDFxfi3YP81
PtPKbhRZ7DhwBXefUp6nY3UdvjftrJK2lJm8prJUPSsZRye8Wlcb7y65q7/P2y2U
Efucyopg6RUrmrM0DUsIGYGJgylQLHnMYUl/keCsD4t5Bp4ksyi9R2t5eitGoWzh
r3QGdbSa0AuWx4iu0i+tqp6Tj0ekMBMXLVb35dtU1t0joj2KTNEnSgABN3prOa8E
iWYf2erOau68Ogp3yU3miCy0ZU4p/7qGHTtzbcp677692P/ekak6+zmfHLT9/Pjy
2Stq2z6GoKuVxdktr91D9pA3jxG4LxSJmr0TImcGnXbvkMP3Ez3g9RrpV5fn8j6F
mZCH8TKZAoD5aJrAJAMkhZmLYE1pvDa7KolSk8WogXrbCnTEb5Nv8FHTS1Qnk3yl
wSKXuvutFVNLMEHCnWQLtODbTST9DI/aOi6EctPpuOA/ZyL1v3pl+gfp37S+LUTe
owMnT/7TdvKaTD0+gIyU53M6rAWTtr5YyRQorX9awIu/4Ha0F0gYD7BJZQUGtegp
HzKt59NiSrFdbSH7UdyemdBF4LuCgIhS7rgfeoUXMXmuPHq7eHXyHZt5dzPPa/xP
81P0MAvdpFVwg8ij2yp2sHS7sISIRKq17fd1tIewUabxQbjXqPc=
=bc1U
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
- Add redirect_neigh() BPF packet redirect helper, allowing to limit
stack traversal in common container configs and improving TCP
back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).
- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.
- Allow more than 255 IPv4 multicast interfaces.
- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
- In Multi-patch TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.
- Allow more calls to same peer in RxRPC.
- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.
- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
- Add TC actions for implementing MPLS L2 VPNs.
- Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary
notifications and make it easier to offload nexthops into HW by
converting to a blocking notifier.
- Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific TCP
option use.
- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.
- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.
- Support sleepable BPF programs, initially targeting LSM and tracing.
- Add bpf_d_path() helper for returning full path for given 'struct
path'.
- Make bpf_tail_call compatible with bpf-to-bpf calls.
- Allow BPF programs to call map_update_elem on sockmaps.
- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
- Support listing and getting information about bpf_links via the bpf
syscall.
- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).
- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).
- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
- Add XDP support for Intel's igb driver.
- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.
- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.
- Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.
- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).
* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...
Pull compat quotactl cleanups from Al Viro:
"More Christoph's compat cleanups: quotactl(2)"
* 'work.quota-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
quota: simplify the quotactl compat handling
compat: add a compat_need_64bit_alignment_fixup() helper
compat: lift compat_s64 and compat_u64 to <asm-generic/compat.h>
Pull copy_and_csum cleanups from Al Viro:
"Saner calling conventions for csum_and_copy_..._user() and friends"
[ Removing 800+ lines of code and cleaning stuff up is good - Linus ]
* 'work.csum_and_copy' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ppc: propagate the calling conventions change down to csum_partial_copy_generic()
amd64: switch csum_partial_copy_generic() to new calling conventions
sparc64: propagate the calling convention changes down to __csum_partial_copy_...()
xtensa: propagate the calling conventions change down into csum_partial_copy_generic()
mips: propagate the calling convention change down into __csum_partial_copy_..._user()
mips: __csum_partial_copy_kernel() has no users left
mips: csum_and_copy_{to,from}_user() are never called under KERNEL_DS
sparc32: propagate the calling conventions change down to __csum_partial_copy_sparc_generic()
i386: propagate the calling conventions change down to csum_partial_copy_generic()
sh: propage the calling conventions change down to csum_partial_copy_generic()
m68k: get rid of zeroing destination on error in csum_and_copy_from_user()
arm: propagate the calling convention changes down to csum_partial_copy_from_user()
alpha: propagate the calling convention changes down to csum_partial_copy.c helpers
saner calling conventions for csum_and_copy_..._user()
csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum
csum_partial_copy_nocheck(): drop the last argument
unify generic instances of csum_partial_copy_nocheck()
icmp_push_reply(): reorder adding the checksum up
skb_copy_and_csum_bits(): don't bother with the last argument
Add default cases for __put_user_fn()/__get_user_fn(). This doesn't
fix anything since the functions are only called with sane values.
However we get rid of smatch warnings:
./arch/s390/include/asm/uaccess.h:143 __get_user_fn() error: uninitialized symbol 'rc'.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
We'll need to keep track of whether or not the byte string in util_str is
valid and thus needs to be passed to a vfio-pci passthrough device.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In preparation for passing the info on to vfio-pci devices, stash the
supported PCI version for the target device in the zpci_dev.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add support for AP bus adapter config and deconfig to the sclp
core code. The code is statically build into the kernel when
ZCRYPT is configured either as module or with static support.
This is the base functionality for having configure/deconfigure
support in the AP bus and card code. Another patch will exploit
this soon.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Suggested-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Rejecting non-native endian BTF overlapped with the addition
of support for it.
The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.
Signed-off-by: David S. Miller <davem@davemloft.net>
arch/s390/kernel/entry.h: suspend_zero_pages - only declaration left
after commit 394216275c ("s390: remove broken hibernate / power
management support")
arch/s390/include/asm/setup.h: vmhalt_cmd - only declaration left after
commit 99ca4e582d ("[S390] kernel: Shutdown Actions Interface")
arch/s390/include/asm/setup.h: vmpoff_cmd - only declaration left after
commit 99ca4e582d ("[S390] kernel: Shutdown Actions Interface")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
From the kernel perspective NVMe dump works exactly like zFCP dump.
Therefore, adapt all places where code explicitly tests only for
IPL of type FCP DUMP. And also set the memory end correctly in this case.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Add the nvme dump ipl type, associated data, and sysfs entries. This allows
booting into a stand alone dump environment that resides on an nvme device.
Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
arch/s390/pci/pci_bus.h: zpci_bus_init - only declaration left after
commit 05bc1be6db ("s390/pci: create zPCI bus")
arch/s390/include/asm/gmap.h: gmap_pte_notify - only declaration left
after commit 4be130a084 ("s390/mm: add shadow gmap support")
arch/s390/include/asm/pgalloc.h: rcu_table_freelist_finish - only
declaration left after commit 36409f6353 ("[S390] use generic RCU
page-table freeing code")
arch/s390/include/asm/tlbflush.h: smp_ptlb_all - only declaration left
after commit 5a79859ae0 ("s390: remove 31 bit support")
arch/s390/include/asm/vtimer.h: init_cpu_vtimer - only declaration left
after commit b5f87f15e2 ("s390/idle: consolidate idle functions and
definitions")
arch/s390/include/asm/pci.h: zpci_debug_info - only declaration left
after commit 386aa051fb ("s390/pci: remove per device debug attribute")
arch/s390/include/asm/vdso.h: vdso_alloc_boot_cpu - only declaration
left after commit 4bff8cb545 ("s390: convert to GENERIC_VDSO")
arch/s390/include/asm/smp.h: smp_vcpu_scheduled - only declaration left
after commit 67626fadd2 ("s390: enforce CONFIG_SMP")
arch/s390/kernel/entry.h: restart_call_handler - only declaration left
after commit 8b646bd759 ("[S390] rework smp code")
arch/s390/kernel/entry.h: startup_init_nobss - only declaration left
after commit 2e83e0eb85 ("s390: clean .bss before running uncompressed
kernel")
arch/s390/kernel/entry.h: s390_early_resume - only declaration left after
commit 394216275c ("s390: remove broken hibernate / power management
support")
drivers/s390/char/raw3270.h: raw3270_request_alloc_bootmem - only
declaration left after commit 33403dcfcd ("[S390] 3270 console:
convert from bootmem to slab")
drivers/s390/cio/device.h: ccw_device_schedule_sch_unregister - only
declaration left after commit 37de53bb52 ("[S390] cio: introduce ccw
device todos")
drivers/s390/char/tape.h: tape_hotplug_event - has only declaration
since recorded git history.
drivers/s390/char/tape.h: tape_oper_handler - has only declaration since
recorded git history.
drivers/s390/char/tape.h: tape_noper_handler - has only declaration
since recorded git history.
drivers/s390/char/tape_std.h: tape_std_check_locate - only declaration
left after commit 161beff8f4 ("s390/tape: remove tape block leftovers")
drivers/s390/char/tape_std.h: tape_std_default_handler - has only
declaration since recorded git history.
drivers/s390/char/tape_std.h: tape_std_unexpect_uchk_handler - has only
declaration since recorded git history.
drivers/s390/char/tape_std.h: tape_std_irq - has only declaration since
recorded git history.
drivers/s390/char/tape_std.h: tape_std_error_recovery - has only
declaration since recorded git history.
drivers/s390/char/tape_std.h: tape_std_error_recovery_has_failed -
has only declaration since recorded git history.
drivers/s390/char/tape_std.h: tape_std_error_recovery_succeded - has
only declaration since recorded git history.
drivers/s390/char/tape_std.h: tape_std_error_recovery_do_retry - has
only declaration since recorded git history.
drivers/s390/char/tape_std.h: tape_std_error_recovery_read_opposite -
has only declaration since recorded git history.
drivers/s390/char/tape_std.h: tape_std_error_recovery_HWBUG - has only
declaration since recorded git history.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Remove couple of declarations which are unused since commit 4bff8cb545
("s390: convert to GENERIC_VDSO").
Acked-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Since commit 394216275c ("s390: remove broken hibernate / power
management support") _swsusp_reset_dma is unused and could be safely
removed.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Currently to make sure that every page table entry is read just once
gup_fast walks perform READ_ONCE and pass pXd value down to the next
gup_pXd_range function by value e.g.:
static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
unsigned int flags, struct page **pages, int *nr)
...
pudp = pud_offset(&p4d, addr);
This function passes a reference on that local value copy to pXd_offset,
and might get the very same pointer in return. This happens when the
level is folded (on most arches), and that pointer should not be
iterated.
On s390 due to the fact that each task might have different 5,4 or
3-level address translation and hence different levels folded the logic
is more complex and non-iteratable pointer to a local copy leads to
severe problems.
Here is an example of what happens with gup_fast on s390, for a task
with 3-level paging, crossing a 2 GB pud boundary:
// addr = 0x1007ffff000, end = 0x10080001000
static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
unsigned int flags, struct page **pages, int *nr)
{
unsigned long next;
pud_t *pudp;
// pud_offset returns &p4d itself (a pointer to a value on stack)
pudp = pud_offset(&p4d, addr);
do {
// on second iteratation reading "random" stack value
pud_t pud = READ_ONCE(*pudp);
// next = 0x10080000000, due to PUD_SIZE/MASK != PGDIR_SIZE/MASK on s390
next = pud_addr_end(addr, end);
...
} while (pudp++, addr = next, addr != end); // pudp++ iterating over stack
return 1;
}
This happens since s390 moved to common gup code with commit
d1874a0c28 ("s390/mm: make the pxd_offset functions more robust") and
commit 1a42010cdc ("s390/mm: convert to the generic
get_user_pages_fast code").
s390 tried to mimic static level folding by changing pXd_offset
primitives to always calculate top level page table offset in pgd_offset
and just return the value passed when pXd_offset has to act as folded.
What is crucial for gup_fast and what has been overlooked is that
PxD_SIZE/MASK and thus pXd_addr_end should also change correspondingly.
And the latter is not possible with dynamic folding.
To fix the issue in addition to pXd values pass original pXdp pointers
down to gup_pXd_range functions. And introduce pXd_offset_lockless
helpers, which take an additional pXd entry value parameter. This has
already been discussed in
https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
Fixes: 1a42010cdc ("s390/mm: convert to the generic get_user_pages_fast code")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: <stable@vger.kernel.org> [5.2+]
Link: https://lkml.kernel.org/r/patch.git-943f1e5dcff2.your-ad-here.call-01599856292-ext-8676@work.hours
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the current implementation, leap seconds are only synchronized
during the bootup process when the STP clock is synced. If the Leap
second offset (LSO) changes the machine must be rebooted, which is
not desired. This patch adds the required code to handle Leap second
changes during runtime. If the Leap second changes, a Configuration
change machine check is triggered. The STP code than schedules a Leap
second insertion/deletion with do_adjtimex().
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
In hardware-dependent headers using u32 is easier
to read and less error-prone.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Use __packed instead of __attribute__((packed))
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This patch extends the pkey kernel module to support CCA
and EP11 secure ECC (private) keys as source for deriving
ECC protected (private) keys.
There is yet another new ioctl to support this: PKEY_KBLOB2PROTK3
can handle all the old keys plus CCA and EP11 secure ECC keys.
For details see ioctl description in pkey.h.
The CPACF unit currently only supports a subset of 5
different ECC curves (P-256, P-384, P-521, ED25519, ED448) and
so only keys of this curve type can be transformed into
protected keys. However, the pkey and the cca/ep11 low level
functions do not check this but simple pass-through the key
blob to the firmware onto the crypto cards. So most likely
the failure will be a response carrying an error code
resulting in user space errno value EIO instead of EINVAL.
Deriving a protected key from an EP11 ECC secure key
requires a CEX7 in EP11 mode. Deriving a protected key from
an CCA ECC secure key requires a CEX7 in CCA mode.
Together with this new ioctl the ioctls for querying lists
of apqns (PKEY_APQNS4K and PKEY_APQNS4KT) have been extended
to support EP11 and CCA ECC secure key type and key blobs.
Together with this ioctl there comes a new struct ep11kblob_header
which is to be prepended onto the EP11 key blob. See details
in pkey.h for the fields in there. The older EP11 AES key blob
with some info stored in the (unused) session field is also
supported with this new ioctl.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This reverts commit 55a5542a54 ("s390/hibernate: fix error handling when
suspend cpu != resume cpu"). It added sclp_early_printk_force() which
is no longer used since commit 394216275c ("s390: remove broken
hibernate / power management support"). No hibernate - no problem.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
lift the compat_s64 and compat_u64 definitions into common code using the
COMPAT_FOR_U64_ALIGNMENT symbol for the x86 special case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently the kernel crashes in Kasan instrumentation code if
CONFIG_KASAN_S390_4_LEVEL_PAGING is used on protected virtualization
capable machine where the ultravisor imposes addressing limitations on
the host and those limitations are lower then KASAN_SHADOW_OFFSET.
The problem is that Kasan has to know in advance where vmalloc/modules
areas would be. With protected virtualization enabled vmalloc/modules
areas are moved down to the ultravisor secure storage limit while kasan
still expects them at the very end of 4-level paging address space.
To fix that make Kasan recognize when protected virtualization is enabled
and predefine vmalloc/modules areas position which are compliant with
ultravisor secure storage limit.
Kasan shadow itself stays in place and might reside above that ultravisor
secure storage limit.
One slight difference compaired to a kernel without Kasan enabled is that
vmalloc/modules areas position is not reverted to default if ultravisor
initialization fails. It would still be below the ultravisor secure
storage limit.
Kernel layout with kasan, 4-level paging and protected virtualization
enabled (ultravisor secure storage limit is at 0x0000800000000000):
---[ vmemmap Area Start ]---
0x0000400000000000-0x0000400080000000
---[ vmemmap Area End ]---
---[ vmalloc Area Start ]---
0x00007fe000000000-0x00007fff80000000
---[ vmalloc Area End ]---
---[ Modules Area Start ]---
0x00007fff80000000-0x0000800000000000
---[ Modules Area End ]---
---[ Kasan Shadow Start ]---
0x0018000000000000-0x001c000000000000
---[ Kasan Shadow End ]---
0x001c000000000000-0x0020000000000000 1P PGD I
Kernel layout with kasan, 4-level paging and protected virtualization
disabled/unsupported:
---[ vmemmap Area Start ]---
0x0000400000000000-0x0000400060000000
---[ vmemmap Area End ]---
---[ Kasan Shadow Start ]---
0x0018000000000000-0x001c000000000000
---[ Kasan Shadow End ]---
---[ vmalloc Area Start ]---
0x001fffe000000000-0x001fffff80000000
---[ vmalloc Area End ]---
---[ Modules Area Start ]---
0x001fffff80000000-0x0020000000000000
---[ Modules Area End ]---
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Add helper functions to expose Channel Subsystem ID (CSSID), MIF Image Id
(IID), Channel ID (CHID) and Channel Path ID (CHPID).
These values are required by the qeth driver's exploitation of network-
address-change-notifications to determine which entries belong to this
interface.
Store the Partition identifier in System log, as this may be used to map
a Linux view to a Hardware view for debugging purpose.
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for operation code 3 (OC3) of the
Perform-Network-Subchannel-Operations (PNSO) function
of the Channel-Subsystem-Call (CHSC) instruction.
PNSO provides 2 operation codes:
OC0 - BRIDGE_INFO
OC3 - ADDR_INFO (new)
Extend the function calls to *pnso* to pass the OC and
add new response code 0108.
Support for OC3 is indicated by a flag in the css_general_characteristics.
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't need to export pages if we destroy the VM configuration
afterwards anyway. Instead we can destroy the page which will zero it
and then make it accessible to the host.
Destroying is about twice as fast as the export.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/kvm/20200907124700.10374-2-frankja@linux.ibm.com/
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Checks the whole kernel address space for W+X mappings. Note that
currently the first lowcore page unfortunately has to be mapped
W+X. Therefore this not reported as an insecure mapping.
For the very same reason the wording is also different to other
architectures if the test passes:
On s390 it is "no unexpected W+X pages found" instead of
"no W+X pages found".
Tested-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
clp_rescan_pci_devices_simple() is neither simpler than
clp_scan_pci_devices() nor does it really scan PCI devices, in particular
it will neither add newly discovered devices nor remove those which
disappeared.
Instead it only refreshes PCI function handles and also
has just a single callsite in the same translation unit left which
in fact only refreshes one specific function handle identified by
a FID.
Clarify this by renaming the function and its helper to
clp_refresh_fh() respectvely __clp_refresh_fh() and make it take
a fid directly which saves us dealing with the NULL case which
updated all function handles but is not used anymore.
Furthermore since the only callsite is in the same translation unit
make it static.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
there is only one call site of clp_rescan_pci_devices() and
all the function does is call zpci_remove_reserved_devices()
followed by a duplicating clp_scan_pci_devices().
So inline the single call as a call to zpci_remove_reserved_devices()
and clp_scan_pci_devices() and remove the function.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
the only caller of this was removed as part of the suspend/resume
removal so no need to keep this function around.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This is currently only preventing that outdated information is
provided to user space. A concurrent split of huge/large pages does
modify the kernel page tables, however either the huge/large mapping
is reported or the split area is being walked.
This "fixes" also only a potential future bug, since split pages could
also be merged again if page permissions are the same for larger
memory areas.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Passing a custom name from the device driver is nice - but in practice
it's only zfcp who has been using this. So we might as well hard-code
a naming scheme in the qdio layer, so that qeth also benefits from it.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
With our current support for the new MIO PCI instructions, write
combining/write back MMIO memory can be obtained via the pci_iomap_wc()
and pci_iomap_wc_range() functions.
This is achieved by using the write back address for a specific bar
as provided in clp_store_query_pci_fn()
These functions are however not widely used and instead drivers often
rely on ioremap_wc() and ioremap_prot(), which on other platforms enable
write combining using a PTE flag set through the pgrprot value.
While we do not have a write combining flag in the low order flag bits
of the PTE like x86_64 does, with MIO support, there is a write back bit
in the physical address (bit 1 on z15) and thus also the PTE.
Which bit is used to toggle write back and whether it is available at
all, is however not fixed in the architecture. Instead we get this
information from the CLP Store Logical Processor Characteristics for PCI
command. When the write back bit is not provided we fall back to the
existing behavior.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
__qdio_allocate_fill_qdr() is meant to set up one specific queue
descriptor in the QDR. But for this simple task, it gets passed a bunch
of global structs and offsets - and then navigates through the structs
to find its actual operands.
Clean up all the complicated pointer chasing & index calculation, and
just pass a descriptor and its associated queue struct.
While at it also add some virt_to_phys() translations, to clarify that
addresses in the QDR are meant to be absolute.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Convert s390 to generic vDSO. There are a few special things on s390:
- vDSO can be called without a stack frame - glibc did this in the past.
So we need to allocate a stackframe on our own.
- The former assembly code used stcke to get the TOD clock and applied
time steering to it. We need to do the same in the new code. This is done
in the architecture specific __arch_get_hw_counter function. The steering
information is stored in an architecure specific area in the vDSO data.
- CPUCLOCK_VIRT is now handled with a syscall fallback, which might
be slower/less accurate than the old implementation.
The getcpu() function stays as an assembly function because there is no
generic implementation and the code is just a few lines.
Performance number from my system do 100 mio gettimeofday() calls:
Plain syscall: 8.6s
Generic VDSO: 1.3s
old ASM VDSO: 1s
So it's a bit slower but still much faster than syscalls.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Add some coding style changes which hopefully make the code
look a bit less odd.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Use "|" instead of "+" within csum_fold() for consistency reasons,
like in the rest of the file.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Convert ip_fast_csum() so it doesn't call csum_partial(), but instead
open code the checksum calculation. The problem with csum_partial() is
that it makes use of the cksm instruction, which has high startup
costs and therefore is only very fast if used on larger memory
regions.
IPv4 headers however are small in size (5-16 32-bit words). The open
coded variant calculates the checksum in ~30% of the time compared to
the old variant (z14, march=z196).
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Rewrite csum_tcpudp_nofold() so that the generated code will not
contain branches. The old implementation was also optimized for
machines which came with "add logical with carry" instructions,
however the compiler doesn't generate them anymore. This is most
likely because those instructions are slower.
However with the old code the compiler generates a lot of branches,
which isn't too helpful usually. Therefore rewrite the code.
In a tight loop this doesn't make any difference since the branch
prediction unit does its job.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This implementation needs only ~30% of the time to calculate the
checksum compared to the generic variant. In addition the compiler
also generates only ~30% of the instructions compared to the generic
variant (on z14, compiled with march=z196).
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Since commit a21ee6055c ("lockdep: Change hardirq{s_enabled,_context}
to per-cpu variables") the lockdep code itself uses percpu variables. This
leads to recursions because the percpu macros are calling preempt_enable()
which might call trace_preempt_on().
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
quite a few architectures have the same csum_partial_copy_nocheck() -
simply memcpy() the data and then return the csum of the copy.
hexagon, parisc, ia64, s390, um: explicitly spelled out that way.
arc, arm64, csky, h8300, m68k/nommu, microblaze, mips/GENERIC_CSUM, nds32,
nios2, openrisc, riscv, unicore32: end up picking the same thing spelled
out in lib/checksum.h (with varying amounts of perversions along the way).
everybody else (alpha, arm, c6x, m68k/mmu, mips/!GENERIC_CSUM, powerpc,
sh, sparc, x86, xtensa) have non-generic variants. For all except c6x
the declaration is in their asm/checksum.h. c6x uses the wrapper
from asm-generic/checksum.h that would normally lead to the lib/checksum.h
instance, but in case of c6x we end up using an asm function from arch/c6x
instead.
Screw that mess - have architectures with private instances define
_HAVE_ARCH_CSUM_AND_COPY in their asm/checksum.h and have the default
one right in net/checksum.h conditional on _HAVE_ARCH_CSUM_AND_COPY
*not* defined.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
of truncating the most significant bits.
- Improve THP splitting required by qemu processes by making use of
walk_page_vma() instead of calling follow_page() for every single page
within each vma.
- Add missing ZCRYPT dependency to VFIO_AP to fix potential compile problems.
- Remove not required select CLOCKSOURCE_VALIDATE_LAST_CYCLE again.
- Set node distance to LOCAL_DISTANCE instead of 0, since e.g. libnuma
translates a node distance of 0 to "no NUMA support available".
- Couple of other minor fixes and improvements.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAl81bRwACgkQIg7DeRsp
bsIT3g/7BSIfbI852VTn6hWw+LfdoBLVqXDFcg9xQhUqccbPcG8LG/BYlmuvIn1Z
X9XujP76xAX7LKnt9x0Z9znwpgGIj0oURmcBxzGBTk+yLBf7YXCOLu9w4MV1AD57
WRoGG7pPFZy5rMDM29frqBNQBWc7g+/wGoPVqiffzqgj2DcS4Ka8TSTTboMIH7th
mLK5xaU504FdDGwTgNv6Q/Pt3+YBZ6P/1cIuifzYnfjJ2CsjSBLjS9IFP6/w/c1Q
VQ2TfajuRbAgkkY01o15tKrlqwPzzWYhnh9ix1LkL0WT6VNql+fcwN665HdsSDFu
Ctu6Sk3BZmhN1GONK4gIx5didRxBi7neAfUMenSasPT8uMGJcR4EMoNUr70sXNA/
B1naRayfCH0dE3SVZflAeFJRSh1LnikY14uj65Gg/Wtla5N+90zuHEGDttIBNzrr
FDzc+39GWN1wJEl7FzSIm+YRC88C/So/BNUQhmlVYNY0sbqyAszUGBOAa+5VS2WQ
tkWzWPjJ6QA3e3t/dpnc0dvlCKLTJG3o7bdtilb71QNybGiAniP8+79hCNa5nPT3
iLRhmLX27cQF54IUw1whVA5mkfzHSzwAlB+/laPi8MWFQXTRLTD1XzhmjrIyR57Y
Q6wM+lxC4qreDz7yGX+eQk7OEs8j8IKTF14HZxkp/DEL09Vbmbs=
=E0hf
-----END PGP SIGNATURE-----
Merge tag 's390-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
- Allow s390 debug feature to handle finally more than 256 CPU numbers,
instead of truncating the most significant bits.
- Improve THP splitting required by qemu processes by making use of
walk_page_vma() instead of calling follow_page() for every single
page within each vma.
- Add missing ZCRYPT dependency to VFIO_AP to fix potential compile
problems.
- Remove not required select CLOCKSOURCE_VALIDATE_LAST_CYCLE again.
- Set node distance to LOCAL_DISTANCE instead of 0, since e.g. libnuma
translates a node distance of 0 to "no NUMA support available".
- Couple of other minor fixes and improvements.
* tag 's390-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/numa: move code to arch/s390/kernel
s390/time: remove select CLOCKSOURCE_VALIDATE_LAST_CYCLE again
s390/debug: debug feature version 3
s390/Kconfig: add missing ZCRYPT dependency to VFIO_AP
s390/numa: set node distance to LOCAL_DISTANCE
s390/pkey: remove redundant variable initialization
s390/test_unwind: fix possible memleak in test_unwind()
s390/gmap: improve THP splitting
s390/atomic: circumvent gcc 10 build regression
segment_eq is only used to implement uaccess_kernel. Just open code
uaccess_kernel in the arch uaccess headers and remove one layer of
indirection.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Greentime Hu <green.hu@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Link: http://lkml.kernel.org/r/20200710135706.537715-5-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change __debug_entry structure in the following way:
- remove redundant union
- Field containing cpuid is expanded to 16 bits. 8-bit width was not
enough since we already support up to 512 cpus.
- Field containing the timestamp is expanded to 60 bits. The timestamp
itself is now stored in the absolute Unix time format in microseconds
taking the Epoch Index into acount.
Adjust default header for debug entries by setting minimum width for cpuid
to 4 digits.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The node distance is hardcoded to 0, which causes a trouble
for some user-level applications. In particular, "libnuma"
expects the distance of a node to itself as LOCAL_DISTANCE.
This update removes the offending node distance override.
Cc: <stable@vger.kernel.org> # 4.4
Fixes: 3a368f742d ("s390/numa: add core infrastructure")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Patch series "mm: cleanup usage of <asm/pgalloc.h>"
Most architectures have very similar versions of pXd_alloc_one() and
pXd_free_one() for intermediate levels of page table. These patches add
generic versions of these functions in <asm-generic/pgalloc.h> and enable
use of the generic functions where appropriate.
In addition, functions declared and defined in <asm/pgalloc.h> headers are
used mostly by core mm and early mm initialization in arch and there is no
actual reason to have the <asm/pgalloc.h> included all over the place.
The first patch in this series removes unneeded includes of
<asm/pgalloc.h>
In the end it didn't work out as neatly as I hoped and moving
pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
unnecessary changes to arches that have custom page table allocations, so
I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
to mm/.
This patch (of 8):
In most cases <asm/pgalloc.h> header is required only for allocations of
page table memory. Most of the .c files that include that header do not
use symbols declared in <asm/pgalloc.h> and do not require that header.
As for the other header files that used to include <asm/pgalloc.h>, it is
possible to move that include into the .c file that actually uses symbols
from <asm/pgalloc.h> and drop the include from the header file.
The process was somewhat automated using
sed -i -E '/[<"]asm\/pgalloc\.h/d' \
$(grep -L -w -f /tmp/xx \
$(git grep -E -l '[<"]asm/pgalloc\.h'))
where /tmp/xx contains all the symbols defined in
arch/*/include/asm/pgalloc.h.
[rppt@linux.ibm.com: fix powerpc warning]
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
x86:
* Report last CPU for debugging
* Emulate smaller MAXPHYADDR in the guest than in the host
* .noinstr and tracing fixes from Thomas
* nested SVM page table switching optimization and fixes
Generic:
* Unify shadow MMU cache data structures across architectures
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl8pC+oUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNcOwgAjomqtEqQNlp7DdZT7VyyklzbxX1/
ud7v+oOJ8K4sFlf64lSthjPo3N9rzZCcw+yOXmuyuITngXOGc3tzIwXpCzpLtuQ1
WO1Ql3B/2dCi3lP5OMmsO1UAZqy9pKLg1dfeYUPk48P5+p7d/NPmk+Em5kIYzKm5
JsaHfCp2EEXomwmljNJ8PQ1vTjIQSSzlgYUBZxmCkaaX7zbEUMtxAQCStHmt8B84
33LczwXBm3viSWrzsoBV37I70+tseugiSGsCfUyupXOvq55d6D9FCqtCb45Hn4Vh
Ik8ggKdalsk/reiGEwNw1/3nr6mRMkHSbl+Mhc4waOIFf9dn0urgQgOaDg==
=YVx0
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"s390:
- implement diag318
x86:
- Report last CPU for debugging
- Emulate smaller MAXPHYADDR in the guest than in the host
- .noinstr and tracing fixes from Thomas
- nested SVM page table switching optimization and fixes
Generic:
- Unify shadow MMU cache data structures across architectures"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
KVM: SVM: Fix sev_pin_memory() error handling
KVM: LAPIC: Set the TDCR settable bits
KVM: x86: Specify max TDP level via kvm_configure_mmu()
KVM: x86/mmu: Rename max_page_level to max_huge_page_level
KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR
KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch
KVM: x86: Pull the PGD's level from the MMU instead of recalculating it
KVM: VMX: Make vmx_load_mmu_pgd() static
KVM: x86/mmu: Add separate helper for shadow NPT root page role calc
KVM: VMX: Drop a duplicate declaration of construct_eptp()
KVM: nSVM: Correctly set the shadow NPT root level in its MMU role
KVM: Using macros instead of magic values
MIPS: KVM: Fix build error caused by 'kvm_run' cleanup
KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF
KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
KVM: VMX: Add guest physical address check in EPT violation and misconfig
KVM: VMX: introduce vmx_need_pf_intercept
KVM: x86: update exception bitmap on CPUID changes
KVM: x86: rename update_bp_intercept to update_exception_bitmap
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAl8pk84ACgkQUqAMR0iA
lPIrTxAAhD6fosJx+7LCrDRABIw/ZybeS5MIxTuPsNtdMmGBemigew5Ao1wYY6Ww
3BFiNC2LpDXPxSOCQpz0Zm5/oCLhShPJmS6ukjLbufDsiw0MezliKCAa2Bfw3W31
6xntQtf7ps+bmTEQDyuznu8Kfg+I3lmdGUOEBBluHIP4gb7XKQE8ttyUHB6qdiXI
3eAl53Q8dOMMjtk5eNBXA19JY43g4JmLZRBumrAUc1vsv15KTDmSyWKlV8+tLH9K
JbQAHe0pNVec4sJUIYLvIwDZXvtsvxjdJyX3tTeZ7xJ/ARcvRLoixVGqWxKhqdth
j5U/L+YQfCJifyqvEVo03yy4Ti+OraliRpGcRf/bM2HpmFBA2+dISr7/VEqRwkG7
Sy8HuvBHHyUqdrPjB7izhv8iyRN+LxFfpdT5LMnzsvxMxAJ+QwNjxb13RA4kkeRU
5SgOhfGWgTsLy71J6qdSeXYB2oPFw4Onp5yAtoUsOJVYqWkN9x0zdl+9HmqIHF7T
dY+KNriEO6gmpsQrMR4FC/GVMtwYWf8AoqeZen5O5SQULmzuKQ5AkOo0IAMrU92i
iAdFrSZj35HAQjIJRccPNGZ3FwTd1Z4r5GT7VRvrN+nq2wVopzbbz924/lmsGoAS
YppAw31sKfXDc5uWE8jP8GP3OJqhORn2PPXq3D5Q3XSVbGgey0Q=
=ZcMq
-----END PGP SIGNATURE-----
Merge tag 'printk-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Herbert Xu made printk header file self-contained.
- Andy Shevchenko and Sergey Senozhatsky cleaned up console->setup()
error handling.
- Andy Shevchenko did some cleanups (e.g. sparse warning) in vsprintf
code.
- Minor documentation updates.
* tag 'printk-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
lib/vsprintf: Force type of flags value for gfp_t
lib/vsprintf: Replace custom spec to print decimals with generic one
lib/vsprintf: Replace hidden BUILD_BUG_ON() with static_assert()
printk: Make linux/printk.h self-contained
doc:kmsg: explicitly state the return value in case of SEEK_CUR
Replace HTTP links with HTTPS ones: vsprintf
hvc: unify console setup naming
console: Fix trivia typo 'change' -> 'chance'
console: Propagate error code from console ->setup()
tty: hvc: Return proper error code from console ->setup() hook
serial: sunzilog: Return proper error code from console ->setup() hook
serial: sunsab: Return proper error code from console ->setup() hook
mips: Return proper error code from console ->setup() hook
- LKMM updates: mostly documentation changes, but also some new litmus tests for atomic ops.
- KCSAN updates: the most important change is that GCC 11 now has all fixes in place
to support KCSAN, so GCC support can be enabled again. Also more annotations.
- futex updates: minor cleanups and simplifications
- seqlock updates: merge preparatory changes/cleanups for the 'associated locks' facilities.
- lockdep updates:
- simplify IRQ trace event handling
- add various new debug checks
- simplify header dependencies, split out <linux/lockdep_types.h>, decouple
lockdep from other low level headers some more
- fix NMI handling
- misc cleanups and smaller fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8n9/wRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hZFQ//dD+AKw9Nym+WbylovmeD0qxWxPyeN/jG
vBVDTOJIJLtZTkZf6YHcYOJlPwaMDYUQluqTPQhsaQZy/NoEb5NM2cFAj2R9gjyT
O8665T1dvhW9Sh353mBpuwviqdrnvCeHTBEcglSlFY7hxToYAflUN0+DXGVtNys8
PFNf3L9SHT0GLVC8+di/eJzQaRqxiB0Pq7kvh2RvPJM/dcQNA9Ho3CCNO5j6qGoY
u7OnMT8xJXkgbdjjUO4RO0v9VjMuNthZ2JiONDgvgKtJfIL2wt5YXIv1EYX0GuWp
WZgIzE4o1G7GJOOzKpFfZFyK8grHu2fWgK1plvodWjlLkBmltJZ1qyOM+wngd/m2
TgtPo73/YFbxFUbbBpkb0eiIaH2t99kMvfCWd05+GiPCtzn9UL9GfFRWd42vonwc
sQWjFrHKlnuzifUfNcLmKg7R2nUtF3Dm/SydiTJ+9NtH/QA17YJKWnlE1moulNtQ
p7H7+8UdcvSQ7F38A74v2IYNIyDsv5qcE8ar4QHdaanBBX/LCyD0UlfgsgxEReXf
GDKkpx7LFQlI6Y2YB+dZgkCwhNBl3/OQ3v6hC95B37fA67dAIQyPIWHiHbaM+029
gghqU4GcUcbjSnHPzl9PPL+hi9MyXrMjpb7CBXytg4NI4EE1waHR+0kX14V8ndRj
MkWQOKPUgB0=
=3MTT
-----END PGP SIGNATURE-----
Merge tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- LKMM updates: mostly documentation changes, but also some new litmus
tests for atomic ops.
- KCSAN updates: the most important change is that GCC 11 now has all
fixes in place to support KCSAN, so GCC support can be enabled again.
Also more annotations.
- futex updates: minor cleanups and simplifications
- seqlock updates: merge preparatory changes/cleanups for the
'associated locks' facilities.
- lockdep updates:
- simplify IRQ trace event handling
- add various new debug checks
- simplify header dependencies, split out <linux/lockdep_types.h>,
decouple lockdep from other low level headers some more
- fix NMI handling
- misc cleanups and smaller fixes
* tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
kcsan: Improve IRQ state trace reporting
lockdep: Refactor IRQ trace events fields into struct
seqlock: lockdep assert non-preemptibility on seqcount_t write
lockdep: Add preemption enabled/disabled assertion APIs
seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount()
seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs
seqlock: Reorder seqcount_t and seqlock_t API definitions
seqlock: seqcount_t latch: End read sections with read_seqcount_retry()
seqlock: Properly format kernel-doc code samples
Documentation: locking: Describe seqlock design and usage
locking/qspinlock: Do not include atomic.h from qspinlock_types.h
locking/atomic: Move ATOMIC_INIT into linux/types.h
lockdep: Move list.h inclusion into lockdep.h
locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs
futex: Remove unused or redundant includes
futex: Consistently use fshared as boolean
futex: Remove needless goto's
futex: Remove put_futex_key()
rwsem: fix commas in initialisation
docs: locking: Replace HTTP links with HTTPS ones
...
- Add support for custom exception handlers, as required by BPF_PROBE_MEM.
- Add support for BPF_PROBE_MEM.
- Add trace events for idle enter / exit for the s390 specific idle
implementation.
- Remove unused zcore memmmap device.
- Remove unused "raw view" from s390 debug feature.
- AP bus + zcrypt device driver code refactoring.
- Provide cex4 cca sysfs attributes for cex3 for zcrypt device driver.
- Expose only minimal interface to walk physmem for mm/memblock. This
is a common code change and it has been agreed on with Mike Rapoport
and Andrew Morton that this can go upstream via the s390 tree.
- Rework of the s390 vmem/vmmemap code to allow for future memory hot
remove.
- Get rid of FORCE_MAX_ZONEORDER to finally allow for order-10
allocations again, instead of only order-8 allocations.
- Various small improvements and fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAl8n1eUACgkQIg7DeRsp
bsJJIhAAsY4IwWHOOh9GRY0yAU8FQvJiBI8H2IuukjnwjKmj8LQA/VkiIWOfWU99
2cnrnEi7+Op1od0ebjnkAU+oGws3qazpRxp6RaN3qTbnEYYSVMGvNfjTaWH3/Tsd
jxNgYZ4bV7foSWfYvyoBy4cORcSt1xFdA7by+XQYoacFJMNgjktDoeMFnj9TMCbj
LFHjAdqN78o98nwgREuzSPV806cQgNhzBc6kYaC2zw1W5Z3NrdmLXVyyqM7YCB/9
rKTQrEYi550BoyHHpxOY3K9PQQBEZZOH3M/2rA/W/gQaWCs2z3dwmBqjzwM36eZQ
To+sw4F9x/enuYpU5ylVrh0nuWaJ7wpe3DugHY+UghGZwm71On6ZTnEkWD450jD+
bVdDdYPturypTLdCiAFr7D0pMDqzgUP+jyTpIPH1uOFAkocfwrfFj6Als3mIjjks
pptWs+1m4lv1E+7flrSgkNdvPpUhwD6Zf5RZi03GUZShFZzA6Nq4+yVOX7O871M7
R9rLOQ0ch9/PiDdD4VXihL0Qva9eayo/Bek0npEBp0ZnyjIgHr64Xr77jqx74mMB
yoT+CSfICqvmF5CV4lPhPeQYEpvzYj8yi9zAxlFNyRpeM75B7L/JkNcqMN9fra4I
yKxo4Ng/6EEYx7ooCnX2I0BWJZc3b4ZBIJiRAF7OXzX91O9v8nU=
=H0KX
-----END PGP SIGNATURE-----
Merge tag 's390-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Add support for function error injection.
- Add support for custom exception handlers, as required by
BPF_PROBE_MEM.
- Add support for BPF_PROBE_MEM.
- Add trace events for idle enter / exit for the s390 specific idle
implementation.
- Remove unused zcore memmmap device.
- Remove unused "raw view" from s390 debug feature.
- AP bus + zcrypt device driver code refactoring.
- Provide cex4 cca sysfs attributes for cex3 for zcrypt device driver.
- Expose only minimal interface to walk physmem for mm/memblock. This
is a common code change and it has been agreed on with Mike Rapoport
and Andrew Morton that this can go upstream via the s390 tree.
- Rework of the s390 vmem/vmmemap code to allow for future memory hot
remove.
- Get rid of FORCE_MAX_ZONEORDER to finally allow for order-10
allocations again, instead of only order-8 allocations.
- Various small improvements and fixes.
* tag 's390-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
s390/vmemmap: coding style updates
s390/vmemmap: avoid memset(PAGE_UNUSED) when adding consecutive sections
s390/vmemmap: remember unused sub-pmd ranges
s390/vmemmap: fallback to PTEs if mapping large PMD fails
s390/vmem: cleanup empty page tables
s390/vmemmap: take the vmem_mutex when populating/freeing
s390/vmemmap: cleanup when vmemmap_populate() fails
s390/vmemmap: extend modify_pagetable() to handle vmemmap
s390/vmem: consolidate vmem_add_range() and vmem_remove_range()
s390/vmem: rename vmem_add_mem() to vmem_add_range()
s390: enable HAVE_FUNCTION_ERROR_INJECTION
s390/pci: clarify comment in s390_mmio_read/write
s390/time: improve comparison for tod steering
s390/time: select CLOCKSOURCE_VALIDATE_LAST_CYCLE
s390/time: use CLOCKSOURCE_MASK
s390/bpf: implement BPF_PROBE_MEM
s390/kernel: expand exception table logic to allow new handling options
s390/kernel: unify EX_TABLE* implementations
s390/mm: allow order 10 allocations
s390/mm: avoid trimming to MAX_ORDER
...
This patch moves ATOMIC_INIT from asm/atomic.h into linux/types.h.
This allows users of atomic_t to use ATOMIC_INIT without having to
include atomic.h as that way may lead to header loops.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lkml.kernel.org/r/20200729123105.GB7047@gondor.apana.org.au
As it stands if you include printk.h by itself it will fail to
compile because it requires definitions from ratelimit.h. However,
simply including ratelimit.h from printk.h does not work due to
inclusion loops involving sched.h and kernel.h.
This patch solves this by moving bits from ratelimit.h into a new
header file which can then be included by printk.h without any
worries about header loops.
The build bot then revealed some intriguing failures arising out
of this patch. On s390 there is an inclusion loop with asm/bug.h
and linux/kernel.h that triggers a compile failure, because kernel.h
will cause asm-generic/bug.h to be included before s390's own
asm/bug.h has finished processing. This has been fixed by not
including kernel.h in arch/s390/include/asm/bug.h.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Link: https://lore.kernel.org/r/20200721062248.GA18383@gondor.apana.org.au
This kernel feature is required for enabling BPF_KPROBE_OVERRIDE.
Define override_function_with_return() and regs_set_return_value()
functions, and fix compile errors in syscall_wrapper.h.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
This is a s390 port of commit 548acf1923 ("x86/mm: Expand the
exception table logic to allow new handling options"), which is needed
for implementing BPF_PROBE_MEM on s390.
The new handler field is made 64-bit in order to allow pointing from
dynamically allocated entries to handlers in kernel text. Unlike on x86,
NULL is used instead of ex_handler_default. This is because exception
tables are used by boot/text_dma.S, and it would be a pain to preserve
ex_handler_default.
The new infrastructure is ignored in early_pgm_check_handler, since
there is no pt_regs.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Replace three implementations with one using using __stringify_in_c
macro conveniently "borrowed" from powerpc and microblaze.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
In order to use <asm/percpu.h> in irqflags.h, we need to make sure
asm/percpu.h does not itself depend on irqflags.h
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200623083721.396143816@infradead.org
Move x86's 'struct kvm_mmu_memory_cache' to common code in anticipation
of moving the entire x86 implementation code to common KVM and reusing
it for arm64 and MIPS. Add a new architecture specific asm/kvm_types.h
to control the existence and parameters of the struct. The new header
is needed to avoid a chicken-and-egg problem with asm/kvm_host.h as all
architectures define instances of the struct in their vCPU structs.
Add an asm-generic version of kvm_types.h to avoid having empty files on
PPC and s390 in the long term, and for arm64 and mips in the short term.
Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-15-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl8DWosUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroO8cAf/UskNg8qoLGG17rQwhFpmigSllbiJ
TAyi3tpb1Y0Z2MfYeGkeiEb1L34bS28Cxl929DoqI3hrXy1wDCmsHPB5c3URXrzd
aswvr7pJtQV9iH1ykaS2woFJnOUovMFsFYMhj46yUPoAvdKOZKvuqcduxbogYHFw
YeRhS+1lGfiP2A0j3O/nnNJ0wq+FxKO46G3CgWeqG75+FSL6y/tl0bZJUMKKajQZ
GNaOv/CYCHAfUdvgy0ZitRD8lV8yxng3dYGjm+a52Kmn2ZWiFlxNrnxzHySk16Rn
Lq6MfFOqgrYpoZv7SnsFYnRE05U5bEFQ8BGr22fImQ+ktKDgq+9gv6cKwA==
=+DN/
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Bugfixes and a one-liner patch to silence a sparse warning"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: arm64: Stop clobbering x0 for HVC_SOFT_RESTART
KVM: arm64: PMU: Fix per-CPU access in preemptible context
KVM: VMX: Use KVM_POSSIBLE_CR*_GUEST_BITS to initialize guest/host masks
KVM: x86: Mark CR4.TSD as being possibly owned by the guest
KVM: x86: Inject #GP if guest attempts to toggle CR4.LA57 in 64-bit mode
kvm: use more precise cast and do not drop __user
KVM: x86: bit 8 of non-leaf PDPEs is not reserved
KVM: X86: Fix async pf caused null-ptr-deref
KVM: arm64: vgic-v4: Plug race between non-residency and v4.1 doorbell
KVM: arm64: pvtime: Ensure task delay accounting is enabled
KVM: arm64: Fix kvm_reset_vcpu() return code being incorrect with SVE
KVM: arm64: Annotate hyp NMI-related functions as __always_inline
KVM: s390: reduce number of IO pins to 1
Some beautifications related to the internal only used
struct ap_message and related code. Instead of one int carrying
only the special flag now a u32 flags field is used.
At struct CPRBX the pointers to additional data are now marked
with __user. This caused some changes needed on code, where
these structs are also used within the zcrypt misc functions.
The ica_rsa_* structs now use the generic types __u8, __u32, ...
instead of char, unsigned int.
zcrypt_msg6 and zcrypt_msg50 use min_t() instead of min().
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
I can't come up with a satisfying reason why we still need the memory
segment list. We used to represent in the list:
- boot memory
- standby memory added via add_memory()
- loaded dcss segments
When loading/unloading dcss segments, we already track them in a
separate list and check for overlaps
(arch/s390/mm/extmem.c:segment_overlaps_others()) when loading segments.
The overlap check was introduced for some segments in
commit b2300b9efe ("[S390] dcssblk: add >2G DCSSs support and stacked
contiguous DCSSs support.")
and was extended to cover all dcss segments in
commit ca57114609 ("s390/extmem: remove code for 31 bit addressing
mode").
Although I doubt that overlaps with boot memory and standby memory
are relevant, let's reshuffle the checks in load_segment() to request
the resource first. This will bail out in case we have overlaps with
other resources (esp. boot memory and standby memory). The order
is now different compared to segment_unload() and segment_unload(), but
that should not matter.
This smells like a leftover from ancient times, let's get rid of it. We
can now convert vmem_remove_mapping() into a void function - everybody
ignored the return value already.
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200625150029.45019-1-david@redhat.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [DCSS]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
There is no interface to userspace which exposes anything that would
require the struct __debug_entry definition. Therefore remove it from
uapi. This allows to change the definition, since it is only kernel
internally used.
The only exception is the crash utility, however that tool must handle
changes all the time anyway.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
There is not a single user of the debug raw view. Therefore remove it
before anybody uses it. If anybody would make use of the view it would
expose the struct __debug_entry definition to userspace and really
would make it uapi. This wouldn't be good, since the definition is
suboptimal and needs to be changed.
Right now the structure definition is only defined to be uapi, however
there is no user.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
DIAGNOSE 0x318 (diag318) sets information regarding the environment
the VM is running in (Linux, z/VM, etc) and is observed via
firmware/service events.
This is a privileged s390x instruction that must be intercepted by
SIE. Userspace handles the instruction as well as migration. Data
is communicated via VCPU register synchronization.
The Control Program Name Code (CPNC) is stored in the SIE block. The
CPNC along with the Control Program Version Code (CPVC) are stored
in the kvm_vcpu_arch struct.
This data is reset on load normal and clear resets.
Signed-off-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200622154636.5499-3-walling@linux.ibm.com
[borntraeger@de.ibm.com: fix sync_reg position]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The diag 318 struct introduced in include/asm/diag.h can be
reused in KVM, so let's condense the version code fields in the
diag318_info struct for easier usage and simplify it until we
can determine how the data should be formatted.
Signed-off-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20200622154636.5499-2-walling@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The current number of KVM_IRQCHIP_NUM_PINS results in an order 3
allocation (32kb) for each guest start/restart. This can result in OOM
killer activity even with free swap when the memory is fragmented
enough:
kernel: qemu-system-s39 invoked oom-killer: gfp_mask=0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), order=3, oom_score_adj=0
kernel: CPU: 1 PID: 357274 Comm: qemu-system-s39 Kdump: loaded Not tainted 5.4.0-29-generic #33-Ubuntu
kernel: Hardware name: IBM 8562 T02 Z06 (LPAR)
kernel: Call Trace:
kernel: ([<00000001f848fe2a>] show_stack+0x7a/0xc0)
kernel: [<00000001f8d3437a>] dump_stack+0x8a/0xc0
kernel: [<00000001f8687032>] dump_header+0x62/0x258
kernel: [<00000001f8686122>] oom_kill_process+0x172/0x180
kernel: [<00000001f8686abe>] out_of_memory+0xee/0x580
kernel: [<00000001f86e66b8>] __alloc_pages_slowpath+0xd18/0xe90
kernel: [<00000001f86e6ad4>] __alloc_pages_nodemask+0x2a4/0x320
kernel: [<00000001f86b1ab4>] kmalloc_order+0x34/0xb0
kernel: [<00000001f86b1b62>] kmalloc_order_trace+0x32/0xe0
kernel: [<00000001f84bb806>] kvm_set_irq_routing+0xa6/0x2e0
kernel: [<00000001f84c99a4>] kvm_arch_vm_ioctl+0x544/0x9e0
kernel: [<00000001f84b8936>] kvm_vm_ioctl+0x396/0x760
kernel: [<00000001f875df66>] do_vfs_ioctl+0x376/0x690
kernel: [<00000001f875e304>] ksys_ioctl+0x84/0xb0
kernel: [<00000001f875e39a>] __s390x_sys_ioctl+0x2a/0x40
kernel: [<00000001f8d55424>] system_call+0xd8/0x2c8
As far as I can tell s390x does not use the iopins as we bail our for
anything other than KVM_IRQ_ROUTING_S390_ADAPTER and the chip/pin is
only used for KVM_IRQ_ROUTING_IRQCHIP. So let us use a small number to
reduce the memory footprint.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200617083620.5409-1-borntraeger@de.ibm.com
If both the tracer and the tracee are compat processes, and gprs[2]
is assigned a value by __poke_user_compat, then the higher 32 bits
of gprs[2] are cleared, IS_ERR_VALUE() always returns false, and
syscall_get_error() always returns 0.
Fix the implementation by sign-extending the value for compat processes
the same way as x86 implementation does.
The bug was exposed to user space by commit 201766a20e ("ptrace: add
PTRACE_GET_SYSCALL_INFO request") and detected by strace test suite.
This change fixes strace syscall tampering on s390.
Link: https://lkml.kernel.org/r/20200602180051.GA2427@altlinux.org
Fixes: 753c4dd6a2 ("[S390] ptrace changes")
Cc: Elvira Khabirova <lineprinter@altlinux.org>
Cc: stable@vger.kernel.org # v2.6.28+
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
clock_getres in the vDSO library has to preserve the same behaviour
of posix_get_hrtimer_res().
In particular, posix_get_hrtimer_res() does:
sec = 0;
ns = hrtimer_resolution;
and hrtimer_resolution depends on the enablement of the high
resolution timers that can happen either at compile or at run time.
Fix the s390 vdso implementation of clock_getres keeping a copy of
hrtimer_resolution in vdso data and using that directly.
Link: https://lkml.kernel.org/r/20200324121027.21665-1-vincenzo.frascino@arm.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[heiko.carstens@de.ibm.com: use llgf for proper zero extension]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
- Loongson port
PPC:
- Fixes
ARM:
- Fixes
x86:
- KVM_SET_USER_MEMORY_REGION optimizations
- Fixes
- Selftest fixes
The guest side of the asynchronous page fault work has been delayed to 5.9
in order to sync with Thomas's interrupt entry rework.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7icj4UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPHGQgAj9+5j+f5v06iMP/+ponWwsVfh+5/
UR1gPbpMSFMKF0U+BCFxsBeGKWPDiz9QXaLfy6UGfOFYBI475Su5SoZ8/i/o6a2V
QjcKIJxBRNs66IG/774pIpONY8/mm/3b6vxmQktyBTqjb6XMGlOwoGZixj/RTp85
+uwSICxMlrijg+fhFMwC4Bo/8SFg+FeBVbwR07my88JaLj+3cV/NPolG900qLSa6
uPqJ289EQ86LrHIHXCEWRKYvwy77GFsmBYjKZH8yXpdzUlSGNexV8eIMAz50figu
wYRJGmHrRqwuzFwEGknv8SA3s2HVggXO4WVkWWCeJyO8nIVfYFUhME5l6Q==
=+Hh0
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
"The guest side of the asynchronous page fault work has been delayed to
5.9 in order to sync with Thomas's interrupt entry rework, but here's
the rest of the KVM updates for this merge window.
MIPS:
- Loongson port
PPC:
- Fixes
ARM:
- Fixes
x86:
- KVM_SET_USER_MEMORY_REGION optimizations
- Fixes
- Selftest fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (62 commits)
KVM: x86: do not pass poisoned hva to __kvm_set_memory_region
KVM: selftests: fix sync_with_host() in smm_test
KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected
KVM: async_pf: Cleanup kvm_setup_async_pf()
kvm: i8254: remove redundant assignment to pointer s
KVM: x86: respect singlestep when emulating instruction
KVM: selftests: Don't probe KVM_CAP_HYPERV_ENLIGHTENED_VMCS when nested VMX is unsupported
KVM: selftests: do not substitute SVM/VMX check with KVM_CAP_NESTED_STATE check
KVM: nVMX: Consult only the "basic" exit reason when routing nested exit
KVM: arm64: Move hyp_symbol_addr() to kvm_asm.h
KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts
KVM: arm64: Remove host_cpu_context member from vcpu structure
KVM: arm64: Stop sparse from moaning at __hyp_this_cpu_ptr
KVM: arm64: Handle PtrAuth traps early
KVM: x86: Unexport x86_fpu_cache and make it static
KVM: selftests: Ignore KVM 5-level paging support for VM_MODE_PXXV48_4K
KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
KVM: arm64: Stop save/restoring ACTLR_EL1
KVM: arm64: Add emulation for 32bit guests accessing ACTLR2
...
'Page not present' event may or may not get injected depending on
guest's state. If the event wasn't injected, there is no need to
inject the corresponding 'page ready' event as the guest may get
confused. E.g. Linux thinks that the corresponding 'page not present'
event wasn't delivered *yet* and allocates a 'dummy entry' for it.
This entry is never freed.
Note, 'wakeup all' events have no corresponding 'page not present'
event and always get injected.
s390 seems to always be able to inject 'page not present', the
change is effectively a nop.
Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610175532.779793-2-vkuznets@redhat.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=208081
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All architectures define pte_index() as
(address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)
and all architectures define pte_offset_kernel() as an entry in the array
of PTEs indexed by the pte_index().
For the most architectures the pte_offset_kernel() implementation relies
on the availability of pmd_page_vaddr() that converts a PMD entry value to
the virtual address of the page containing PTEs array.
Let's move x86 definitions of the PTE accessors to the generic place in
<linux/pgtable.h> and then simply drop the respective definitions from the
other architectures.
The architectures that didn't provide pmd_page_vaddr() are updated to have
that defined.
The generic implementation of pte_offset_kernel() can be overridden by an
architecture and alpha makes use of this because it has special ordering
requirements for its version of pte_offset_kernel().
[rppt@linux.ibm.com: v2]
Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org
[rppt@linux.ibm.com: update]
Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org
[rppt@linux.ibm.com: update]
Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org
[akpm@linux-foundation.org: fix x86 warning]
[sfr@canb.auug.org.au: fix powerpc build]
Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include
of the latter in the middle of asm includes. Fix this up with the aid of
the below script and manual adjustments here and there.
import sys
import re
if len(sys.argv) is not 3:
print "USAGE: %s <file> <header>" % (sys.argv[0])
sys.exit(1)
hdr_to_move="#include <linux/%s>" % sys.argv[2]
moved = False
in_hdrs = False
with open(sys.argv[1], "r") as f:
lines = f.readlines()
for _line in lines:
line = _line.rstrip('
')
if line == hdr_to_move:
continue
if line.startswith("#include <linux/"):
in_hdrs = True
elif not moved and in_hdrs:
moved = True
print hdr_to_move
print line
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.
Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: consolidate definitions of page table accessors", v2.
The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once. For
instance, we have 31 definition of pgd_offset() for 25 supported
architectures.
Most of these definitions are actually identical and typically it boils
down to, e.g.
static inline unsigned long pmd_index(unsigned long address)
{
return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}
These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
For architectures that really need a custom version there is always
possibility to override the generic version with the usual ifdefs magic.
These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.
This patch (of 12):
The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g. pte_alloc() and
pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
in the files that include <linux/mm.h>.
The include statements in such cases are remove with a simple loop:
for f in $(git grep -l "include <linux/mm.h>") ; do
sed -i -e '/include <asm\/pgtable.h>/ d' $f
done
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Add support for multi-function devices in pci code.
- Enable PF-VF linking for architectures using the
pdev->no_vf_scan flag (currently just s390).
- Add reipl from NVMe support.
- Get rid of critical section cleanup in entry.S.
- Refactor PNSO CHSC (perform network subchannel operation) in cio
and qeth.
- QDIO interrupts and error handling fixes and improvements, more
refactoring changes.
- Align ioremap() with generic code.
- Accept requests without the prefetch bit set in vfio-ccw.
- Enable path handling via two new regions in vfio-ccw.
- Other small fixes and improvements all over the code.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl7eVGcACgkQjYWKoQLX
FBhweQgAkicvx31x230rdfG+jQkQkl0UqF99vvWrJHEll77SqadfjzKAGIjUB+K0
EoeHVD5Wcj7BogDGcyHeQ0bZpu4WzE+y1nmnrsvu7TEEvcBmkJH0rF2jF+y0sb/O
3qvwFkX/CB5OqaMzKC/AEeRpcCKR+ZUXkWu1irbYth7CBXaycD9EAPc4cj8CfYGZ
r5njUdYOVk77TaO4aV+t5pCYc5TCRJaWXSsWaAv/nuLcIqsFBYOy2q+L47zITGXp
utZVanIDjzx+ikpaKicOIfC3hJsRuNX9MnlZKsQFwpVEZAUZmIUm29XdhGJTWSxU
RV7m1ORINbFP1nGAqWqkOvGo/LC0ZA==
=VhXR
-----END PGP SIGNATURE-----
Merge tag 's390-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add support for multi-function devices in pci code.
- Enable PF-VF linking for architectures using the pdev->no_vf_scan
flag (currently just s390).
- Add reipl from NVMe support.
- Get rid of critical section cleanup in entry.S.
- Refactor PNSO CHSC (perform network subchannel operation) in cio and
qeth.
- QDIO interrupts and error handling fixes and improvements, more
refactoring changes.
- Align ioremap() with generic code.
- Accept requests without the prefetch bit set in vfio-ccw.
- Enable path handling via two new regions in vfio-ccw.
- Other small fixes and improvements all over the code.
* tag 's390-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (52 commits)
vfio-ccw: make vfio_ccw_regops variables declarations static
vfio-ccw: Add trace for CRW event
vfio-ccw: Wire up the CRW irq and CRW region
vfio-ccw: Introduce a new CRW region
vfio-ccw: Refactor IRQ handlers
vfio-ccw: Introduce a new schib region
vfio-ccw: Refactor the unregister of the async regions
vfio-ccw: Register a chp_event callback for vfio-ccw
vfio-ccw: Introduce new helper functions to free/destroy regions
vfio-ccw: document possible errors
vfio-ccw: Enable transparent CCW IPL from DASD
s390/pci: Log new handle in clp_disable_fh()
s390/cio, s390/qeth: cleanup PNSO CHSC
s390/qdio: remove q->first_to_kick
s390/qdio: fix up qdio_start_irq() kerneldoc
s390: remove critical section cleanup from entry.S
s390: add machine check SIGP
s390/pci: ioremap() align with generic code
s390/ap: introduce new ap function ap_get_qdev()
Documentation/s390: Update / remove developerWorks web links
...
- Support for userspace to send requests directly to the on-chip GZIP
accelerator on Power9.
- Rework of our lockless page table walking (__find_linux_pte()) to make it
safe against parallel page table manipulations without relying on an IPI for
serialisation.
- A series of fixes & enhancements to make our machine check handling more
robust.
- Lots of plumbing to add support for "prefixed" (64-bit) instructions on
Power10.
- Support for using huge pages for the linear mapping on 8xx (32-bit).
- Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound driver.
- Removal of some obsolete 40x platforms and associated cruft.
- Initial support for booting on Power10.
- Lots of other small features, cleanups & fixes.
Thanks to:
Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Andrey Abramov,
Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent Abali, Cédric Le
Goater, Chen Zhou, Christian Zigotzky, Christophe JAILLET, Christophe Leroy,
Dmitry Torokhov, Emmanuel Nicolet, Erhard F., Gautham R. Shenoy, Geoff Levand,
George Spelvin, Greg Kurz, Gustavo A. R. Silva, Gustavo Walbon, Haren Myneni,
Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Kees Cook, Leonardo
Bras, Madhavan Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael
Neuling, Michal Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao,
Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram
Pai, Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler, Wolfram
Sang, Xiongfeng Wang.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl7aYZ8THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPiKD/9zNCuZLFMAFrIdbm0HlYA2RGYZFT75
GUHsqYyei1pxA7PgM3KwJiXELVODsBv0eQbgNh1tbecKrxPRegN/cywd1KLjPZ7I
v5/qweQP8MvR0RhzjbhvUcO0jq/f8u2LbJr5mUfVzjU6tAvrvcWo3oZqDElsekCS
kgyOH3r1vZ2PLTMiGFhb0gWi2iqc+6BHU1AFCGPCMjB1Vu5d5+54VvZ/6lllGsOF
yg9CBXmmVvQ+Bn6tH4zdEB78FYxnAIwBqlbmL79i5ca+HQJ0Sw6HuPRy9XYq35p6
2EiXS4Wrgp7i7+1TN3HO362u5Onb8TSyQU7NS6yCFPoJ6JQxcJMBIw6mHhnXOPuZ
CrjgcdwUMjx8uDoKmX1Epbfuex2w+AysW+4yBHPFiSgl3klKC3D0wi95mR485w2F
rN8uzJtrDeFKcYZJG7IoB/cgFCCPKGf9HaXr8q0S/jBKMffx91ul3cfzlfdIXOCw
FDNw/+ZX7UD6ddFEG12ZTO+vdL8yf1uCRT/DIZwUiDMIA0+M6F4nc7j3lfyZfoO1
65f9UlhoLxScq7VH2fKH4UtZatO9cPID2z1CmiY4UbUIPtFDepSuYClgLF+Duf4b
rkfxhKU0+Ja1zNH5XNc+L+Bc5/W4lFiJXz02dYIjtHoUpWkc1aToOETVwzggYFNM
G3PXIBOI0jRgRw==
=o0WU
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Support for userspace to send requests directly to the on-chip GZIP
accelerator on Power9.
- Rework of our lockless page table walking (__find_linux_pte()) to
make it safe against parallel page table manipulations without
relying on an IPI for serialisation.
- A series of fixes & enhancements to make our machine check handling
more robust.
- Lots of plumbing to add support for "prefixed" (64-bit) instructions
on Power10.
- Support for using huge pages for the linear mapping on 8xx (32-bit).
- Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound
driver.
- Removal of some obsolete 40x platforms and associated cruft.
- Initial support for booting on Power10.
- Lots of other small features, cleanups & fixes.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan,
Andrey Abramov, Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent
Abali, Cédric Le Goater, Chen Zhou, Christian Zigotzky, Christophe
JAILLET, Christophe Leroy, Dmitry Torokhov, Emmanuel Nicolet, Erhard F.,
Gautham R. Shenoy, Geoff Levand, George Spelvin, Greg Kurz, Gustavo A.
R. Silva, Gustavo Walbon, Haren Myneni, Hari Bathini, Joel Stanley,
Jordan Niethe, Kajol Jain, Kees Cook, Leonardo Bras, Madhavan
Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Michal
Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram Pai,
Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler,
Wolfram Sang, Xiongfeng Wang.
* tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (299 commits)
powerpc/pseries: Make vio and ibmebus initcalls pseries specific
cxl: Remove dead Kconfig options
powerpc: Add POWER10 architected mode
powerpc/dt_cpu_ftrs: Add MMA feature
powerpc/dt_cpu_ftrs: Enable Prefixed Instructions
powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected
powerpc: Add support for ISA v3.1
powerpc: Add new HWCAP bits
powerpc/64s: Don't set FSCR bits in INIT_THREAD
powerpc/64s: Save FSCR to init_task.thread.fscr after feature init
powerpc/64s: Don't let DT CPU features set FSCR_DSCR
powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()
powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG
powerpc/module_64: Use special stub for _mcount() with -mprofile-kernel
powerpc/module_64: Simplify check for -mprofile-kernel ftrace relocations
powerpc/module_64: Consolidate ftrace code
powerpc/32: Disable KASAN with pages bigger than 16k
powerpc/uaccess: Don't set KUEP by default on book3s/32
powerpc/uaccess: Don't set KUAP by default on book3s/32
powerpc/8xx: Reduce time spent in allow_user_access() and friends
...
Pull livepatching updates from Jiri Kosina:
- simplifications and improvements for issues Peter Ziljstra found
during his previous work on W^X cleanups.
This allows us to remove livepatch arch-specific .klp.arch sections
and add proper support for jump labels in patched code.
Also, this patchset removes the last module_disable_ro() usage in the
tree.
Patches from Josh Poimboeuf and Peter Zijlstra
- a few other minor cleanups
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
MAINTAINERS: add lib/livepatch to LIVE PATCHING
livepatch: add arch-specific headers to MAINTAINERS
livepatch: Make klp_apply_object_relocs static
MAINTAINERS: adjust to livepatch .klp.arch removal
module: Make module_enable_ro() static again
x86/module: Use text_mutex in apply_relocate_add()
module: Remove module_disable_ro()
livepatch: Remove module_disable_ro() usage
x86/module: Use text_poke() for late relocations
s390/module: Use s390_kernel_write() for late relocations
s390: Change s390_kernel_write() return type to match memcpy()
livepatch: Prevent module-specific KLP rela sections from referencing vmlinux symbols
livepatch: Remove .klp.arch
livepatch: Apply vmlinux-specific KLP relocations early
livepatch: Disallow vmlinux.ko
Merge more updates from Andrew Morton:
"More mm/ work, plenty more to come
Subsystems affected by this patch series: slub, memcg, gup, kasan,
pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
thp, mmap, kconfig"
* akpm: (131 commits)
arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
riscv: support DEBUG_WX
mm: add DEBUG_WX support
drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
powerpc/mm: drop platform defined pmd_mknotpresent()
mm: thp: don't need to drain lru cache when splitting and mlocking THP
hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
sparc32: register memory occupied by kernel as memblock.memory
include/linux/memblock.h: fix minor typo and unclear comment
mm, mempolicy: fix up gup usage in lookup_node
tools/vm/page_owner_sort.c: filter out unneeded line
mm: swap: memcg: fix memcg stats for huge pages
mm: swap: fix vmstats for huge pages
mm: vmscan: limit the range of LRU type balancing
mm: vmscan: reclaim writepage is IO cost
mm: vmscan: determine anon/file pressure balance at the reclaim root
mm: balance LRU lists based on relative thrashing
mm: only count actual rotations as LRU reclaim cost
...
There are multiple similar definitions for arch_clear_hugepage_flags() on
various platforms. Lets just add it's generic fallback definition for
platforms that do not override. This help reduce code duplication.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1588907271-11920-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are multiple similar definitions for is_hugepage_only_range() on
various platforms. Lets just add it's generic fallback definition for
platforms that do not override. This help reduce code duplication.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1588907271-11920-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking updates from David Miller:
1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
Augusto von Dentz.
2) Add GSO partial support to igc, from Sasha Neftin.
3) Several cleanups and improvements to r8169 from Heiner Kallweit.
4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
device self-test. From Andrew Lunn.
5) Start moving away from custom driver versions, use the globally
defined kernel version instead, from Leon Romanovsky.
6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin.
7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.
8) Add sriov and vf support to hinic, from Luo bin.
9) Support Media Redundancy Protocol (MRP) in the bridging code, from
Horatiu Vultur.
10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.
11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
Dubroca. Also add ipv6 support for espintcp.
12) Lots of ReST conversions of the networking documentation, from Mauro
Carvalho Chehab.
13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
from Doug Berger.
14) Allow to dump cgroup id and filter by it in inet_diag code, from
Dmitry Yakunin.
15) Add infrastructure to export netlink attribute policies to
userspace, from Johannes Berg.
16) Several optimizations to sch_fq scheduler, from Eric Dumazet.
17) Fallback to the default qdisc if qdisc init fails because otherwise
a packet scheduler init failure will make a device inoperative. From
Jesper Dangaard Brouer.
18) Several RISCV bpf jit optimizations, from Luke Nelson.
19) Correct the return type of the ->ndo_start_xmit() method in several
drivers, it's netdev_tx_t but many drivers were using
'int'. From Yunjian Wang.
20) Add an ethtool interface for PHY master/slave config, from Oleksij
Rempel.
21) Add BPF iterators, from Yonghang Song.
22) Add cable test infrastructure, including ethool interfaces, from
Andrew Lunn. Marvell PHY driver is the first to support this
facility.
23) Remove zero-length arrays all over, from Gustavo A. R. Silva.
24) Calculate and maintain an explicit frame size in XDP, from Jesper
Dangaard Brouer.
25) Add CAP_BPF, from Alexei Starovoitov.
26) Support terse dumps in the packet scheduler, from Vlad Buslov.
27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.
28) Add devm_register_netdev(), from Bartosz Golaszewski.
29) Minimize qdisc resets, from Cong Wang.
30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
eliminate set_fs/get_fs calls. From Christoph Hellwig.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
selftests: net: ip_defrag: ignore EPERM
net_failover: fixed rollback in net_failover_open()
Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
vmxnet3: allow rx flow hash ops only when rss is enabled
hinic: add set_channels ethtool_ops support
selftests/bpf: Add a default $(CXX) value
tools/bpf: Don't use $(COMPILE.c)
bpf, selftests: Use bpf_probe_read_kernel
s390/bpf: Use bcr 0,%0 as tail call nop filler
s390/bpf: Maintain 8-byte stack alignment
selftests/bpf: Fix verifier test
selftests/bpf: Fix sample_cnt shared between two threads
bpf, selftests: Adapt cls_redirect to call csum_level helper
bpf: Add csum_level helper for fixing up csum levels
bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
crypto/chtls: IPv6 support for inline TLS
Crypto/chcr: Fixes a coccinile check error
Crypto/chcr: Fixes compilations warnings
...
- Move the arch-specific code into arch/arm64/kvm
- Start the post-32bit cleanup
- Cherry-pick a few non-invasive pre-NV patches
x86:
- Rework of TLB flushing
- Rework of event injection, especially with respect to nested virtualization
- Nested AMD event injection facelift, building on the rework of generic code
and fixing a lot of corner cases
- Nested AMD live migration support
- Optimization for TSC deadline MSR writes and IPIs
- Various cleanups
- Asynchronous page fault cleanups (from tglx, common topic branch with tip tree)
- Interrupt-based delivery of asynchronous "page ready" events (host side)
- Hyper-V MSRs and hypercalls for guest debugging
- VMX preemption timer fixes
s390:
- Cleanups
Generic:
- switch vCPU thread wakeup from swait to rcuwait
The other architectures, and the guest side of the asynchronous page fault
work, will come next week.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7VJcYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPf6QgAq4wU5wdd1lTGz/i3DIhNVJNJgJlp
ozLzRdMaJbdbn5RpAK6PEBd9+pt3+UlojpFB3gpJh2Nazv2OzV4yLQgXXXyyMEx1
5Hg7b4UCJYDrbkCiegNRv7f/4FWDkQ9dx++RZITIbxeskBBCEI+I7GnmZhGWzuC4
7kj4ytuKAySF2OEJu0VQF6u0CvrNYfYbQIRKBXjtOwuRK4Q6L63FGMJpYo159MBQ
asg3B1jB5TcuGZ9zrjL5LkuzaP4qZZHIRs+4kZsH9I6MODHGUxKonrkablfKxyKy
CFK+iaHCuEXXty5K0VmWM3nrTfvpEjVjbMc7e1QGBQ5oXsDM0pqn84syRg==
=v7Wn
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Move the arch-specific code into arch/arm64/kvm
- Start the post-32bit cleanup
- Cherry-pick a few non-invasive pre-NV patches
x86:
- Rework of TLB flushing
- Rework of event injection, especially with respect to nested
virtualization
- Nested AMD event injection facelift, building on the rework of
generic code and fixing a lot of corner cases
- Nested AMD live migration support
- Optimization for TSC deadline MSR writes and IPIs
- Various cleanups
- Asynchronous page fault cleanups (from tglx, common topic branch
with tip tree)
- Interrupt-based delivery of asynchronous "page ready" events (host
side)
- Hyper-V MSRs and hypercalls for guest debugging
- VMX preemption timer fixes
s390:
- Cleanups
Generic:
- switch vCPU thread wakeup from swait to rcuwait
The other architectures, and the guest side of the asynchronous page
fault work, will come next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (256 commits)
KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test
KVM: check userspace_addr for all memslots
KVM: selftests: update hyperv_cpuid with SynDBG tests
x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls
x86/kvm/hyper-v: enable hypercalls regardless of hypercall page
x86/kvm/hyper-v: Add support for synthetic debugger interface
x86/hyper-v: Add synthetic debugger definitions
KVM: selftests: VMX preemption timer migration test
KVM: nVMX: Fix VMX preemption timer migration
x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit
KVM: x86/pmu: Support full width counting
KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in
KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
KVM: x86: acknowledgment mechanism for async pf page ready notifications
KVM: x86: interrupt based APF 'page ready' event delivery
KVM: introduce kvm_read_guest_offset_cached()
KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present()
KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
KVM: VMX: Replace zero-length array with flexible-array
...
Pull uaccess/csum updates from Al Viro:
"Regularize the sitation with uaccess checksum primitives:
- fold csum_partial_... into csum_and_copy_..._user()
- on x86 collapse several access_ok()/stac()/clac() into
user_access_begin()/user_access_end()"
* 'uaccess.csum' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
default csum_and_copy_to_user(): don't bother with access_ok()
take the dummy csum_and_copy_from_user() into net/checksum.h
arm: switch to csum_and_copy_from_user()
sh32: convert to csum_and_copy_from_user()
m68k: convert to csum_and_copy_from_user()
xtensa: switch to providing csum_and_copy_from_user()
sparc: switch to providing csum_and_copy_from_user()
parisc: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
alpha: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
ia64: turn csum_partial_copy_from_user() into csum_and_copy_from_user()
ia64: csum_partial_copy_nocheck(): don't abuse csum_partial_copy_from_user()
x86: switch 32bit csum_and_copy_to_user() to user_access_{begin,end}()
x86: switch both 32bit and 64bit to providing csum_and_copy_from_user()
x86_64: csum_..._copy_..._user(): switch to unsafe_..._user()
get rid of csum_partial_copy_to_user()
If two page ready notifications happen back to back the second one is not
delivered and the only mechanism we currently have is
kvm_check_async_pf_completion() check in vcpu_run() loop. The check will
only be performed with the next vmexit when it happens and in some cases
it may take a while. With interrupt based page ready notification delivery
the situation is even worse: unlike exceptions, interrupts are not handled
immediately so we must check if the slot is empty. This is slow and
unnecessary. Introduce dedicated MSR_KVM_ASYNC_PF_ACK MSR to communicate
the fact that the slot is free and host should check its notification
queue. Mandate using it for interrupt based 'page ready' APF event
delivery.
As kvm_check_async_pf_completion() is going away from vcpu_run() we need
a way to communicate the fact that vcpu->async_pf.done queue has
transitioned from empty to non-empty state. Introduce
kvm_arch_async_page_present_queued() and KVM_REQ_APF_READY to do the job.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
An innocent reader of the following x86 KVM code:
bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
{
if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
return true;
...
may get very confused: if APF mechanism is not enabled, why do we report
that we 'can inject async page present'? In reality, upon injection
kvm_arch_async_page_present() will check the same condition again and,
in case APF is disabled, will just drop the item. This is fine as the
guest which deliberately disabled APF doesn't expect to get any APF
notifications.
Rename kvm_arch_can_inject_async_page_present() to
kvm_arch_can_dequeue_async_page_present() to make it clear what we are
checking: if the item can be dequeued (meaning either injected or just
dropped).
On s390 kvm_arch_can_inject_async_page_present() always returns 'true' so
the rename doesn't matter much.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
now that can be done conveniently - all non-trivial cases have
_HAVE_ARCH_COPY_AND_CSUM_FROM_USER defined, so the fallback in
net/checksum.h is used only for dummy (copy_from_user, then
csum_partial) implementation. Allowing us to get rid of all
dummy instances, both of csum_and_copy_from_user() and
csum_partial_copy_from_user().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
CHSC3D (PNSO - perform network subchannel operation) is used for
OC0 (Store-network-bridging-information) as well as for
OC3 (Store-network-address-information). So common fields are renamed
from *brinfo* to *pnso*.
Also *_bridge_host_* is changed into *_addr_change_*, e.g.
qeth_bridge_host_event to qeth_addr_change_event, for the
same reasons.
The keywords in the card traces are changed accordingly.
Remove unused L3 types, as PNSO will only return Layer2 entries.
Make PNSO CHSC implementation more consistent with existing API usage:
Add new function ccw_device_pnso() to drivers/s390/cio/device_ops.c and
the function declaration to arch/s390/include/asm/ccwdev.h, which takes
a struct ccw_device * as parameter instead of schid and calls
chsc_pnso().
PNSO CHSC has no strict relationship to qdio. So move the calling
function from qdio to qeth_l2 and move the necessary structures to a
new file arch/s390/include/asm/chsc.h.
Do response code evaluation only in chsc_error_from_response() and
use return code in all other places. qeth_anset_makerc() was meant to
evaluate the PNSO response code, but never did, because pnso_rc was
already non-zero.
Indentation was corrected in some places.
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The current code is rather complex and caused a lot of subtle
and hard to debug bugs in the past. Simplify the code by calling
the system_call handler with interrupts disabled, save
machine state, and re-enable them later.
This requires significant changes to the machine check handling code
as well. When the machine check interrupt arrived while being in kernel
mode the new code will signal pending machine checks with a SIGP external
call. When userspace was interrupted, the handler will switch to the
kernel stack and directly execute s390_handle_mcck().
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This will be used with the upcoming entry.S changes to signal
that there's a machine check pending that cannot be handled in
the Machine check handler itself.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The MSCC bug fix in 'net' had to be slightly adjusted because the
register accesses are done slightly differently in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Let's use the same signature and parameter names as in the generic
ioremap() definition making the physical address' type explicit.
Add a check against address wrap around as in the generic
lib/ioremap.c:ioremap_prot() code.
Finally use free_vm_area() instead of vunmap() as in the generic code.
Besides being clearer free_vm_area() can also skip a few additional
checks compared with vunmap().
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
On s390 PCI Virtual Functions (VFs) are scanned by firmware and are made
available to Linux via the hot-plug interface. As such the common code
path of doing the scan directly using the parent Physical Function (PF)
is not used and fenced off with the no_vf_scan attribute.
Even if the partition created the VFs itself e.g. using the sriov_numvfs
attribute of a PF, the PF/VF links thus need to be established after the
fact. To do this when a VF is plugged we scan through all functions on
the same zbus and test whether they are the parent PF in which case we
establish the necessary links.
With these links established there is now no more need to fence off
pci_iov_remove_virtfn() for pdev->no_vf_scan as the common code now
works fine.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Link: https://lore.kernel.org/r/20200506154139.90609-3-schnelle@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
commit 5e1fb45ec8 ("s390/ccwgroup: remove pm support") removed power
management support from the ccwgroup bus driver. So remove the
associated callbacks from all ccwgroup drivers.
CC: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two new stats for exposing halt-polling cpu usage:
halt_poll_success_ns
halt_poll_fail_ns
Thus sum of these 2 stats is the total cpu time spent polling. "success"
means the VCPU polled until a virtual interrupt was delivered. "fail"
means the VCPU had to schedule out (either because the maximum poll time
was reached or it needed to yield the CPU).
To avoid touching every arch's kvm_vcpu_stat struct, only update and
export halt-polling cpu usage stats if we're on x86.
Exporting cpu usage as a u64 and in nanoseconds means we will overflow at
~500 years, which seems reasonably large.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Jon Cargille <jcargill@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200508182240.68440-1-jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The s390_mmio_read/write syscalls are currently broken when running with
MIO.
The new pcistb_mio/pcstg_mio/pcilg_mio instructions are executed
similiarly to normal load/store instructions and do address translation
in the current address space. That means inside the kernel they are
aware of mappings into kernel address space while outside the kernel
they use user space mappings (usually created through mmap'ing a PCI
device file).
Now when existing user space applications use the s390_pci_mmio_write
and s390_pci_mmio_read syscalls, they pass I/O addresses that are mapped
into user space so as to be usable with the new instructions without
needing a syscall. Accessing these addresses with the old instructions
as done currently leads to a kernel panic.
Also, for such a user space mapping there may not exist an equivalent
kernel space mapping which means we can't just use the new instructions
in kernel space.
Instead of replicating user mappings in the kernel which then might
collide with other mappings, we can conceptually execute the new
instructions as if executed by the user space application using the
secondary address space. This even allows us to directly store to the
user pointer without the need for copy_to/from_user().
Cc: stable@vger.kernel.org
Fixes: 71ba41c9b1 ("s390/pci: provide support for MIO instructions")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
s390_kernel_write()'s function type is almost identical to memcpy().
Change its return type to "void *" so they can be used interchangeably.
Cc: linux-s390@vger.kernel.org
Cc: heiko.carstens@de.ibm.com
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> # s390
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Recognize IPL Block's Ipl Type of "nvme". Populate related structs and sysfs
entries.
Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
We allow multiple functions on a single bus.
We suppress the ZPCI_DEVFN definition and replace its
occurences with zpci->devfn.
We verify the number of device during the registration.
There can never be more domains in use than existing
devices, so we do not need to verify the count of domain
after having verified the count of devices.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The current PCI implementation do not provide a bus resource.
This leads to a notice being print at boot.
Let's do it more nicely and provide the bus resource.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The zPCI bus is in charge to handle common zPCI resources for
zPCI devices.
Creating the zPCI bus, the PCI bus, the zPCI devices and the
PCI devices and hotplug slots
done in a specific order:
- PCI hotplug slot creation needs a PCI bus
- PCI bus needs a PCI domain
which is reported by the pci_domain_nr() when setting up the
host bridge
- PCI domain is set from the zPCI with devfn 0
this is necessary to have a reproducible enumeration
Therefore we can not create devices or hotplug slots for any PCI
device associated with a zPCI device before having discovered
the function zero of the bus.
The discovery and initialization of devices can be done at several
points in the code:
- On Events, serialized in a thread context
- On initialization, in the kernel init thread context
- When powering on the hotplug slot, in a user thread context
The removal of devices and their parent bus may also be done on
events or for devices when powering down the slot.
To guarantee the existence of the bus and devices until they are
no more needed we use kref in zPCI bus and introduce a reference
count in the zPCI devices.
In this patch the zPCI bus still only accept a device with
a devfn 0.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Firmware provides the bus/devfn part of the PCI addresses of a zPCI
function inside the new field RID of the CLP query PCI function
with a bit to know if this field is available to use.
Let's add these fields to the clp_rsp_query_pci structure,
add corresponding fields to zdev and initialize them.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Using PCI multifunctions in S390 is a new feature we may want
to ignore to continue provide the same topology as in the past
to userland even if the configuration supports exposing the
topology of a multi-Function device.
A new boolean parameters allows to overwrite the kernel
pci configuration:
- pci=norid when on, disallow the use a new firmware field,
RID, which provides the PCI <bus>:<device>.<function> part
of the PCI address.
To be used in the following patches and satisfy the checkpatch.pl
the variable is exposed in pci.h
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
In the future the bus sysdata may not directly point to the
zpci_dev.
In preparation of upcoming patches let us abstract the
access to the zpci_dev from the device inside the pci device.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Add SysFS attribute that provides the port number for PCI functions
representing a single port of a multi-port device.
Signed-off-by: Alexander Schmidt <alexs@linux.ibm.com>
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Merge yet more updates from Andrew Morton:
- Almost all of the rest of MM (memcg, slab-generic, slab, pagealloc,
gup, hugetlb, pagemap, memremap)
- Various other things (hfs, ocfs2, kmod, misc, seqfile)
* akpm: (34 commits)
ipc/util.c: sysvipc_find_ipc() should increase position index
kernel/gcov/fs.c: gcov_seq_next() should increase position index
fs/seq_file.c: seq_read(): add info message about buggy .next functions
drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings
change email address for Pali Rohár
selftests: kmod: test disabling module autoloading
selftests: kmod: fix handling test numbers above 9
docs: admin-guide: document the kernel.modprobe sysctl
fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
kmod: make request_module() return an error when autoloading is disabled
mm/memremap: set caching mode for PCI P2PDMA memory to WC
mm/memory_hotplug: add pgprot_t to mhp_params
powerpc/mm: thread pgprot_t through create_section_mapping()
x86/mm: introduce __set_memory_prot()
x86/mm: thread pgprot_t through init_memory_mapping()
mm/memory_hotplug: rename mhp_restrictions to mhp_params
mm/memory_hotplug: drop the flags field from struct mhp_restrictions
mm/special: create generic fallbacks for pte_special() and pte_mkspecial()
mm/vma: introduce VM_ACCESS_FLAGS
mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
...
There are many platforms with exact same value for VM_DATA_DEFAULT_FLAGS
This creates a default value for VM_DATA_DEFAULT_FLAGS in line with the
existing VM_STACK_DEFAULT_FLAGS. While here, also define some more
macros with standard VMA access flag combinations that are used
frequently across many platforms. Apart from simplification, this
reduces code duplication as well.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Chris Zankel <chris@zankel.net>
Link: http://lkml.kernel.org/r/1583391014-8170-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Upper-layer drivers allocate their SBALs by calling qdio_alloc_buffers()
for each individual queue. But when later passing the SBAL addresses to
qdio_establish(), they need to be in a single array of pointers.
So if the driver uses multiple Input or Output queues, it needs to
allocate a temporary array just to present all its SBAL pointers in this
layout.
This patch slightly changes the format of the QDIO initialization data,
so that drivers can pass a per-queue array where each element points to
a queue's SBAL array.
zfcp doesn't use multiple queues, so the impact there is trivial.
For qeth this brings a nice reduction in complexity, and removes
a page-sized allocation.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
All that qdio_allocate() actually uses from the init_data is the cdev,
and the number of Input and Output Queues. Have the driver pass those as
parameters, and defer the init_data processing into qdio_establish().
This includes writing per-device(!) trace entries, and most of the
sanity checks.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
- Update maintainers. Niklas Schnelle takes over zpci and Vineeth Vijayan
common io code.
- Extend cpuinfo to include topology information.
- Add new extended counters for IBM z15 and sampling buffer allocation
rework in perf code.
- Add control over zeroing out memory during system restart.
- CCA protected key block version 2 support and other fixes/improvements
in crypto code.
- Convert to new fallthrough; annotations.
- Replace zero-length arrays with flexible-arrays.
- QDIO debugfs and other small improvements.
- Drop 2-level paging support optimization for compat tasks. Varios
mm cleanups.
- Remove broken and unused hibernate / power management support.
- Remove fake numa support which does not bring any benefits.
- Exclude offline CPUs from CPU topology masks to be more consistent
with other architectures.
- Prevent last branching instruction address leaking to userspace.
- Other small various fixes and improvements all over the code.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl6Ig2YACgkQjYWKoQLX
FBj2gggAibnHOl9d0ngX1mVT4nz51R3V8z5sEQjNMr2uHBmaTqs7pi/00gaFMxoC
NngVEXvL443jSogQivthGgXPpRCV9xdKE3sp38j7fF4LgHoeuDtGd1oaX4W9Rqk0
7Yii35EaO2e2WHdOKaAbu+ZvDRunFjERyntc51MYaIUivFosogSo07vC73vFIArF
VGStS09fJ4Ny76ott896T7Ulx1Iek/MkF1vponEMLGNUIcLIQbbxZxOwgz0pHuEF
SlyyJBnhOIaAJGOYlKREQDt1cew+hsxluPU+a01bwdsmdZv9LH1BGwLayDqTH58i
QWvtEpzJFmDvo9jGM1v81ebaGnyCKg==
=hiGF
-----END PGP SIGNATURE-----
Merge tag 's390-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Update maintainers. Niklas Schnelle takes over zpci and Vineeth
Vijayan common io code.
- Extend cpuinfo to include topology information.
- Add new extended counters for IBM z15 and sampling buffer allocation
rework in perf code.
- Add control over zeroing out memory during system restart.
- CCA protected key block version 2 support and other
fixes/improvements in crypto code.
- Convert to new fallthrough; annotations.
- Replace zero-length arrays with flexible-arrays.
- QDIO debugfs and other small improvements.
- Drop 2-level paging support optimization for compat tasks. Varios mm
cleanups.
- Remove broken and unused hibernate / power management support.
- Remove fake numa support which does not bring any benefits.
- Exclude offline CPUs from CPU topology masks to be more consistent
with other architectures.
- Prevent last branching instruction address leaking to userspace.
- Other small various fixes and improvements all over the code.
* tag 's390-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (57 commits)
s390/mm: cleanup init_new_context() callback
s390/mm: cleanup virtual memory constants usage
s390/mm: remove page table downgrade support
s390/qdio: set qdio_irq->cdev at allocation time
s390/qdio: remove unused function declarations
s390/ccwgroup: remove pm support
s390/ap: remove power management code from ap bus and drivers
s390/zcrypt: use kvmalloc instead of kmalloc for 256k alloc
s390/mm: cleanup arch_get_unmapped_area() and friends
s390/ism: remove pm support
s390/cio: use fallthrough;
s390/vfio: use fallthrough;
s390/zcrypt: use fallthrough;
s390: use fallthrough;
s390/cpum_sf: Fix wrong page count in error message
s390/diag: fix display of diagnose call statistics
s390/ap: Remove ap device suspend and resume callbacks
s390/pci: Improve handling of unset UID
s390/pci: Fix zpci_alloc_domain() over allocation
s390/qdio: pass ISC as parameter to chsc_sadc()
...
* GICv4.1 support
* 32bit host removal
PPC:
* secure (encrypted) using under the Protected Execution Framework
ultravisor
s390:
* allow disabling GISA (hardware interrupt injection) and protected
VMs/ultravisor support.
x86:
* New dirty bitmap flag that sets all bits in the bitmap when dirty
page logging is enabled; this is faster because it doesn't require bulk
modification of the page tables.
* Initial work on making nested SVM event injection more similar to VMX,
and less buggy.
* Various cleanups to MMU code (though the big ones and related
optimizations were delayed to 5.8). Instead of using cr3 in function
names which occasionally means eptp, KVM too has standardized on "pgd".
* A large refactoring of CPUID features, which now use an array that
parallels the core x86_features.
* Some removal of pointer chasing from kvm_x86_ops, which will also be
switched to static calls as soon as they are available.
* New Tigerlake CPUID features.
* More bugfixes, optimizations and cleanups.
Generic:
* selftests: cleanups, new MMU notifier stress test, steal-time test
* CSV output for kvm_stat.
KVM/MIPS has been broken since 5.5, it does not compile due to a patch committed
by MIPS maintainers. I had already prepared a fix, but the MIPS maintainers
prefer to fix it in generic code rather than KVM so they are taking care of it.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6GOnIUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMfxwf/ZKLZiRoaovXCOG71M/eHtQb8ZIqU
3MPy+On3eC5Sk/aBxWUL9EFZsbYG6kYdbZ1VOvG9XPBoLlnkDSm/IR0kaELHtnjj
oGVda/tvGn46Ne39y8xBptmb91WDcWH0vFthT/CwlMxAw3xjr+gG7Qyo+8F2CW6m
SSSuLiHSBnyO1cQKruBTHZ8qnR8LlnfXEqtd6Y4LFLic0LbLIoIdRcT3wjQrcZrm
Djd7wbTEYZjUfoqZ72ekwEDUsONcDLDSKcguDO9pSMSCGhpxCVT5Vy68KRpoIMs2
nzNWDKjvqQo5zb2+GWxJgkd12Hv+n7PCXZMbVrWBu1pQsewUns9m4mkpGw==
=6fGt
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- GICv4.1 support
- 32bit host removal
PPC:
- secure (encrypted) using under the Protected Execution Framework
ultravisor
s390:
- allow disabling GISA (hardware interrupt injection) and protected
VMs/ultravisor support.
x86:
- New dirty bitmap flag that sets all bits in the bitmap when dirty
page logging is enabled; this is faster because it doesn't require
bulk modification of the page tables.
- Initial work on making nested SVM event injection more similar to
VMX, and less buggy.
- Various cleanups to MMU code (though the big ones and related
optimizations were delayed to 5.8). Instead of using cr3 in
function names which occasionally means eptp, KVM too has
standardized on "pgd".
- A large refactoring of CPUID features, which now use an array that
parallels the core x86_features.
- Some removal of pointer chasing from kvm_x86_ops, which will also
be switched to static calls as soon as they are available.
- New Tigerlake CPUID features.
- More bugfixes, optimizations and cleanups.
Generic:
- selftests: cleanups, new MMU notifier stress test, steal-time test
- CSV output for kvm_stat"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
x86/kvm: fix a missing-prototypes "vmread_error"
KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
KVM: VMX: Add a trampoline to fix VMREAD error handling
KVM: SVM: Annotate svm_x86_ops as __initdata
KVM: VMX: Annotate vmx_x86_ops as __initdata
KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
KVM: VMX: Configure runtime hooks using vmx_x86_ops
KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
KVM: x86: Move init-only kvm_x86_ops to separate struct
KVM: Pass kvm_init()'s opaque param to additional arch funcs
s390/gmap: return proper error code on ksm unsharing
KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
KVM: Fix out of range accesses to memslots
KVM: X86: Micro-optimize IPI fastpath delay
KVM: X86: Delay read msr data iff writes ICR MSR
KVM: PPC: Book3S HV: Add a capability for enabling secure guests
KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
...
Change a header to mandatory-y if both of the following are met:
[1] At least one architecture (except um) specifies it as generic-y in
arch/*/include/asm/Kbuild
[2] Every architecture (except um) either has its own implementation
(arch/*/include/asm/*.h) or specifies it as generic-y in
arch/*/include/asm/Kbuild
This commit was generated by the following shell script.
----------------------------------->8-----------------------------------
arches=$(cd arch; ls -1 | sed -e '/Kconfig/d' -e '/um/d')
tmpfile=$(mktemp)
grep "^mandatory-y +=" include/asm-generic/Kbuild > $tmpfile
find arch -path 'arch/*/include/asm/Kbuild' |
xargs sed -n 's/^generic-y += \(.*\)/\1/p' | sort -u |
while read header
do
mandatory=yes
for arch in $arches
do
if ! grep -q "generic-y += $header" arch/$arch/include/asm/Kbuild &&
! [ -f arch/$arch/include/asm/$header ]; then
mandatory=no
break
fi
done
if [ "$mandatory" = yes ]; then
echo "mandatory-y += $header" >> $tmpfile
for arch in $arches
do
sed -i "/generic-y += $header/d" arch/$arch/include/asm/Kbuild
done
fi
done
sed -i '/^mandatory-y +=/d' include/asm-generic/Kbuild
LANG=C sort $tmpfile >> include/asm-generic/Kbuild
----------------------------------->8-----------------------------------
One obvious benefit is the diff stat:
25 files changed, 52 insertions(+), 557 deletions(-)
It is tedious to list generic-y for each arch that needs it.
So, mandatory-y works like a fallback default (by just wrapping
asm-generic one) when arch does not have a specific header
implementation.
See the following commits:
def3f7cefea1b39bae16
It is tedious to convert headers one by one, so I processed by a shell
script.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200210175452.5030-1-masahiroy@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking updates from David Miller:
"Highlights:
1) Fix the iwlwifi regression, from Johannes Berg.
2) Support BSS coloring and 802.11 encapsulation offloading in
hardware, from John Crispin.
3) Fix some potential Spectre issues in qtnfmac, from Sergey
Matyukevich.
4) Add TTL decrement action to openvswitch, from Matteo Croce.
5) Allow paralleization through flow_action setup by not taking the
RTNL mutex, from Vlad Buslov.
6) A lot of zero-length array to flexible-array conversions, from
Gustavo A. R. Silva.
7) Align XDP statistics names across several drivers for consistency,
from Lorenzo Bianconi.
8) Add various pieces of infrastructure for offloading conntrack, and
make use of it in mlx5 driver, from Paul Blakey.
9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.
10) Lots of parallelization improvements during configuration changes
in mlxsw driver, from Ido Schimmel.
11) Add support to devlink for generic packet traps, which report
packets dropped during ACL processing. And use them in mlxsw
driver. From Jiri Pirko.
12) Support bcmgenet on ACPI, from Jeremy Linton.
13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
Starovoitov, and your's truly.
14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.
15) Fix sysfs permissions when network devices change namespaces, from
Christian Brauner.
16) Add a flags element to ethtool_ops so that drivers can more simply
indicate which coalescing parameters they actually support, and
therefore the generic layer can validate the user's ethtool
request. Use this in all drivers, from Jakub Kicinski.
17) Offload FIFO qdisc in mlxsw, from Petr Machata.
18) Support UDP sockets in sockmap, from Lorenz Bauer.
19) Fix stretch ACK bugs in several TCP congestion control modules,
from Pengcheng Yang.
20) Support virtual functiosn in octeontx2 driver, from Tomasz
Duszynski.
21) Add region operations for devlink and use it in ice driver to dump
NVM contents, from Jacob Keller.
22) Add support for hw offload of MACSEC, from Antoine Tenart.
23) Add support for BPF programs that can be attached to LSM hooks,
from KP Singh.
24) Support for multiple paths, path managers, and counters in MPTCP.
From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
and others.
25) More progress on adding the netlink interface to ethtool, from
Michal Kubecek"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
cxgb4/chcr: nic-tls stats in ethtool
net: dsa: fix oops while probing Marvell DSA switches
net/bpfilter: remove superfluous testing message
net: macb: Fix handling of fixed-link node
net: dsa: ksz: Select KSZ protocol tag
netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
net: stmmac: create dwmac-intel.c to contain all Intel platform
net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
net: dsa: bcm_sf2: Add support for matching VLAN TCI
net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
net: dsa: bcm_sf2: Disable learning for ASP port
net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
net: dsa: b53: Restore VLAN entries upon (re)configuration
net: dsa: bcm_sf2: Fix overflow checks
hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
...
Pull locking updates from Ingo Molnar:
"The main changes in this cycle were:
- Continued user-access cleanups in the futex code.
- percpu-rwsem rewrite that uses its own waitqueue and atomic_t
instead of an embedded rwsem. This addresses a couple of
weaknesses, but the primary motivation was complications on the -rt
kernel.
- Introduce raw lock nesting detection on lockdep
(CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal
lock differences. This too originates from -rt.
- Reuse lockdep zapped chain_hlocks entries, to conserve RAM
footprint on distro-ish kernels running into the "BUG:
MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep
chain-entries pool.
- Misc cleanups, smaller fixes and enhancements - see the changelog
for details"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
Documentation/locking/locktypes: Minor copy editor fixes
Documentation/locking/locktypes: Further clarifications and wordsmithing
m68knommu: Remove mm.h include from uaccess_no.h
x86: get rid of user_atomic_cmpxchg_inatomic()
generic arch_futex_atomic_op_inuser() doesn't need access_ok()
x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
objtool: whitelist __sanitizer_cov_trace_switch()
[parisc, s390, sparc64] no need for access_ok() in futex handling
sh: no need of access_ok() in arch_futex_atomic_op_inuser()
futex: arch_futex_atomic_op_inuser() calling conventions change
completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
lockdep: Add posixtimer context tracing bits
lockdep: Annotate irq_work
lockdep: Add hrtimer context tracing bits
lockdep: Introduce wait-type checks
completion: Use simple wait queues
sched/swait: Prepare usage in completions
...
The set of values asce_limit may be assigned with is TASK_SIZE_MAX,
_REGION1_SIZE, _REGION2_SIZE and 0 as a special case if the callback
was called from execve().
Do VM_BUG_ON() if asce_limit is something else.
Save few CPU cycles by removing unnecessary asce_limit re-assignment
in case of 3-level task and redundant PGD entry type reconstruction.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This update consolidates page table handling code. Because
there are hardly any 31-bit binaries left we do not need to
optimize for that.
No extra efforts are needed to ensure that a compat task does
not map anything above 2GB. The TASK_SIZE limit for 31-bit
tasks is 2GB already and the generic code does check that a
resulting map address would not surpass that limit.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Move access_ok() in and pagefault_enable()/pagefault_disable() out.
Mechanical conversion only - some instances don't really need
a separate access_ok() at all (e.g. the ones only using
get_user()/put_user(), or architectures where access_ok()
is always true); we'll deal with that in followups.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Factor out check_asce_limit() function and fix few style
defects in arch_get_unmapped_area() family of functions.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
[heiko.carstens@de.ibm.com: small coding style changes]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
- mark sie control block as 512 byte aligned
- use fallthrough;
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJeegAeAAoJEBF7vIC1phx8PgMP/jcN57GRXU0yML+r8izA1JBZ
0hnV9Mesec5sUDyDLTLuAJBS7tDvfLKi8A6QLdyTuPkUihPpWp0VD/QW4tJe1WDD
MOPA56xH3yN7ll/q+IX7QurwY4jDCReV8DIsHjrOvDYKSBHxZSsgrR/G5ivtqzRJ
VWl1qpB3/FgPtsdTW4x1nciaAYulWp7K75F2zWFpoRe0EZUXY8IyLJMZjScMVRnk
DbdA8+jsFqeYCJC1JSnE6TShP2RvDNu5NjE2pInyWbXAA1PwcFM+bML+ZZ4kJlii
9+cDqctXNC4MdxOJVMKEOEdoQ1M3yYzKn/J7an7Zly7kY54G8uJkQGCaBmlg1K0k
r57WP0DpFu5kuNFFJg52bBcAOUPEgFiyIitICG+nMn9BjLA3zY7bjUSCGHviIzLm
pNPL+t7KVWyyn5t8X25CuFUkCQskDSALpC3SFdL4iPo+fpBBw/GPqngjmx4zdrHg
jkOImPWt2n2KK9I4kfxw/NIx8Q06wHHzIpY2uutHA++TNF8WMgXR1RvJgZvOAO0G
JLGaOUZNV5BcTsK38ftEMh9awqBG9J4l8DPwPoRG/9r394W25O1KdjiNnTVH9qYU
INUArhtfeeAYoSteDCflD7cgORbGQLOXBePtCO8+Ayt2lTJLW98CDfCrCAKawQLl
O+RMxuDshobeH/AGT7yr
=nsOM
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: cleanups for 5.7
- mark sie control block as 512 byte aligned
- use fallthrough;
When the support for polling drivers was initially added, it only
considered Input Queue 0. But as QDIO interrupts are actually for the
full device and not a single queue, this doesn't really fit for
configurations where multiple Input Queues are used.
Rework the qdio code so that interrupts for a polling driver are not
split up into actions for each queue. Instead deliver the interrupt as
a single event, and let the driver decide which queue needs what action.
When re-enabling the QDIO interrupt via qdio_start_irq(), this means
that the qdio code needs to
(1) put _all_ eligible queues back into a state where they raise IRQs,
(2) and afterwards check _all_ eligible queues for new work to bridge
the race window.
On the qeth side of things (as the only qdio polling driver), we can now
add CQ polling support to the main NAPI poll routine. It doesn't consume
NAPI budget, and to avoid hogging the CPU we yield control after
completing one full queue worth of buffers.
The subsequent qdio_start_irq() will check for any additional work, and
have us re-schedule the NAPI instance accordingly.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sie block must be aligned to 512 bytes. Mark it as such.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
When UID checking is enabled a UID value of 0 is invalid and can not be
set by the user. On z/VM it is however used to indicate an unset UID.
Until now, this lead to the behavior that one PCI function could be
attached with UID 0 after which z/VM would prohibit further attachment.
Now if the user then turns off UID checking in z/VM the user could
seemingly attach additional PCI functions that would however not show up
in Linux as that would not be informed of the change in UID checking
mode. This is unexpected and confusing and lead to bug reports against
Linux.
Instead now, if we encounter an unset UID value of 0 treat it as
indicating that UID checking was turned off, switch to automatic domain
allocation, and warn the user of the possible misconfiguration.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Until now zpci_alloc_domain() only prevented more than
CONFIG_PCI_NR_FUNCTIONS from being added when using automatic domain
allocation. When explicit UIDs were defined UIDs above
CONFIG_PCI_NR_FUNCTIONS were not counted at all.
When more PCI functions are added this could lead to various errors
including under sized IRQ vectors and similar issues.
Fix this by explicitly tracking the number of allocated domains.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Show number of cores that run at least one SMT thread
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Re-IPL for both CCW and FCP is currently done by using diag 308 with the
"Load Clear" subcode, which means that all memory will be cleared.
This can increase re-IPL duration considerably on very large machines.
For CCW devices, there is also a "Load Normal" subcode that was only used
for dump kernels so far. For FCP devices, a similar "Load Normal" subcode
was introduced with z14. The "Load Normal" diag 308 subcode allows to
re-IPL without clearing memory.
This patch adds a new "clear" sysfs attribute to /sys/firmware/reipl for
both the ccw and fcp subdirectories, which can be set to either "0" or "1"
to disable or enable re-IPL with memory clearing. The default value is "0",
which disables memory clearing.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The CPU topology masks on s390 contain also bits of CPUs which
are offline. Currently this is already a problem, since common
code scheduler expects e.g. cpu_smt_mask() to reflect reality.
This update changes the described behaviour and s390 starts to
behave like all other architectures.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Variable cpus_with_topology is a leftover that became
unneeded once the fake NUMA support has been removed.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Show CPU physical address as reported by STAP instruction
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
1. Allow to disable gisa
2. protected virtual machines
Protected VMs (PVM) are KVM VMs, where KVM can't access the VM's
state like guest memory and guest registers anymore. Instead the
PVMs are mostly managed by a new entity called Ultravisor (UV),
which provides an API, so KVM and the PV can request management
actions.
PVMs are encrypted at rest and protected from hypervisor access
while running. They switch from a normal operation into protected
mode, so we can still use the standard boot process to load a
encrypted blob and then move it into protected mode.
Rebooting is only possible by passing through the unprotected/normal
mode and switching to protected again.
One mm related patch will go via Andrews mm tree ( mm/gup/writeback:
add callbacks for inaccessible pages)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJeZf9tAAoJEBF7vIC1phx89J0P/iv3wCoMNDqAttnHa/UQFF04
njUadNYkAADDrsabIEOs9O+BE1/4BVspnIunE4+xw76p5M/7/g5eIhXWcLudhlnL
+XtvuEwz/2ffA9JWAAYNKB7cGqBM9BCC+iYzAF9ah6sPLmlDCoF+hRe0g+0tXSON
cklUJFril9bOcxd/MxrzFLcmipbxT/Z4/10eBY+FHcm6SQGOKAtJH0xL7X3PfPI5
L/6ZhML9exsj1Iplkrl8BomMRoYOrvfq/jMaZp9SwmfXaOKYmNU3a19MhzfZ593h
bfR92H8kZRy/TpBd7EnpxYGQ/n53HkUhFMhtqkkkeHW1rCo8ccwC4VfnXb+KqQp+
nJ8KieWG+OlKKFDuZPl5Gq+jQqjJfzchbyMTYnBNe+GPT5zg76tJXmQyDn5X9p3R
mfg+9ZEeEonMu7px93Ht1gLdPiC2gjRckjuBDPqMGEhG2z2SQ/MLri+WnproIQRa
TcE7rZBtuyrGFTq4M4dEcsUW02xnOaav6H57kkl8EwqYwgDHlqoUbt0AvLFyW07a
RlH7drmhKDwTJkcOhOLeLNM8Un6NvnsLZ8Lbcr9rRf9Z9Lpc+zW88BSwJ7MM/GH8
FEQM8Omnn8KAJTENpIm3bHHyvsi0kJEhl+c3Ila3QnYzXZbJ3ZDaJZngMAbUUnVl
YNeFyyALzOgVVBx4kvTm
=x6Hn
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Features and Enhancements for 5.7 part1
1. Allow to disable gisa
2. protected virtual machines
Protected VMs (PVM) are KVM VMs, where KVM can't access the VM's
state like guest memory and guest registers anymore. Instead the
PVMs are mostly managed by a new entity called Ultravisor (UV),
which provides an API, so KVM and the PV can request management
actions.
PVMs are encrypted at rest and protected from hypervisor access
while running. They switch from a normal operation into protected
mode, so we can still use the standard boot process to load a
encrypted blob and then move it into protected mode.
Rebooting is only possible by passing through the unprotected/normal
mode and switching to protected again.
One mm related patch will go via Andrews mm tree ( mm/gup/writeback:
add callbacks for inaccessible pages)
Now that all callers of kvm_free_memslot() pass NULL for @dont, remove
the param from the top-level routine and all arch's implementations.
No functional change intended.
Tested-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When userspace executes a syscall or gets interrupted,
BEAR contains a kernel address when returning to userspace.
This make it pretty easy to figure out where the kernel is
mapped even with KASLR enabled. To fix this, add lpswe to
lowcore and always execute it there, so userspace sees only
the lowcore address of lpswe. For this we have to extend
both critical_cleanup and the SWITCH_ASYNC macro to also check
for lpswe addresses in lowcore.
Fixes: b2d24b97b2 ("s390/kernel: add support for kernel address space layout randomization (KASLR)")
Cc: <stable@vger.kernel.org> # v5.2+
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Embedding the hotplug_slot in zdev structure allows to
greatly simplify the hotplug handling by eliminating
the handling of the slot_list.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Make page, frame, virtual and physical address conversion macros
more expressive by avoiding redundant definitions and defining
new macros using existing ones.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
On s390 there currently is no implementation of pud_write(). That was ok
as long as we had our own implementation of get_user_pages_fast() which
checked for pud protection by testing the bit directly w/o using
pud_write(). The other callers of pud_write() are not reachable on s390.
After commit 1a42010cdc ("s390/mm: convert to the generic
get_user_pages_fast code") we use the generic get_user_pages_fast(), which
does call pud_write() in pud_access_permitted() for FOLL_WRITE access on
a large pud. Without an s390 specific pud_write(), the generic version is
called, which contains a BUG() statement to remind us that we don't have a
proper implementation. This results in a kernel panic.
Fix this by providing an implementation of pud_write().
Cc: <stable@vger.kernel.org> # 5.2+
Fixes: 1a42010cdc ("s390/mm: convert to the generic get_user_pages_fast code")
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
For protected VMs, the VCPU resets are done by the Ultravisor, as KVM
has no access to the VCPU registers.
Note that the ultravisor will only accept a call for the exact reset
that has been requested.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Code 5 for the set cpu state UV call tells the UV to load a PSW from
the SE header (first IPL) or from guest location 0x0 (diag 308 subcode
0/1). Also it sets the cpu into operating state afterwards, so we can
start it.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
VCPU states have to be reported to the ultravisor for SIGP
interpretation, kdump, kexec and reboot.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
diag 308 subcode 0 and 1 require several KVM and Ultravisor interactions.
Specific to these "soft" reboots are
* The "unshare all" UVC
* The "prepare for reset" UVC
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The SPX instruction is handled by the ultravisor. We do get a
notification intercept, though. Let us update our internal view.
In addition to that, when the guest prefix page is not secure, an
intercept 112 (0x70) is indicated. Let us make the prefix pages
secure again.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Now that we can't access guest memory anymore, we have a dedicated
satellite block that's a bounce buffer for instruction data.
We re-use the memop interface to copy the instruction data to / from
userspace. This lets us re-use a lot of QEMU code which used that
interface to make logical guest memory accesses which are not possible
anymore in protected mode anyway.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Guest registers for protected guests are stored at offset 0x380. We
will copy those to the usual places. Long term we could refactor this
or use register access functions.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The sclp interrupt is kind of special. The ultravisor polices that we
do not inject an sclp interrupt with payload if no sccb is outstanding.
On the other hand we have "asynchronous" event interrupts, e.g. for
console input.
We separate both variants into sclp interrupt and sclp event interrupt.
The sclp interrupt is masked until a previous servc instruction has
finished (sie exit 108).
[frankja@linux.ibm.com: factoring out write_sclp]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This defines the necessary data structures in the SIE control block to
inject machine checks,external and I/O interrupts. We first define the
the interrupt injection control, which defines the next interrupt to
inject. Then we define the fields that contain the payload for machine
checks,external and I/O interrupts.
This is then used to implement interruption injection for the following
list of interruption types:
- I/O (uses inject io interruption)
__deliver_io
- External (uses inject external interruption)
__deliver_cpu_timer
__deliver_ckc
__deliver_emergency_signal
__deliver_external_call
- cpu restart (uses inject restart interruption)
__deliver_restart
- machine checks (uses mcic, failing address and external damage)
__write_machine_check
Please note that posted interrupts (GISA) are not used for protected
guests as of today.
The service interrupt is handled in a followup patch.
Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We have two new SIE exit codes dealing with instructions.
104 (0x68) for a secure instruction interception, on which the SIE needs
hypervisor action to complete the instruction. We can piggy-back on the
existing instruction handlers.
108 which is merely a notification and provides data for tracking and
management. For example this is used to tell the host about a new value
for the prefix register. As there will be several special case handlers
in later patches, we handle this in a separate function.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Since there is no interception for load control and load psw
instruction in the protected mode, we need a new way to get notified
whenever we can inject an IRQ right after the guest has just enabled
the possibility for receiving them.
The new interception codes solve that problem by providing a
notification for changes to IRQ enablement relevant bits in CRs 0, 6
and 14, as well a the machine check mask bit in the PSW.
No special handling is needed for these interception codes, the KVM
pre-run code will consult all necessary CRs and PSW bits and inject
IRQs the guest is enabled for.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Before we destroy the secure configuration, we better make all
pages accessible again. This also happens during reboot, where we reboot
into a non-secure guest that then can go again into secure mode. As
this "new" secure guest will have a new ID we cannot reuse the old page
state.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
KSM will not work on secure pages, because when the kernel reads a
secure page, it will be encrypted and hence no two pages will look the
same.
Let's mark the guest pages as unmergeable when we transition to secure
mode.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This contains 3 main changes:
1. changes in SIE control block handling for secure guests
2. helper functions for create/destroy/unpack secure guests
3. KVM_S390_PV_COMMAND ioctl to allow userspace dealing with secure
machines
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This adds two new helper functions for doing UV CALLs.
The first variant handles UV CALLs that might have longer busy
conditions or just need longer when doing partial completion. We should
schedule when necessary.
The second variant handles UV CALLs that only need the handle but have
no payload (e.g. destroying a VM). We can provide a simple wrapper for
those.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The adapter interrupt page containing the indicator bits is currently
pinned. That means that a guest with many devices can pin a lot of
memory pages in the host. This also complicates the reference tracking
which is needed for memory management handling of protected virtual
machines. It might also have some strange side effects for madvise
MADV_DONTNEED and other things.
We can simply try to get the userspace page set the bits and free the
page. By storing the userspace address in the irq routing entry instead
of the guest address we can actually avoid many lookups and list walks
so that this variant is very likely not slower.
If userspace messes around with the memory slots the worst thing that
can happen is that we write to some other memory within that process.
As we get the the page with FOLL_WRITE this can also not be used to
write to shared read-only pages.
Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch simplification]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This provides the basic ultravisor calls and page table handling to cope
with secure guests:
- provide arch_make_page_accessible
- make pages accessible after unmapping of secure guests
- provide the ultravisor commands convert to/from secure
- provide the ultravisor commands pin/unpin shared
- provide callbacks to make pages secure (inacccessible)
- we check for the expected pin count to only make pages secure if the
host is not accessing them
- we fence hugetlbfs for secure pages
- add missing radix-tree include into gmap.h
The basic idea is that a page can have 3 states: secure, normal or
shared. The hypervisor can call into a firmware function called
ultravisor that allows to change the state of a page: convert from/to
secure. The convert from secure will encrypt the page and make it
available to the host and host I/O. The convert to secure will remove
the host capability to access this page.
The design is that on convert to secure we will wait until writeback and
page refs are indicating no host usage. At the same time the convert
from secure (export to host) will be called in common code when the
refcount or the writeback bit is already set. This avoids races between
convert from and to secure.
Then there is also the concept of shared pages. Those are kind of secure
where the host can still access those pages. We need to be notified when
the guest "unshares" such a page, basically doing a convert to secure by
then. There is a call "pin shared page" that we use instead of convert
from secure when possible.
We do use PG_arch_1 as an optimization to minimize the convert from
secure/pin shared.
Several comments have been added in the code to explain the logic in
the relevant places.
Co-developed-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Before being able to host protected virtual machines, donate some of
the memory to the ultravisor. Besides that the ultravisor might impose
addressing limitations for memory used to back protected VM storage. Treat
that limit as protected virtualization host's virtual memory limit.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Add "prot_virt" command line option which controls if the kernel
protected VMs support is enabled at early boot time. This has to be
done early, because it needs large amounts of memory and will disable
some features like STP time sync for the lpar.
Extend ultravisor info definitions and expose it via uv_info struct
filled in during startup.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
It turned out that fake numa support is rather useless on s390, since
there are no scenarios where there is any performance or other benefit
when used.
However it does provide maintenance cost and breaks from time to time.
Therefore remove it.
CONFIG_NUMA is still supported with a very small backend and only one
node. This way userspace applications which require NUMA interfaces
continue to work.
Note that NODES_SHIFT is set to 1 (= 2 nodes) instead of 0 (= 1 node),
since there is quite a bit of kernel code which assumes that more than
one node is possible if CONFIG_NUMA is enabled.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
sbale->addr holds an absolute address (or for some FCP usage, an opaque
request ID), and should only be used with proper virt/phys translation.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
As the comment says, sl->sbal holds an absolute address. qeth currently
solves this through wild casting, while zfcp doesn't care.
Handle this properly in the code that actually builds the SL.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com> [for qdio]
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
s390 math emulation was removed with commit 5a79859ae0 ("s390:
remove 31 bit support"), rendering ieee_emulation_warnings useless.
The code still built because it was protected by CONFIG_MATHEMU, which
was no longer selectable.
This patch removes the sysctl_ieee_emulation_warnings declaration and
the sysctl entry declaration.
Link: https://lkml.kernel.org/r/20200214172628.3598516-1-steve@sk2.org
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Clang warns:
In file included from ../arch/s390/purgatory/purgatory.c:10:
In file included from ../include/linux/kexec.h:18:
In file included from ../include/linux/crash_core.h:6:
In file included from ../include/linux/elfcore.h:5:
In file included from ../include/linux/user.h:1:
In file included from ../arch/s390/include/asm/user.h:11:
../arch/s390/include/asm/page.h:45:6: warning: converting the result of
'<<' to a boolean always evaluates to false
[-Wtautological-constant-compare]
if (PAGE_DEFAULT_KEY)
^
../arch/s390/include/asm/page.h:23:44: note: expanded from macro
'PAGE_DEFAULT_KEY'
#define PAGE_DEFAULT_KEY (PAGE_DEFAULT_ACC << 4)
^
1 warning generated.
Explicitly compare this against zero to silence the warning as it is
intended to be used in a boolean context.
Fixes: de3fa841e4 ("s390/mm: fix compile for PAGE_DEFAULT_KEY != 0")
Link: https://github.com/ClangBuiltLinux/linux/issues/860
Link: https://lkml.kernel.org/r/20200214064207.10381-1-natechancellor@gmail.com
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Clang warns:
In file included from ../arch/s390/boot/startup.c:3:
In file included from ../include/linux/elf.h:5:
In file included from ../arch/s390/include/asm/elf.h:132:
In file included from ../include/linux/compat.h:10:
In file included from ../include/linux/time.h:74:
In file included from ../include/linux/time32.h:13:
In file included from ../include/linux/timex.h:65:
../arch/s390/include/asm/timex.h:160:20: warning: passing 'unsigned char
[16]' to parameter of type 'char *' converts between pointers to integer
types with different sign [-Wpointer-sign]
get_tod_clock_ext(clk);
^~~
../arch/s390/include/asm/timex.h:149:44: note: passing argument to
parameter 'clk' here
static inline void get_tod_clock_ext(char *clk)
^
Change clk's type to just be char so that it matches what happens in
get_tod_clock_ext.
Fixes: 57b28f6631 ("[S390] s390_hypfs: Add new attributes")
Link: https://github.com/ClangBuiltLinux/linux/issues/861
Link: http://lkml.kernel.org/r/20200208140858.47970-1-natechancellor@gmail.com
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* fix register corruption
* ENOTSUPP/EOPNOTSUPP mixed
* reset cleanups/fixes
* selftests
x86:
* Bug fixes and cleanups
* AMD support for APIC virtualization even in combination with
in-kernel PIT or IOAPIC.
MIPS:
* Compilation fix.
Generic:
* Fix refcount overflow for zero page.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJeOuf7AAoJEL/70l94x66DOBQH/j1W9lUpbDgr9aWbrZT+O/yP
FWzUDrRlCZCjV1FQKbGPa4YLeDRTG5n+RIQTjmCGRqiMqeoELSJ1+iK99e97nG/u
L28zf/90Nf0R+wwHL4AOFeploTYfG4WP8EVnlr3CG2UCJrNjxN1KU7yRZoWmWa2d
ckLJ8ouwNvx6VZd233LVmT38EP4352d1LyqIL8/+oXDIyAcRJLFQu1gRCwagsh3w
1v1czowFpWnRQ/z9zU7YD+PA4v85/Ge8sVVHlpi1X5NgV/khk4U6B0crAw6M+la+
mTmpz9g56oAh9m9NUdtv4zDCz1EWGH0v8+ZkAajUKtrM0ftJMn57P6p8PH4VVlE=
=5+Wl
-----END PGP SIGNATURE-----
Merge tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
"s390:
- fix register corruption
- ENOTSUPP/EOPNOTSUPP mixed
- reset cleanups/fixes
- selftests
x86:
- Bug fixes and cleanups
- AMD support for APIC virtualization even in combination with
in-kernel PIT or IOAPIC.
MIPS:
- Compilation fix.
Generic:
- Fix refcount overflow for zero page"
* tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
KVM: vmx: delete meaningless vmx_decache_cr0_guest_bits() declaration
KVM: x86: Mark CR4.UMIP as reserved based on associated CPUID bit
x86: vmxfeatures: rename features for consistency with KVM and manual
KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accesses
KVM: x86: Fix perfctr WRMSR for running counters
x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests
x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
kvm: mmu: Separate generating and setting mmio ptes
kvm: mmu: Replace unsigned with unsigned int for PTE access
KVM: nVMX: Remove stale comment from nested_vmx_load_cr3()
KVM: MIPS: Fold comparecount_func() into comparecount_wakeup()
KVM: MIPS: Fix a build error due to referencing not-yet-defined function
x86/kvm: do not setup pv tlb flush when not paravirtualized
KVM: fix overflow of zero page refcount with ksm running
KVM: x86: Take a u64 when checking for a valid dr7 value
KVM: x86: use raw clock values consistently
KVM: x86: reorganize pvclock_gtod_data members
KVM: nVMX: delete meaningless nested_vmx_run() declaration
KVM: SVM: allow AVIC without split irqchip
kvm: ioapic: Lazy update IOAPIC EOI
...
- Add KPROBES_ON_FTRACE support.
- Add EP11 AES secure keys support.
- PAES rework and prerequisites for paes-s390 ciphers selftests.
- Fix page table upgrade for hugetlbfs.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl465KkACgkQjYWKoQLX
FBiR/wf/e+Fj/mDYHElcZ55MWaORBpp8NT94IYSt0RbII1PEh9cB8NciYLQdFFmc
bUlNj7u3fHwk1D8S3pOSYKhIaHQQOWDqd/uNTzbCicbbVhuwmslLc+jffnORtlKe
mCHeQsVAw3NwE8FIPhPMTAKBZV0pLkM4T9PA2xgeuB5cShoMgXgLgUoIwHJ4c2TP
WwnolIJ/QR0nKpmPI5lp0+PjjSk/8nA/VvmpxgYbJCTQm8dhwhAfePh8Kf6pEp6K
wETUaIyWkX1a+kI9h2qIBsR7KplqqrKABA5sxnPDQW/kut1Pc/2fWxMOBxux0f/V
Kk+f6yoVbe7X6VYm+V4AyyAzQMRggQ==
=9Eeg
-----END PGP SIGNATURE-----
Merge tag 's390-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
"The second round of s390 fixes and features for 5.6:
- Add KPROBES_ON_FTRACE support
- Add EP11 AES secure keys support
- PAES rework and prerequisites for paes-s390 ciphers selftests
- Fix page table upgrade for hugetlbfs"
* tag 's390-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pkey/zcrypt: Support EP11 AES secure keys
s390/zcrypt: extend EP11 card and queue sysfs attributes
s390/zcrypt: add new low level ep11 functions support file
s390/zcrypt: ep11 structs rework, export zcrypt_send_ep11_cprb
s390/zcrypt: enable card/domain autoselect on ep11 cprbs
s390/crypto: enable clear key values for paes ciphers
s390/pkey: Add support for key blob with clear key value
s390/crypto: Rework on paes implementation
s390: support KPROBES_ON_FTRACE
s390/mm: fix dynamic pagetable upgrade for hugetlbfs
- Enable CMA
- Add support for MB v11
- Defconfig updates
- Minor fixes
-----BEGIN PGP SIGNATURE-----
iF0EABECAB0WIQQbPNTMvXmYlBPRwx7KSWXLKUoMIQUCXjlJ1gAKCRDKSWXLKUoM
IWy9AJ4tauV9sUb+zNadrYxI+2zemRstUwCfQ49LG4kHpFCv8ldSTmhBPJY/3MI=
=QpT4
-----END PGP SIGNATURE-----
Merge tag 'microblaze-v5.6-rc1' of git://git.monstr.eu/linux-2.6-microblaze
Pull Microblaze update from Michal Simek:
- enable CMA
- add support for MB v11
- defconfig updates
- minor fixes
* tag 'microblaze-v5.6-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Add ID for Microblaze v11
microblaze: Prevent the overflow of the start
microblaze: Wire CMA allocator
asm-generic: Make dma-contiguous.h a mandatory include/asm header
microblaze: Sync defconfig with latest Kconfig layout
microblaze: defconfig: Disable EXT2 driver and Enable EXT3 & EXT4 drivers
microblaze: Align comments with register usage
dma-continuguous.h is generic for all architectures except arm32 which has
its own version.
Similar change was done for msi.h by commit a1b39bae16
("asm-generic: Make msi.h a mandatory include/asm header")
Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/linux-arm-kernel/20200117080446.GA8980@lst.de/T/#m92bb56b04161057635d4142e1b3b9b6b0a70122e
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # for arch/riscv
walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.
For s390, pud_large() and pmd_large() are already implemented as static
inline functions. Add a macro to provide the p?d_leaf names for the
generic code to use.
Link: http://lkml.kernel.org/r/20191218162402.45610-9-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CRNG hasn't initialized, instead of the old blocking pool. Also clean
up archrandom.h, and some other miscellaneous cleanups.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl40j1kACgkQ8vlZVpUN
gaPCywf8CWS9HFd2Iipj60gkTVugjlL5ib0lbfhQcAAwwzw1GLTXJSMBzzoMRHY/
ZI2sJZS1m0V1oWNnXXVKi+A1VXmlValWXAc+7fvbeaIe5pRT1EHP14s4Kz7/4d8Q
dk0b8cxNpR8u5CcbN8y9D+71IKpdksUbX7uGuGfw3bncQdRNwJVf+oS1fMGS0Rsb
F8ddQaED7iFpX2BMl56afQ4t2t0LA5+eLYMGoYoJx5fgd9BseP0TEcjj9Y4Z30M7
+GO4NZjUbAY0syx9r8hx3P/5miWZm2J9QJmJoXHhr5+IcAKM+6+Uo6X6gkOEqV4i
U//V1cqNuowV5ckE4Na+MfBillinsQ==
=HeFM
-----END PGP SIGNATURE-----
Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull random changes from Ted Ts'o:
"Change /dev/random so that it uses the CRNG and only blocking if the
CRNG hasn't initialized, instead of the old blocking pool. Also clean
up archrandom.h, and some other miscellaneous cleanups"
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (24 commits)
s390x: Mark archrandom.h functions __must_check
powerpc: Mark archrandom.h functions __must_check
powerpc: Use bool in archrandom.h
x86: Mark archrandom.h functions __must_check
linux/random.h: Mark CONFIG_ARCH_RANDOM functions __must_check
linux/random.h: Use false with bool
linux/random.h: Remove arch_has_random, arch_has_random_seed
s390: Remove arch_has_random, arch_has_random_seed
powerpc: Remove arch_has_random, arch_has_random_seed
x86: Remove arch_has_random, arch_has_random_seed
random: remove some dead code of poolinfo
random: fix typo in add_timer_randomness()
random: Add and use pr_fmt()
random: convert to ENTROPY_BITS for better code readability
random: remove unnecessary unlikely()
random: remove kernel.random.read_wakeup_threshold
random: delete code to pull data into pools
random: remove the blocking pool
random: make /dev/random be almost like /dev/urandom
random: ignore GRND_RANDOM in getentropy(2)
...
Pull updates from Andrew Morton:
"Most of -mm and quite a number of other subsystems: hotfixes, scripts,
ocfs2, misc, lib, binfmt, init, reiserfs, exec, dma-mapping, kcov.
MM is fairly quiet this time. Holidays, I assume"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
kcov: ignore fault-inject and stacktrace
include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc()
execve: warn if process starts with executable stack
reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
init/main.c: fix misleading "This architecture does not have kernel memory protection" message
init/main.c: fix quoted value handling in unknown_bootoption
init/main.c: remove unnecessary repair_env_string in do_initcall_level
init/main.c: log arguments and environment passed to init
fs/binfmt_elf.c: coredump: allow process with empty address space to coredump
fs/binfmt_elf.c: coredump: delete duplicated overflow check
fs/binfmt_elf.c: coredump: allocate core ELF header on stack
fs/binfmt_elf.c: make BAD_ADDR() unlikely
fs/binfmt_elf.c: better codegen around current->mm
fs/binfmt_elf.c: don't copy ELF header around
fs/binfmt_elf.c: fix ->start_code calculation
fs/binfmt_elf.c: smaller code generation around auxv vector fill
lib/find_bit.c: uninline helper _find_next_bit()
lib/find_bit.c: join _find_next_bit{_le}
uapi: rename ext2_swab() to swab() and share globally in swab.h
lib/scatterlist.c: adjust indentation in __sg_alloc_table
...
Add the new kernel command line parameter 'dfltcc=' to configure s390
zlib hardware support.
Format: { on | off | def_only | inf_only | always }
on: s390 zlib hardware support for compression on
level 1 and decompression (default)
off: No s390 zlib hardware support
def_only: s390 zlib hardware support for deflate
only (compression on level 1)
inf_only: s390 zlib hardware support for inflate
only (decompression)
always: Same as 'on' but ignores the selected compression
level always using hardware support (used for debugging)
Link: http://lkml.kernel.org/r/20200103223334.20669-5-zaslonko@linux.ibm.com
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Eduard Shishkin <edward6@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PPC: Bugfixes
x86:
* Support for mapping DAX areas with large nested page table entries.
* Cleanups and bugfixes here too. A particularly important one is
a fix for FPU load when the thread has TIF_NEED_FPU_LOAD. There is
also a race condition which could be used in guest userspace to exploit
the guest kernel, for which the embargo expired today.
* Fast path for IPI delivery vmexits, shaving about 200 clock cycles
from IPI latency.
* Protect against "Spectre-v1/L1TF" (bring data in the cache via
speculative out of bound accesses, use L1TF on the sibling hyperthread
to read it), which unfortunately is an even bigger whack-a-mole game
than SpectreV1.
Sean continues his mission to rewrite KVM. In addition to a sizable
number of x86 patches, this time he contributed a pretty large refactoring
of vCPU creation that affects all architectures but should not have any
visible effect.
s390 will come next week together with some more x86 patches.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJeMxtCAAoJEL/70l94x66DQxIIAJv9hMmXLQHGFnUMskjGErR6
DCLSC0YRdRMwE50CerblyJtGsMwGsPyHZwvZxoAceKJ9w0Yay9cyaoJ87ItBgHoY
ce0HrqIUYqRSJ/F8WH2lSzkzMBr839rcmqw8p1tt4D5DIsYnxHGWwRaaP+5M/1KQ
YKFu3Hea4L00U339iIuDkuA+xgz92LIbsn38svv5fxHhPAyWza0rDEYHNgzMKuoF
IakLf5+RrBFAh6ZuhYWQQ44uxjb+uQa9pVmcqYzzTd5t1g4PV5uXtlJKesHoAvik
Eba8IEUJn+HgQJjhp3YxQYuLeWOwRF3bwOiZ578MlJ4OPfYXMtbdlqCQANHOcGk=
=H/q1
-----END PGP SIGNATURE-----
Merge tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"This is the first batch of KVM changes.
ARM:
- cleanups and corner case fixes.
PPC:
- Bugfixes
x86:
- Support for mapping DAX areas with large nested page table entries.
- Cleanups and bugfixes here too. A particularly important one is a
fix for FPU load when the thread has TIF_NEED_FPU_LOAD. There is
also a race condition which could be used in guest userspace to
exploit the guest kernel, for which the embargo expired today.
- Fast path for IPI delivery vmexits, shaving about 200 clock cycles
from IPI latency.
- Protect against "Spectre-v1/L1TF" (bring data in the cache via
speculative out of bound accesses, use L1TF on the sibling
hyperthread to read it), which unfortunately is an even bigger
whack-a-mole game than SpectreV1.
Sean continues his mission to rewrite KVM. In addition to a sizable
number of x86 patches, this time he contributed a pretty large
refactoring of vCPU creation that affects all architectures but should
not have any visible effect.
s390 will come next week together with some more x86 patches"
* tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
x86/KVM: Clean up host's steal time structure
x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed
x86/kvm: Cache gfn to pfn translation
x86/kvm: Introduce kvm_(un)map_gfn()
x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit
KVM: PPC: Book3S PR: Fix -Werror=return-type build failure
KVM: PPC: Book3S HV: Release lock on page-out failure path
KVM: arm64: Treat emulated TVAL TimerValue as a signed 32-bit integer
KVM: arm64: pmu: Only handle supported event counters
KVM: arm64: pmu: Fix chained SW_INCR counters
KVM: arm64: pmu: Don't mark a counter as chained if the odd one is disabled
KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unset
KVM: x86: Use a typedef for fastop functions
KVM: X86: Add 'else' to unify fastop and execute call path
KVM: x86: inline memslot_valid_for_gpte
KVM: x86/mmu: Use huge pages for DAX-backed files
KVM: x86/mmu: Remove lpage_is_disallowed() check from set_spte()
KVM: x86/mmu: Fold max_mapping_level() into kvm_mmu_hugepage_adjust()
KVM: x86/mmu: Zap any compound page when collapsing sptes
KVM: x86/mmu: Remove obsolete gfn restoration in FNAME(fetch)
...
The code seems to be quite old and uses lots of unneeded spaces for
alignment, which doesn't really help with readability.
Let's:
* Get rid of the extra spaces
* Remove the ULs as they are not needed on 0s
* Define constants for the CR 0 and 14 initial values
* Use the sizeof of the gcr array to memset it to 0
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/r/20200131100205.74720-3-frankja@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Extend the low level ep11 misc functions implementation by
several functions to support EP11 key objects for paes and pkey:
- EP11 AES secure key generation
- EP11 AES secure key generation from given clear key value
- EP11 AES secure key blob check
- findcard function returns list of apqns based on given criterias
- EP11 AES secure key derive to CPACF protected key
Extend the pkey module to be able to generate and handle EP11
secure keys and also use them as base for deriving protected
keys for CPACF usage. These ioctls are extended to support
EP11 keys: PKEY_GENSECK2, PKEY_CLR2SECK2, PKEY_VERIFYKEY2,
PKEY_APQNS4K, PKEY_APQNS4KT, PKEY_KBLOB2PROTK2.
Additionally the 'clear key' token to protected key now uses
an EP11 card if the other ways (via PCKMO, via CCA) fail.
The PAES cipher implementation needed a new upper limit for
the max key size, but is now also working with EP11 keys.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Minor rework for struct ep11_cprb and struct ep11_urb. Use of u8, u16,
u32 instead of unsigned char. Declare pointers to mem from userspace
with __user to give sparse a chance to check.
Export zcrypt_send_ep11_cprb() function as this function will be
called by code in progress which will build ep11 cprbs within the
zcrypt device driver zoo and send them to EP11 crypto cards.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
For EP11 CPRBs there was only to choose between specify
one or more ep11 targets or not give a target at all. Without
any target the zcrypt code assumed AUTOSELECT. For EP11 this
ended up in choosing any EP11 APQN with regards to the weight.
However, CCA CPRBs can have a more fine granular target
addressing. The caller can give 0xFFFF as AUTOSELECT for
the card and/or the domain. So it's possible to address
any card but domain given or any domain but card given.
This patch now introduces the very same for EP11 CPRB handling.
An EP11 target entry now may contain 0xFFFF as card and/or
domain value with the meaning of ANY card or domain. So
now the same behavior as with CCA CPRBs becomes possible:
Address any card with given domain or address any domain within
given card.
For convenience the zcrypt.h header file now has two new
defines AUTOSEL_AP and AUTOSEL_DOM covering the 0xFFFF
value to address card any and domain any.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Instead of using our own kprobes-on-ftrace handling convert the
code to support KPROBES_ON_FTRACE.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Commit ee71d16d22 ("s390/mm: make TASK_SIZE independent from the number
of page table levels") changed the logic of TASK_SIZE and also removed the
arch_mmap_check() implementation for s390. This combination has a subtle
effect on how get_unmapped_area() for hugetlbfs pages works. It is now
possible that a user process establishes a hugetlbfs mapping at an address
above 4 TB, without triggering a dynamic pagetable upgrade from 3 to 4
levels.
This is because hugetlbfs mappings will not use mm->get_unmapped_area, but
rather file->f_op->get_unmapped_area, which currently is the generic
implementation of hugetlb_get_unmapped_area() that does not know about s390
dynamic pagetable upgrades, but with the new definition of TASK_SIZE, it
will now allow mappings above 4 TB.
Subsequent access to such a mapped address above 4 TB will result in a page
fault loop, because the CPU cannot translate such a large address with 3
pagetable levels. The fault handler will try to map in a hugepage at the
address, but due to the folded pagetable logic it will end up with creating
entries in the 3 level pagetable, possibly overwriting existing mappings,
and then it all repeats when the access is retried.
Apart from the page fault loop, this can have various nasty effects, e.g.
kernel panic from one of the BUG_ON() checks in memory management code,
or even data loss if an existing mapping gets overwritten.
Fix this by implementing HAVE_ARCH_HUGETLB_UNMAPPED_AREA support for s390,
providing an s390 version for hugetlb_get_unmapped_area() with pagetable
upgrade support similar to arch_get_unmapped_area(), which will then be
used instead of the generic version.
Fixes: ee71d16d22 ("s390/mm: make TASK_SIZE independent from the number of page table levels")
Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This series is slightly unusual because it includes Arnd's compat
ioctl tree here:
1c46a2cf2d Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue
Excluding Arnd's changes, this is mostly an update of the usual
drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas. There
are a couple of core and base updates around error propagation and
atomicity in the attribute container base we use for the SCSI
transport classes. The rest is minor changes and updates.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXjHQJyYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishZZ8AQC02N+v
iUnTl1YxGPjIWBbnHuUxN2Qbb9D3C6gAT1LkigEArlk163K3A1XEQHF/VNCdAz/f
01XYTd3p1VHuegIBHlk=
=Cn52
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This series is slightly unusual because it includes Arnd's compat
ioctl tree here:
1c46a2cf2d Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue
Excluding Arnd's changes, this is mostly an update of the usual
drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas.
There are a couple of core and base updates around error propagation
and atomicity in the attribute container base we use for the SCSI
transport classes.
The rest is minor changes and updates"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (149 commits)
scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask
scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic affinity
scsi: hisi_sas: Modify the file permissions of trigger_dump to write only
scsi: hisi_sas: Replace magic number when handle channel interrupt
scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock
scsi: hisi_sas: use threaded irq to process CQ interrupts
scsi: ufs: Use UFS device indicated maximum LU number
scsi: ufs: Add max_lu_supported in struct ufs_dev_info
scsi: ufs: Delete is_init_prefetch from struct ufs_hba
scsi: ufs: Inline two functions into their callers
scsi: ufs: Move ufshcd_get_max_pwr_mode() to ufshcd_device_params_init()
scsi: ufs: Split ufshcd_probe_hba() based on its called flow
scsi: ufs: Delete struct ufs_dev_desc
scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails
scsi: ufs-mediatek: enable low-power mode for hibern8 state
scsi: ufs: export some functions for vendor usage
scsi: ufs-mediatek: add dbg_register_dump implementation
scsi: qla2xxx: Fix a NULL pointer dereference in an error path
scsi: qla1280: Make checking for 64bit support consistent
scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1
...
- Add clang 10 build support.
- Fix BUG() implementation to contain precise bug address, which is
relevant for kprobes.
- Make ftraced function appear in a stacktrace.
- Minor perf improvements and refactoring.
- Possible deadlock and recovery fixes in pci code.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl4wVuIACgkQjYWKoQLX
FBijMAf9EiLpg3ZmCsd4JMYup7XPpnDoey4S6X1MwoAFgnsQS3qRdwdQCjRyGMxV
VN0q5aG9WRH5YpO8YgyPPzrZ0fVo/0BDEuckZ/eNXAKPPGVVpAEXcgQ+R4QD+6+U
OgAym/3q27CwNeUp9XDzZ5jjXhL8Y+v3S900OoxTbn6YHx/0K+FDdJSmysnB+4aG
5JDjMH42MrKstVlY3van3A4WNs5vBNLx+pLUhcsENLio1Ni01qHkRh28GLzrkDrA
q/VonLFxjFlzQ2F0D5HTVT9nk+Z1RstMq92gUZLOK/tEd036f/j+TMyVm6WG98OV
VEXz2ByH19ur2Inw8nTCOPeN1X44Lw==
=4l6g
-----END PGP SIGNATURE-----
Merge tag 's390-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add clang 10 build support.
- Fix BUG() implementation to contain precise bug address, which is
relevant for kprobes.
- Make ftraced function appear in a stacktrace.
- Minor perf improvements and refactoring.
- Possible deadlock and recovery fixes in pci code.
* tag 's390-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: fix __EMIT_BUG() macro
s390/ftrace: generate traced function stack frame
s390: adjust -mpacked-stack support check for clang 10
s390/jump_label: use "i" constraint for clang
s390/cpum_sf: Use DIV_ROUND_UP
s390/cpum_sf: Use kzalloc and minor changes
s390/cpum_sf: Convert debug trace to common layout
s390/pci: Fix possible deadlock in recover_store()
s390/pci: Recover handle in clp_set_pci_fn()
Pull scheduler updates from Ingo Molnar:
"These were the main changes in this cycle:
- More -rt motivated separation of CONFIG_PREEMPT and
CONFIG_PREEMPTION.
- Add more low level scheduling topology sanity checks and warnings
to filter out nonsensical topologies that break scheduling.
- Extend uclamp constraints to influence wakeup CPU placement
- Make the RT scheduler more aware of asymmetric topologies and CPU
capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y
- Make idle CPU selection more consistent
- Various fixes, smaller cleanups, updates and enhancements - please
see the git log for details"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
sched/fair: Define sched_idle_cpu() only for SMP configurations
sched/topology: Assert non-NUMA topology masks don't (partially) overlap
idle: fix spelling mistake "iterrupts" -> "interrupts"
sched/fair: Remove redundant call to cpufreq_update_util()
sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled
sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP
sched/fair: calculate delta runnable load only when it's needed
sched/cputime: move rq parameter in irqtime_account_process_tick
stop_machine: Make stop_cpus() static
sched/debug: Reset watchdog on all CPUs while processing sysrq-t
sched/core: Fix size of rq::uclamp initialization
sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
sched/fair: Load balance aggressively for SCHED_IDLE CPUs
sched/fair : Improve update_sd_pick_busiest for spare capacity case
watchdog: Remove soft_lockup_hrtimer_cnt and related code
sched/rt: Make RT capacity-aware
sched/fair: Make EAS wakeup placement consider uclamp restrictions
sched/fair: Make task_fits_capacity() consider uclamp restrictions
sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
sched/uclamp: Make uclamp util helpers use and return UL values
...
Remove kvm_arch_vcpu_init() and kvm_arch_vcpu_uninit() now that all
arch specific implementations are nops.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We must not use the pointer output without validating the
success of the random read.
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20200110145422.49141-11-broonie@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
These symbols are currently part of the generic archrandom.h
interface, but are currently unused and can be removed.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20200110145422.49141-4-broonie@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Setting a kprobe on getname_flags() failed:
$ echo 'p:tmr1 getname_flags +0(%r2):ustring' > kprobe_events
-bash: echo: write error: Invalid argument
Debugging the kprobes code showed that the address of
getname_flags() is contained in the __bug_table. Kprobes
doesn't allow to set probes at BUG() locations.
$ objdump -j __bug_table -x build/fs/namei.o
[..]
0000000000000108 R_390_PC32 .text+0x00000000000075a8
000000000000010c R_390_PC32 .L223+0x0000000000000004
I was expecting getname_flags() to start with a BUG(), but:
7598: e3 20 10 00 00 04 lg %r2,0(%r1)
759e: c0 f4 00 00 00 00 jg 759e <putname+0x7e>
75a0: R_390_PLT32DBL kmem_cache_free+0x2
75a4: a7 f4 00 01 j 75a6 <putname+0x86>
00000000000075a8 <getname_flags>:
75a8: c0 04 00 00 00 00 brcl 0,75a8 <getname_flags>
75ae: eb 6f f0 48 00 24 stmg %r6,%r15,72(%r15)
75b4: b9 04 00 ef lgr %r14,%r15
75b8: e3 f0 ff a8 ff 71 lay %r15,-88(%r15)
So the BUG() is actually the last opcode of the previous function.
Fix this by switching to using the MONITOR CALL (MC) instruction,
and set the entry in __bug_table to the beginning of that MC.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Currently kernel build fails under clang if jump labels are enabled.
The problem is "X" constraint usage "Any operand whatsoever is allowed",
for which clang produces the following:
.pushsection __jump_table,"aw"
.balign 8
.long 0b-.,.Ltmp577-.
.quad %r0+0-. # %r0 is not allowed here
.popsection
Under gcc constraints "X" or "jdd" (gcc > 9) are used for static keys.
Ideally, we'd have used "i" for gcc, but it doesn't work in all cases
with -fPIC code. This is gcc-specific problem that doesn't exist in llvm.
Since clang does not have "jdd" simply always use "i" constraint for it.
Suggested-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
When we try to recover a PCI function using
echo 1 > /sys/bus/pci/devices/<id>/recover
or manually with
echo 1 > /sys/bus/pci/devices/<id>/remove
echo 0 > /sys/bus/pci/slots/<slot>/power
echo 1 > /sys/bus/pci/slots/<slot>/power
clp_disable_fn() / clp_enable_fn() call clp_set_pci_fn() to first
disable and then reenable the function.
When the function is already in the requested state we may be left with
an invalid function handle.
To get a new valid handle we do a clp_list_pci() call. For this we need
both the function ID and function handle in clp_set_pci_fn() so pass the
zdev and get both.
To simplify things also pull setting the refreshed function handle into
clp_set_pci_fn()
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
In order to avoid needless #ifdef CONFIG_COMPAT checks,
move the compat_ptr() definition to linux/compat.h
where it can be seen by any file regardless of the
architecture.
Only s390 needs a special definition, this can use the
self-#define trick we have elsewhere.
Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>