With a relocatable kernel that can reside anywhere in memory, the
storage size for the SYSCALL and PGM_CHECK handler addresses needs to
be increased from .long to .quad.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch adds support for building a relocatable kernel with -fPIE.
The kernel will be relocated to 0 early in the boot process.
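As an illustration of what resolving the relocations involves, here is a
conceptual sketch of applying R_390_RELATIVE relocations; the symbol and
function names are hypothetical, not the actual implementation:
/* hypothetical markers around the relocation table in the image */
extern char vmlinux_relocs_start[], vmlinux_relocs_end[];

static void apply_relative_relocs(unsigned long load_offset)
{
	Elf64_Rela *rela;
	unsigned long *loc;

	for (rela = (Elf64_Rela *) vmlinux_relocs_start;
	     rela < (Elf64_Rela *) vmlinux_relocs_end; rela++) {
		/* R_390_RELATIVE: target = load offset + addend */
		loc = (unsigned long *) (rela->r_offset + load_offset);
		*loc = rela->r_addend + load_offset;
	}
}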
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide support for PCI I/O instructions that work on mapped I/O addresses.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This is a preparation patch for the use of the new PCI instructions.
No functional change.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide a kernel parameter to force the use of floating interrupts.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Gather statistics to distinguish floating and directed interrupts.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Up until now all interrupts on s390 have been floating. For MSI interrupts
we've used a global summary bit vector (with a bit for each function) and
a per-function interrupt bit vector (with a bit per MSI).
This patch introduces a new IRQ delivery mode: CPU directed interrupts.
In this new mode a per-CPU interrupt bit vector is used (with a bit per
MSI per function). Furthermore, it is now possible to direct an IRQ to a
specific CPU, so we can finally support IRQ affinity.
If an interrupt can't be delivered because the appointed CPU is occupied
by the hypervisor, the interrupt is delivered floating. For this a global
summary bit vector is used (with a bit per CPU).
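Conceptually, the fallback path on the kernel side can be pictured like
the following sketch; the bitmap and helper names are hypothetical:
/* one summary bit per CPU; set when a directed interrupt had to be
 * delivered floating because the target CPU was not running */
extern unsigned long fallback_summary_bits[];
void scan_directed_vector(int cpu);	/* scan that CPU's bit vector */

static void zpci_floating_fallback(void)
{
	int cpu;

	for_each_set_bit(cpu, fallback_summary_bits, num_possible_cpus())
		scan_directed_vector(cpu);
}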
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide the ability to create cacheline-aligned interrupt vectors.
These will be used for per-CPU interrupt vectors.
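A sketch of one way to obtain that alignment; whether the implementation
actually uses a dedicated kmem_cache is an assumption here:
#include <linux/cache.h>
#include <linux/slab.h>

static struct kmem_cache *airq_iv_cache;

static int __init airq_iv_cache_init(void)
{
	/* cacheline-sized, cacheline-aligned objects for the vectors */
	airq_iv_cache = kmem_cache_create("airq_iv_cache", cache_line_size(),
					  cache_line_size(), 0, NULL);
	return airq_iv_cache ? 0 : -ENOMEM;
}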
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add an extra parameter for airq handlers to recognize
floating vs. directed interrupts.
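The extended handler type could look like this sketch; the struct layout
is abridged and partly assumed:
struct airq_struct {
	struct hlist_node list;
	/* "floating" tells the handler how the interrupt was delivered */
	void (*handler)(struct airq_struct *airq, bool floating);
	u8 *lsi_ptr;	/* local summary indicator */
	u8 isc;		/* interrupt subclass */
};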
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move everything interrupt related from pci.c to pci_irq.c.
No functional change.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide an interface for userspace so it can find out if a machine is
capable of doing secure boot. The interface is, for example, needed by
zipl so it can find out which file format it can/should write to disk.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add kernel signature verification to kexec_file. The verification is based
on module signature verification and works with kernel images signed via
scripts/sign-file.
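A sketch of the verification flow, mirroring module signature
verification; the exact trailer handling and the keyring choice are
assumptions:
int s390_verify_sig(const char *kernel, unsigned long len)
{
	const unsigned long marker_len = sizeof(MODULE_SIG_STRING) - 1;
	struct module_signature ms;
	unsigned long sig_len;

	/* signed images end with: signature, module_signature, marker */
	if (len < marker_len + sizeof(ms))
		return -EKEYREJECTED;
	if (memcmp(kernel + len - marker_len, MODULE_SIG_STRING, marker_len))
		return -EKEYREJECTED;
	len -= marker_len + sizeof(ms);

	memcpy(&ms, kernel + len, sizeof(ms));
	sig_len = be32_to_cpu(ms.sig_len);
	if (sig_len >= len)
		return -EKEYREJECTED;
	len -= sig_len;

	/* verify the PKCS#7 blob against the platform keyring */
	return verify_pkcs7_signature(kernel, len, kernel + len, sig_len,
				      VERIFY_USE_PLATFORM_KEYRING,
				      VERIFYING_MODULE_SIGNATURE, NULL, NULL);
}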
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The leading 64 kB of a kernel image don't contain any data needed to boot
the new kernel when it is loaded via kexec_file. Thus kexec_file currently
strips them off before loading the image. Keep the leading 64 kB in order
to be able to pass an ipl_report to the next kernel.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
s390_image_load and s390_elf_load have the same code to load the different
components. Combine this functionality in one shared function.
While at it move kexec_file_update_kernel into the new function as well.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Access the parmarea in head.S via a struct instead of individual offsets.
While at it make the fields in the parmarea .quads.
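A sketch of what such a struct could look like; the member names and
offsets are assumptions for illustration:
struct parmarea {
	unsigned long ipl_device;			/* 0x10400 */
	unsigned long initrd_start;			/* 0x10408 */
	unsigned long initrd_size;			/* 0x10410 */
	unsigned long oldmem_base;			/* 0x10418 */
	unsigned long oldmem_size;			/* 0x10420 */
	char command_line[COMMAND_LINE_SIZE];		/* 0x10480 */
};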
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Read the IPL Report block provided by secure-boot, add the entries
of the certificate list to the system key ring and print the list
of components.
PR: Adjust to Vasily's bootdata_preserved patch set. Preserve ipl_cert_list
for later use in kexec_file.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To transport the information required for secure boot, a new IPL report
will be created at boot time. It will be written to memory right after
the IPL parameter block. To work with the IPL report, a couple of
additional structure definitions are added to the uapi/ipl.h header.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The IPL parameter block is used as an interface between Linux and
the machine to query and change the boot device and boot options.
To be able to create an IPL parameter block in user space and pass it
as a segment to kexec, provide a uapi header with proper structure
definitions for the block.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The ipl_info union in struct ipl_parameter_block has the same name as
the struct ipl_info. This is confusing when reading the code, and the
union in struct ipl_parameter_block does not need to be named. Drop
the name from the union.
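In code the change boils down to making the union anonymous, roughly
like this sketch (member types abridged and partly assumed):
struct ipl_parameter_block {
	struct ipl_list_hdr hdr;
	union {
		struct ipl_block_fcp fcp;
		struct ipl_block_ccw ccw;
	};	/* previously: } ipl_info; */
};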
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We do track the current steal time of the host CPUs. Let us use
this value to disable halt polling if the steal time goes beyond
a configured value.
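A sketch of the idea; the parameter name and the steal-time helper are
assumptions for illustration:
unsigned long cpu_steal_time_percent(void);	/* hypothetical helper */

static unsigned int halt_poll_max_steal = 10;	/* percent */
module_param(halt_poll_max_steal, uint, 0644);
MODULE_PARM_DESC(halt_poll_max_steal,
		 "Maximum host steal time (percent) before halt polling is disabled");

bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
{
	/* skip halt polling when the host CPU itself is starved */
	return cpu_steal_time_percent() >= READ_ONCE(halt_poll_max_steal);
}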
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Instead of adding a new machine option to disable/enable the keywrapping
options of pckmo (like for AES and DEA), we can now use the CPU model to
decide. As ECC is also wrapped with the AES key, we need that to be
enabled.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
This enables stfle.151 and adds the subfunctions for DFLTCC. Bit 151 is
added to the list of facilities that will be enabled when there is no
cpu model involved as DFLTCC requires no additional handling from
userspace, e.g. for migration.
Please note that a cpu model enabled user space can and will have the
final decision on the facility bits for a guest.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
With git commit d1874a0c28 ("s390/mm: make the pxd_offset functions more
robust") and a 2-level page table, it can now happen that pgd_bad() gets
asked to verify a large segment table entry. If the entry is marked as
dirty, pgd_bad() will incorrectly return true.
Change the pgd_bad(), p4d_bad(), pud_bad() and pmd_bad() functions to
first verify the table type, return false if the table level is lower
than what the function is supposed to check, return true if the table
level is too high, and otherwise check the relevant region and segment
table bits. pmd_bad() has to check against ~SEGMENT_ENTRY_BITS for
normal page table pointers or ~SEGMENT_ENTRY_BITS_LARGE for large
segment table entries. Same for pud_bad() which has to check against
~_REGION_ENTRY_BITS or ~_REGION_ENTRY_BITS_LARGE.
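A sketch of the described logic, using pud_bad() as the example (the
actual code may differ in details):
static inline int pud_bad(pud_t pud)
{
	unsigned long type = pud_val(pud) & _REGION_ENTRY_TYPE_MASK;

	if (type > _REGION_ENTRY_TYPE_R3)
		return 1;	/* table level too high */
	if (type < _REGION_ENTRY_TYPE_R3)
		return 0;	/* lower level, verified elsewhere */
	if (pud_large(pud))
		return (pud_val(pud) & ~_REGION_ENTRY_BITS_LARGE) != 0;
	return (pud_val(pud) & ~_REGION_ENTRY_BITS) != 0;
}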
Fixes: d1874a0c28 ("s390/mm: make the pxd_offset functions more robust")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
arch/s390/lib/uaccess.c is built without kasan instrumentation. Kasan
checks are performed explicitly in copy_from_user/copy_to_user
functions. But since those functions could be inlined, calls from
files like uaccess.c with instrumentation disabled won't generate
kasan reports. This is currently the case with the strncpy_from_user
function, as revealed by a newly added kasan test. Avoid inlining of
copy_from_user/copy_to_user when the kernel is built with kasan support
to make sure kasan checks are fully functional.
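A sketch of the approach in the uaccess header; treat the exact ifdef
placement as an assumption:
/* with kasan, always use the instrumented out-of-line variants */
#ifndef CONFIG_KASAN
#define INLINE_COPY_FROM_USER
#define INLINE_COPY_TO_USER
#endif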
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
A few architectures use <asm/segment.h> internally, but nothing in
common code does. Remove all the empty or almost empty versions of it,
including the asm-generic one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Define gup_fast_permitted to check against the asce_limit of the
mm attached to the current task, then replace the s390-specific gup
code with the generic implementation in mm/gup.c.
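A sketch of what gup_fast_permitted() looks like under this scheme; the
actual code may differ slightly:
#define gup_fast_permitted gup_fast_permitted
static inline bool gup_fast_permitted(unsigned long start, int nr_pages)
{
	unsigned long len = (unsigned long) nr_pages << PAGE_SHIFT;
	unsigned long end = start + len;

	/* reject address wraparound and anything above the asce_limit */
	return end >= start && end <= current->mm->context.asce_limit;
}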
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Change the way how pgd_offset, p4d_offset, pud_offset and pmd_offset
walk the page tables. pgd_offset now always calculates the index for
the top-level page table and adds it to the pgd, this is either a
segment table offset for a 2-level setup, a region-3 offset for a 3-level
setup, a region-2 offset for a 4-level setup, or a region-1 offset for a
5-level setup.
The other three functions p4d_offset, pud_offset and pmd_offset will
only add the respective offset if they dereference the passed pointer.
With the new way of walking the page tables a sequence like this from
mm/gup.c now works:
pgdp = pgd_offset(current->mm, addr);
pgd = READ_ONCE(*pgdp);
p4dp = p4d_offset(&pgd, addr);
p4d = READ_ONCE(*p4dp);
pudp = pud_offset(&p4d, addr);
pud = READ_ONCE(*pudp);
pmdp = pmd_offset(&pud, addr);
pmd = READ_ONCE(*pmdp);
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This enables stfle.150 and adds the subfunctions for SORTL. Bit 150 is
added to the list of facilities that will be enabled when there is no
cpu model involved as sortl requires no additional handling from
userspace, e.g. for migration.
Please note that a cpu model enabled user space can and will have the
final decision on the facility bits for a guest.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
This enables stfle.155 and adds the subfunctions for KDSA. Bit 155 is
added to the list of facilities that will be enabled when there is no
cpu model involved as MSA9 requires no additional handling from
userspace, e.g. for migration.
Please note that a cpu model enabled user space can and will have the
final decision on the facility bits for a guest.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
llvm on s390 has problems with __builtin_return_address(n), with n>0,
this results in a somewhat cryptic error message:
fatal error: error in backend: Unsupported stack frame traversal count
To work around it, use the return address of the current function
directly. This is probably not ideal here, but it gets things to compile
and should only lead to inferior reporting, not to misbehavior of the
generated code.
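The workaround boils down to a pattern like this sketch (the call site
shown is illustrative):
/* llvm's s390 backend cannot walk n frames up; record the immediate
 * caller instead of __builtin_return_address(n) with n > 0 */
unsigned long caller = (unsigned long) __builtin_return_address(0);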
Link: https://bugs.llvm.org/show_bug.cgi?id=41424
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
clang fails to use the %O and %R inline assembly modifiers
the same way as gcc, leading to build failures with every use
of __load_psw_mask():
/tmp/nmi-4a9f80.s: Assembler messages:
/tmp/nmi-4a9f80.s:571: Error: junk at end of line: `+8(160(%r11))'
/tmp/nmi-4a9f80.s:626: Error: junk at end of line: `+8(160(%r11))'
Replace these with a more conventional way of passing the addresses
that should work with both clang and gcc.
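A sketch of the conventional form: hand &psw to the assembler in an
address register instead of relying on the %O/%R operand modifiers; the
exact constraints here are an assumption:
asm volatile(
	"	larl	%[addr],1f\n"
	"	stg	%[addr],8(%[psw])\n"	/* psw.addr = resume point */
	"	lpswe	0(%[psw])\n"
	"1:"
	: [addr] "=&d" (addr), "+Q" (psw)
	: [psw] "a" (&psw)
	: "memory", "cc");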
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Building system calls with clang results in a warning
about an alias from a global function to a static one:
../fs/namei.c:3847:1: warning: unused function '__se_sys_mkdirat' [-Wunused-function]
SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t, mode)
^
../include/linux/syscalls.h:219:36: note: expanded from macro 'SYSCALL_DEFINE3'
#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
^
../include/linux/syscalls.h:228:2: note: expanded from macro 'SYSCALL_DEFINEx'
__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
^
../arch/s390/include/asm/syscall_wrapper.h:126:18: note: expanded from macro '__SYSCALL_DEFINEx'
asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
^
<scratch space>:31:1: note: expanded from here
__se_sys_mkdirat
^
The only reference to the static __se_sys_mkdirat() here is the alias, but
this only gets evaluated later. Making this function global as well avoids
the warning.
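The effect can be reproduced in isolation with a sketch like this; the
names are made up for illustration:
/* clang warns: the alias target is static and only referenced later */
static long __se_sys_demo(long x);
long sys_demo(long x) __attribute__((alias("__se_sys_demo")));
static long __se_sys_demo(long x) { return x; }

/* fix: give the alias target external linkage */
long __se_sys_demo2(long x);
long sys_demo2(long x) __attribute__((alias("__se_sys_demo2")));
long __se_sys_demo2(long x) { return x; }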
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The CALL_ON_STACK helper currently does not work with clang, and for
calls without arguments it does not initialize r2 although the constraint
is "+&d". Rework the CALL_FMT_x and the CALL_ON_STACK macros to work
with clang and produce optimal code in all cases.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The Ultravisor Call Facility (stfle bit 158) defines an API to the
Ultravisor (UV calls), a mini hypervisor located at machine
level. With the help of the Ultravisor, KVM will be able to run
"protected" VMs, special VMs whose memory and management data are
unavailable to KVM.
The protected VMs can also request services from the Ultravisor.
The guest API consists of UV calls to share and unshare memory with the
KVM hypervisor.
To enable support for this feature, the PROTECTED_VIRTUALIZATION_GUEST
Kconfig option has been introduced.
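A sketch of the guest share path; the command constant, the control-block
layout and the uv_call() helper (which would issue the UV call
instruction) are assumptions following the description:
#define UVC_CMD_SET_SHARED_ACCESS 0x1000	/* assumed value */

int uv_call(unsigned long r1, unsigned long r2);	/* issues the UV call */

struct uv_cb_header {
	u16 len;
	u16 cmd;	/* command code */
	u16 rc;		/* response code */
	u16 rrc;	/* return reason code */
};

struct uv_cb_share {
	struct uv_cb_header header;
	u64 reserved[3];
	u64 paddr;	/* absolute address of the page to (un)share */
};

static int uv_set_shared(unsigned long addr)
{
	struct uv_cb_share uvcb = {
		.header.cmd = UVC_CMD_SET_SHARED_ACCESS,
		.header.len = sizeof(uvcb),
		.paddr = addr,
	};

	return uv_call(0, (u64) &uvcb) ? -EINVAL : 0;
}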
Co-developed-by: Janosch Frank <frankja@de.ibm.com>
Signed-off-by: Janosch Frank <frankja@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
.boot.preserved.data is a better fit for the ipl block than .boot.data,
which is discarded after init. Reusing .boot.preserved.data allows us to
simplify the code a little and to avoid copying data from .boot.data to
persistent variables.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce the .boot.preserved.data section, which is similar to .boot.data
and "shared" between the decompressor code and the decompressed kernel. The
decompressor will store values in it and copy them over to the decompressed
image before starting it. This method makes it possible to avoid using
pre-defined addresses and other hacks to pass values between those boot
phases.
Unlike the .boot.data section, .boot.preserved.data is NOT part of init
data, and hence will be preserved for the kernel lifetime.
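A sketch of how values end up in the section; the macro form is an
assumption based on the description:
/* place a variable into .boot.preserved.data so the decompressor can
 * fill it in and the decompressed kernel can keep using it */
#define __bootdata_preserved(var) \
	__section(".boot.preserved.data." #var) var

unsigned long __bootdata_preserved(ipl_cert_list_addr);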
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
As the generic rwsem-xadd code is using the appropriate acquire and
release versions of the atomic operations, the arch-specific rwsem.h
files will not be that much faster than the generic code as long as the
atomic functions are properly implemented. So we can remove the
arch-specific rwsem.h files and stop building asm/rwsem.h to reduce
maintenance effort.
Currently, only x86, alpha and ia64 have implemented architecture
specific fast paths. I don't have access to alpha and ia64 systems for
testing, but they are legacy systems that are not likely to be updated
to the latest kernel anyway.
By using a rwsem microbenchmark, the total locking rates (in kops/s) on a
4-socket 56-core 112-thread x86-64 system before and after the patch were
as follows (mixed means an equal number of read and write locks):
                 Before Patch            After Patch
  # of Threads  wlock   rlock   mixed   wlock   rlock   mixed
  ------------  -----   -----   -----   -----   -----   -----
        1       29,201  30,143  29,458  28,615  30,172  29,201
        2        6,807  13,299   1,171   7,725  15,025   1,804
        4        6,504  12,755   1,520   7,127  14,286   1,345
        8        6,762  13,412     764   6,826  13,652     726
       16        6,693  15,408     662   6,599  15,938     626
       32        6,145  15,286     496   5,549  15,487     511
       64        5,812  15,495      60   5,858  15,572      60
There were some run-to-run variations for the multi-thread tests. For
x86-64, using the generic C code fast path seems to be a little bit
faster than the assembly version with low lock contention. Looking at
the assembly version of the fast paths, there are assembly to/from C
code wrappers that save and restore all the callee-clobbered registers
(7 registers on x86-64). The assembly generated from the generic C
code doesn't need to do that. That may explain the slight performance
gain here.
The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h
with no code change, as no code other than that under kernel/locking
needs to access the internal rwsem macros and functions.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-um@lists.infradead.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: nios2-dev@lists.rocketboards.org
Cc: openrisc@lists.librecores.org
Cc: uclinux-h8-devel@lists.sourceforge.jp
Link: https://lkml.kernel.org/r/20190322143008.21313-2-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull s390 fixes from Martin Schwidefsky:
"Improvements and bug fixes for 5.1-rc2:
- Fix early free of the channel program in vfio
- On AP device removal make sure that all messages are flushed with
the driver that queued them still attached
- Limit brk randomization to 32MB to reduce the chance that the heap
of ld.so is placed after the main stack
- Add a rolling average for the steal time of a CPU; this will be
needed for KVM to decide when to do busy waiting
- Fix a warning in the CPU-MF code
- Add a notification handler for AP configuration change to react
faster to new AP devices"
* tag 's390-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpumf: Fix warning from check_processor_id
zcrypt: handle AP Info notification from CHSC SEI command
vfio: ccw: only free cp on final interrupt
s390/vtime: steal time exponential moving average
s390/zcrypt: revisit ap device remove procedure
s390: limit brk randomization to 32MB