Commit Graph

1496 Commits

Author SHA1 Message Date
Thomas Gleixner
55e0391572 x86/irq: Consolidate DMAR irq allocation
None of the DMAR specific fields are required.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200826112332.163462706@linutronix.de
2020-09-16 16:52:33 +02:00
Thomas Gleixner
33a65ba470 x86_ioapic_Consolidate_IOAPIC_allocation
Move the IOAPIC specific fields into their own struct and reuse the common
devid. Get rid of the #ifdeffery as it does not matter at all whether the
alloc info is a couple of bytes longer or not.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112332.054367732@linutronix.de
2020-09-16 16:52:32 +02:00
Thomas Gleixner
2bf1e7bced x86/msi: Consolidate HPET allocation
None of the magic HPET fields are required in any way.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112331.943993771@linutronix.de
2020-09-16 16:52:31 +02:00
Thomas Gleixner
6b6256e616 iommu/irq_remapping: Consolidate irq domain lookup
Now that the iommu implementations handle the X86_*_GET_PARENT_DOMAIN
types, consolidate the two getter functions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112331.741909337@linutronix.de
2020-09-16 16:52:30 +02:00
Thomas Gleixner
b4c364da32 x86/irq: Add allocation type for parent domain retrieval
irq_remapping_ir_irq_domain() is used to retrieve the remapping parent
domain for an allocation type. irq_remapping_irq_domain() is for retrieving
the actual device domain for allocating interrupts for a device.

The two functions are similar and can be unified by using explicit modes
for parent irq domain retrieval.

Add X86_IRQ_ALLOC_TYPE_IOAPIC/HPET_GET_PARENT and use it in the iommu
implementations. Drop the parent domain retrieval for PCI_MSI/X as that is
unused.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112331.436350257@linutronix.de
2020-09-16 16:52:29 +02:00
Thomas Gleixner
801b5e4c4e x86_irq_Rename_X86_IRQ_ALLOC_TYPE_MSI_to_reflect_PCI_dependency
No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112331.343103175@linutronix.de
2020-09-16 16:52:29 +02:00
Thomas Gleixner
9d55f02ad4 x86/msi: Remove pointless vcpu_affinity callback
Setting the irq_set_vcpu_affinity() callback to
irq_chip_set_vcpu_affinity_parent() is a pointless exercise because the
function which utilizes it searchs the domain hierarchy to find a parent
domain which has such a callback.

Remove the useless indirection.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200826112331.250130127@linutronix.de
2020-09-16 16:52:29 +02:00
Thomas Gleixner
b0a19555ef x86/msi: Move compose message callback where it belongs
Composing the MSI message at the MSI chip level is wrong because the
underlying parent domain is the one which knows how the message should be
composed for the direct vector delivery or the interrupt remapping table
entry.

The interrupt remapping aware PCI/MSI chip does that already. Make the
direct delivery chip do the same and move the composition of the direct
delivery MSI message to the vector domain irq chip.

This prepares for the upcoming device MSI support to avoid having
architecture specific knowledge in the device MSI domain irq chips.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200826112331.157603198@linutronix.de
2020-09-16 16:52:28 +02:00
Thomas Gleixner
13b90cadfc genirq/chip: Use the first chip in irq_chip_compose_msi_msg()
The documentation of irq_chip_compose_msi_msg() claims that with
hierarchical irq domains the first chip in the hierarchy which has an
irq_compose_msi_msg() callback is chosen. But the code just keeps
iterating after it finds a chip with a compose callback.

The x86 HPET MSI implementation relies on that behaviour, but that does not
make it more correct.

The message should always be composed at the domain which manages the
underlying resource (e.g. APIC or remap table) because that domain knows
about the required layout of the message.

On X86 the following hierarchies exist:

1)   vector -------- PCI/MSI
2)   vector -- IR -- PCI/MSI

The vector domain has a different message format than the IR (remapping)
domain. So obviously the PCI/MSI domain can't compose the message without
having knowledge about the parent domain, which is exactly the opposite of
what hierarchical domains want to achieve.

X86 actually has two different PCI/MSI chips where #1 has a compose
callback and #2 does not. #2 delegates the composition to the remap domain
where it belongs, but #1 does it at the PCI/MSI level.

For the upcoming device MSI support it's necessary to change this and just
let the first domain which can compose the message take care of it. That
way the top level chip does not have to worry about it and the device MSI
code does not need special knowledge about topologies. It just sets the
compose callback to NULL and lets the hierarchy pick the first chip which
has one.

Due to that the attempt to move the compose callback from the direct
delivery PCI/MSI domain to the vector domain made the system fail to boot
with interrupt remapping enabled because in the remapping case
irq_chip_compose_msi_msg() keeps iterating and choses the compose callback
of the vector domain which obviously creates the wrong format for the remap
table.

Break out of the loop when the first irq chip with a compose callback is
found and fixup the HPET code temporarily. That workaround will be removed
once the direct delivery compose callback is moved to the place where it
belongs in the vector domain.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>                                                                                                                                                                                                                                     Link: https://lore.kernel.org/r/20200826112331.047917603@linutronix.de
2020-09-16 16:52:28 +02:00
Linus Torvalds
dcc5c6f013 Three interrupt related fixes for X86:
- Move disabling of the local APIC after invoking fixup_irqs() to ensure
    that interrupts which are incoming are noted in the IRR and not ignored.
 
  - Unbreak affinity setting. The rework of the entry code reused the
    regular exception entry code for device interrupts. The vector number is
    pushed into the errorcode slot on the stack which is then lifted into an
    argument and set to -1 because that's regs->orig_ax which is used in
    quite some places to check whether the entry came from a syscall. But it
    was overlooked that orig_ax is used in the affinity cleanup code to
    validate whether the interrupt has arrived on the new target. It turned
    out that this vector check is pointless because interrupts are never
    moved from one vector to another on the same CPU. That check is a
    historical leftover from the time where x86 supported multi-CPU
    affinities, but not longer needed with the now strict single CPU
    affinity. Famous last words ...
 
  - Add a missing check for an empty cpumask into the matrix allocator. The
    affinity change added a warning to catch the case where an interrupt is
    moved on the same CPU to a different vector. This triggers because a
    condition with an empty cpumask returns an assignment from the allocator
    as the allocator uses for_each_cpu() without checking the cpumask for
    being empty. The historical inconsistent for_each_cpu() behaviour of
    ignoring the cpumask and unconditionally claiming that CPU0 is in the
    mask striked again. Sigh.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9L6WYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRV5D/9dRq/4pn5g1esnzm4GhIr2To3Qp6cl
 s7VswTdN8FmWBqVz79ZVYqj663UpL3pPY1np01ctrxRQLeDVfWcI2BMR5irnny8h
 otORhFysuDUl+yfuomWVbzfQQNJ+VeQVeWKD3cIhD1I3sXqDX5Wpa8n086hYKQXx
 eutVC3+JdzJZFm68xarlLW7h2f1au1eZZFgVnyY+J5KO9Dwm63a4RITdDVk7KV4t
 uKEDza5P9SY+kE9LAGNq8BAEObf9FeMXw0mRM7atRKVsJQQGVk6bgiuaRr01w1+W
 hQCPx/3g6PHFnGgx/KQgHf1jgrZFhXOyIDo6ZeFy+SJGIZRB3n8o5Kjns2l8Pa+K
 2qy1TRoZIsGkwGCi/BM6viLzBikbh/gnGYy/8KTEJdKs8P3ZKHUZVSAB1dpapOWX
 4n+rKoVPnvxgRSeZZo+tgLkvUdh+/9Huyr9vHiYjtbbB8tFvjlkOmrZ6sirHByDy
 jg6TjOJVb1CC/PoW4M7JNfmeKvHQnTACwH6djdVGDLPJspuUsYkPI0Uk0CX21SA3
 45Tuylvl9jT6+vq95Av2RbAiipmSpZ/O1NHV8Paf466SKmhUgG3lv5PHh3xTm1U2
 Be/RbJ75x4Muuw42ttU1LcpcLPcOZRQNEREoNd5UysgYYgWRekBvU+ZRQNW4g2nw
 3JDgJgm0iBUN9w==
 =zIi4
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Three interrupt related fixes for X86:

   - Move disabling of the local APIC after invoking fixup_irqs() to
     ensure that interrupts which are incoming are noted in the IRR and
     not ignored.

   - Unbreak affinity setting.

     The rework of the entry code reused the regular exception entry
     code for device interrupts. The vector number is pushed into the
     errorcode slot on the stack which is then lifted into an argument
     and set to -1 because that's regs->orig_ax which is used in quite
     some places to check whether the entry came from a syscall.

     But it was overlooked that orig_ax is used in the affinity cleanup
     code to validate whether the interrupt has arrived on the new
     target. It turned out that this vector check is pointless because
     interrupts are never moved from one vector to another on the same
     CPU. That check is a historical leftover from the time where x86
     supported multi-CPU affinities, but not longer needed with the now
     strict single CPU affinity. Famous last words ...

   - Add a missing check for an empty cpumask into the matrix allocator.

     The affinity change added a warning to catch the case where an
     interrupt is moved on the same CPU to a different vector. This
     triggers because a condition with an empty cpumask returns an
     assignment from the allocator as the allocator uses for_each_cpu()
     without checking the cpumask for being empty. The historical
     inconsistent for_each_cpu() behaviour of ignoring the cpumask and
     unconditionally claiming that CPU0 is in the mask struck again.
     Sigh.

  plus a new entry into the MAINTAINER file for the HPE/UV platform"

* tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/matrix: Deal with the sillyness of for_each_cpu() on UP
  x86/irq: Unbreak interrupt affinity setting
  x86/hotplug: Silence APIC only after all interrupts are migrated
  MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers
2020-08-30 12:01:23 -07:00
Thomas Gleixner
e027fffff7 x86/irq: Unbreak interrupt affinity setting
Several people reported that 5.8 broke the interrupt affinity setting
mechanism.

The consolidation of the entry code reused the regular exception entry code
for device interrupts and changed the way how the vector number is conveyed
from ptregs->orig_ax to a function argument.

The low level entry uses the hardware error code slot to push the vector
number onto the stack which is retrieved from there into a function
argument and the slot on stack is set to -1.

The reason for setting it to -1 is that the error code slot is at the
position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax
indicates that the entry came via a syscall. If it's not set to a negative
value then a signal delivery on return to userspace would try to restart a
syscall. But there are other places which rely on pt_regs::orig_ax being a
valid indicator for syscall entry.

But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the
interrupt affinity setting mechanism, which was overlooked when this change
was made.

Moving interrupts on x86 happens in several steps. A new vector on a
different CPU is allocated and the relevant interrupt source is
reprogrammed to that. But that's racy and there might be an interrupt
already in flight to the old vector. So the old vector is preserved until
the first interrupt arrives on the new vector and the new target CPU. Once
that happens the old vector is cleaned up, but this cleanup still depends
on the vector number being stored in pt_regs::orig_ax, which is now -1.

That -1 makes the check for cleanup: pt_regs::orig_ax == new_vector
always false. As a consequence the interrupt is moved once, but then it
cannot be moved anymore because the cleanup of the old vector never
happens.

There would be several ways to convey the vector information to that place
in the guts of the interrupt handling, but on deeper inspection it turned
out that this check is pointless and a leftover from the old affinity model
of X86 which supported multi-CPU affinities. Under this model it was
possible that an interrupt had an old and a new vector on the same CPU, so
the vector match was required.

Under the new model the effective affinity of an interrupt is always a
single CPU from the requested affinity mask. If the affinity mask changes
then either the interrupt stays on the CPU and on the same vector when that
CPU is still in the new affinity mask or it is moved to a different CPU, but
it is never moved to a different vector on the same CPU.

Ergo the cleanup check for the matching vector number is not required and
can be removed which makes the dependency on pt_regs:orig_ax go away.

The remaining check for new_cpu == smp_processsor_id() is completely
sufficient. If it matches then the interrupt was successfully migrated and
the cleanup can proceed.

For paranoia sake add a warning into the vector assignment code to
validate that the assumption of never moving to a different vector on
the same CPU holds.

Fixes: 633260fa14 ("x86/irq: Convey vector as argument and not in ptregs")
Reported-by: Alex bykov <alex.bykov@scylladb.com>
Reported-by: Avi Kivity <avi@scylladb.com>
Reported-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Graf <graf@amazon.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87wo1ltaxz.fsf@nanos.tec.linutronix.de
2020-08-27 09:29:23 +02:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Linus Torvalds
97d052ea3f A set of locking fixes and updates:
- Untangle the header spaghetti which causes build failures in various
     situations caused by the lockdep additions to seqcount to validate that
     the write side critical sections are non-preemptible.
 
   - The seqcount associated lock debug addons which were blocked by the
     above fallout.
 
     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict per
     CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot
     validate that the lock is held.
 
     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored and
     write_seqcount_begin() has a lockdep assertion to validate that the
     lock is held.
 
     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API is
     unchanged and determines the type at compile time with the help of
     _Generic which is possible now that the minimal GCC version has been
     moved up.
 
     Adding this lockdep coverage unearthed a handful of seqcount bugs which
     have been addressed already independent of this.
 
     While generaly useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if the
     writers are serialized by an associated lock, which leads to the well
     known reader preempts writer livelock. RT prevents this by storing the
     associated lock pointer independent of lockdep in the seqcount and
     changing the reader side to block on the lock when a reader detects
     that a writer is in the write side critical section.
 
  - Conversion of seqcount usage sites to associated types and initializers.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8xmPYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTuQEACyzQCjU8PgehPp9oMqWzaX2fcVyuZO
 QU2yw6gmz2oTz3ZHUNwdW8UnzGh2OWosK3kDruoD9FtSS51lER1/ISfSPCGfyqxC
 KTjOcB1Kvxwq/3LcCx7Zi3ZxWApat74qs3EhYhKtEiQ2Y9xv9rLq8VV1UWAwyxq0
 eHpjlIJ6b6rbt+ARslaB7drnccOsdK+W/roNj4kfyt+gezjBfojGRdMGQNMFcpnv
 shuTC+vYurAVIiVA/0IuizgHfwZiXOtVpjVoEWaxg6bBH6HNuYMYzdSa/YrlDkZs
 n/aBI/Xkvx+Eacu8b1Zwmbzs5EnikUK/2dMqbzXKUZK61eV4hX5c2xrnr1yGWKTs
 F/juh69Squ7X6VZyKVgJ9RIccVueqwR2EprXWgH3+RMice5kjnXH4zURp0GHALxa
 DFPfB6fawcH3Ps87kcRFvjgm6FBo0hJ1AxmsW1dY4ACFB9azFa2euW+AARDzHOy2
 VRsUdhL9CGwtPjXcZ/9Rhej6fZLGBXKr8uq5QiMuvttp4b6+j9FEfBgD4S6h8csl
 AT2c2I9LcbWqyUM9P4S7zY/YgOZw88vHRuDH7tEBdIeoiHfrbSBU7EQ9jlAKq/59
 f+Htu2Io281c005g7DEeuCYvpzSYnJnAitj5Lmp/kzk2Wn3utY1uIAVszqwf95Ul
 81ppn2KlvzUK8g==
 =7Gj+
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Thomas Gleixner:
 "A set of locking fixes and updates:

   - Untangle the header spaghetti which causes build failures in
     various situations caused by the lockdep additions to seqcount to
     validate that the write side critical sections are non-preemptible.

   - The seqcount associated lock debug addons which were blocked by the
     above fallout.

     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict
     per CPU seqcounts. As the lock is not part of the seqcount, lockdep
     cannot validate that the lock is held.

     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored
     and write_seqcount_begin() has a lockdep assertion to validate that
     the lock is held.

     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API
     is unchanged and determines the type at compile time with the help
     of _Generic which is possible now that the minimal GCC version has
     been moved up.

     Adding this lockdep coverage unearthed a handful of seqcount bugs
     which have been addressed already independent of this.

     While generally useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if
     the writers are serialized by an associated lock, which leads to
     the well known reader preempts writer livelock. RT prevents this by
     storing the associated lock pointer independent of lockdep in the
     seqcount and changing the reader side to block on the lock when a
     reader detects that a writer is in the write side critical section.

   - Conversion of seqcount usage sites to associated types and
     initializers"

* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/seqlock, headers: Untangle the spaghetti monster
  locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
  x86/headers: Remove APIC headers from <asm/smp.h>
  seqcount: More consistent seqprop names
  seqcount: Compress SEQCNT_LOCKNAME_ZERO()
  seqlock: Fold seqcount_LOCKNAME_init() definition
  seqlock: Fold seqcount_LOCKNAME_t definition
  seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
  hrtimer: Use sequence counter with associated raw spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  vfs: Use sequence counter with associated spinlock
  timekeeping: Use sequence counter with associated raw spinlock
  xfrm: policy: Use sequence counters with associated lock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  sched: tasks: Use sequence counter with associated spinlock
  ...
2020-08-10 19:07:44 -07:00
Mike Rapoport
ca15ca406f mm: remove unneeded includes of <asm/pgalloc.h>
Patch series "mm: cleanup usage of <asm/pgalloc.h>"

Most architectures have very similar versions of pXd_alloc_one() and
pXd_free_one() for intermediate levels of page table.  These patches add
generic versions of these functions in <asm-generic/pgalloc.h> and enable
use of the generic functions where appropriate.

In addition, functions declared and defined in <asm/pgalloc.h> headers are
used mostly by core mm and early mm initialization in arch and there is no
actual reason to have the <asm/pgalloc.h> included all over the place.
The first patch in this series removes unneeded includes of
<asm/pgalloc.h>

In the end it didn't work out as neatly as I hoped and moving
pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
unnecessary changes to arches that have custom page table allocations, so
I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
to mm/.

This patch (of 8):

In most cases <asm/pgalloc.h> header is required only for allocations of
page table memory.  Most of the .c files that include that header do not
use symbols declared in <asm/pgalloc.h> and do not require that header.

As for the other header files that used to include <asm/pgalloc.h>, it is
possible to move that include into the .c file that actually uses symbols
from <asm/pgalloc.h> and drop the include from the header file.

The process was somewhat automated using

	sed -i -E '/[<"]asm\/pgalloc\.h/d' \
                $(grep -L -w -f /tmp/xx \
                        $(git grep -E -l '[<"]asm/pgalloc\.h'))

where /tmp/xx contains all the symbols defined in
arch/*/include/asm/pgalloc.h.

[rppt@linux.ibm.com: fix powerpc warning]

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
Peter Zijlstra
0cd39f4600 locking/seqlock, headers: Untangle the spaghetti monster
By using lockdep_assert_*() from seqlock.h, the spaghetti monster
attacked.

Attack back by reducing seqlock.h dependencies from two key high level headers:

 - <linux/seqlock.h>:               -Remove <linux/ww_mutex.h>
 - <linux/time.h>:                  -Remove <linux/seqlock.h>
 - <linux/sched.h>:                 +Add    <linux/seqlock.h>

The price was to add it to sched.h ...

Core header fallout, we add direct header dependencies instead of gaining them
parasitically from higher level headers:

 - <linux/dynamic_queue_limits.h>:  +Add <asm/bug.h>
 - <linux/hrtimer.h>:               +Add <linux/seqlock.h>
 - <linux/ktime.h>:                 +Add <asm/bug.h>
 - <linux/lockdep.h>:               +Add <linux/smp.h>
 - <linux/sched.h>:                 +Add <linux/seqlock.h>
 - <linux/videodev2.h>:             +Add <linux/kernel.h>

Arch headers fallout:

 - PARISC: <asm/timex.h>:           +Add <asm/special_insns.h>
 - SH:     <asm/io.h>:              +Add <asm/page.h>
 - SPARC:  <asm/timer_64.h>:        +Add <uapi/asm/asi.h>
 - SPARC:  <asm/vvar.h>:            +Add <asm/processor.h>, <asm/barrier.h>
                                    -Remove <linux/seqlock.h>
 - X86:    <asm/fixmap.h>:          +Add <asm/pgtable_types.h>
                                    -Remove <asm/acpi.h>

There's also a bunch of parasitic header dependency fallout in .c files, not listed
separately.

[ mingo: Extended the changelog, split up & fixed the original patch. ]

Co-developed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200804133438.GK2674@hirez.programming.kicks-ass.net
2020-08-06 16:13:13 +02:00
Ingo Molnar
13c01139b1 x86/headers: Remove APIC headers from <asm/smp.h>
The APIC headers are relatively complex and bring in additional
header dependencies - while smp.h is a relatively simple header
included from high level headers.

Remove the dependency and add in the missing #include's in .c
files where they gained it indirectly before.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-08-06 16:13:09 +02:00
Linus Torvalds
5183a617ec The biggest change is the removal of SGI UV1 support, which allowed the
removal of the legacy EFI old_mmap code as well.
 
 This removes quite a bunch of old code & quirks.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8oYBoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1izAg//adabmMctmGxpWETEj1UmZdgJcKu5l+H3
 SiC6hoB46NB5J6E2whhOXuJGqgBGY/qx9Qy3LMFwgXNki85TQEEFTcGfd+RcXTXz
 B+oASL5Z+KOJ+QQqFf8W/dYye4ZXXxQN8gFSl9zaY3iXSY28JLMFrwkckkIrBwLZ
 9DmB7NaxeMQ22IEan641sK/YafGIriHvMpe34s2/p2WpemyncG1y1tZcPxZ/9K9U
 7OZ+4O6gr9n3FloD9yp3z1jFXiU4gSWEgbZvy3bD/AjbXs+oaaaIFDZ6yM/C3JYo
 Qw0PPTBSXkKTmG3UaK4vml4NPROLqyAzv8k5UTJmYQiVlSkyAVW80mzMQ9qDAhfK
 ny2+XE7ahjsFiNLFbboSOqLuTyHogkzcFCBrwW1wCTnPDEX5QCQF9t0agrY4pqqZ
 8lSMLj0BmkAPoi5XHq4WVEJL+0GyYFWmJwdtUAzd2Hj5w6mse0TiL7PtqAUfdFH0
 E5nBnY5RvqtWwfEungGl+zFmSRvH8UzOUydMMuG8koNlGaNZF4TwNVrMukbVqArX
 34Zt1bQrLUY/T4xNM+Zx1BWOFd5D3BUuI7xVhE5zMg20A8PHtPzMByoomoQrgGX2
 MDVUoqnPxeXWUil06+ND6koOtaL+xJy0JsUIFs7h9SGsF+JxsrYKbbLfjjDvtpuK
 sdb+fYzmgBU=
 =NChe
 -----END PGP SIGNATURE-----

Merge tag 'x86-platform-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 platform updates from Ingo Molnar:
 "The biggest change is the removal of SGI UV1 support, which allowed
  the removal of the legacy EFI old_mmap code as well.

  This removes quite a bunch of old code & quirks"

* tag 'x86-platform-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Remove unused EFI_UV1_MEMMAP code
  x86/platform/uv: Remove uv bios and efi code related to EFI_UV1_MEMMAP
  x86/efi: Remove references to no-longer-used efi_have_uv1_memmap()
  x86/efi: Delete SGI UV1 detection.
  x86/platform/uv: Remove efi=old_map command line option
  x86/platform/uv: Remove vestigial mention of UV1 platform from bios header
  x86/platform/uv: Remove support for UV1 platform from uv
  x86/platform/uv: Remove support for uv1 platform from uv_hub
  x86/platform/uv: Remove support for UV1 platform from uv_bau
  x86/platform/uv: Remove support for UV1 platform from uv_mmrs
  x86/platform/uv: Remove support for UV1 platform from x2apic_uv_x
  x86/platform/uv: Remove support for UV1 platform from uv_tlb
  x86/platform/uv: Remove support for UV1 platform from uv_time
2020-08-03 17:38:43 -07:00
Thomas Gleixner
f0c7baca18 genirq/affinity: Make affinity setting if activated opt-in
John reported that on a RK3288 system the perf per CPU interrupts are all
affine to CPU0 and provided the analysis:

 "It looks like what happens is that because the interrupts are not per-CPU
  in the hardware, armpmu_request_irq() calls irq_force_affinity() while
  the interrupt is deactivated and then request_irq() with IRQF_PERCPU |
  IRQF_NOBALANCING.  

  Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls
  irq_setup_affinity() which returns early because IRQF_PERCPU and
  IRQF_NOBALANCING are set, leaving the interrupt on its original CPU."

This was broken by the recent commit which blocked interrupt affinity
setting in hardware before activation of the interrupt. While this works in
general, it does not work for this particular case. As contrary to the
initial analysis not all interrupt chip drivers implement an activate
callback, the safe cure is to make the deferred interrupt affinity setting
at activation time opt-in.

Implement the necessary core logic and make the two irqchip implementations
for which this is required opt-in. In hindsight this would have been the
right thing to do, but ...

Fixes: baedb87d1b ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")
Reported-by: John Keeping <john@metanate.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
2020-07-27 16:20:40 +02:00
Jon Derrick
ec0160891e irqdomain/treewide: Free firmware node after domain removal
Commit 711419e504 ("irqdomain: Add the missing assignment of
domain->fwnode for named fwnode") unintentionally caused a dangling pointer
page fault issue on firmware nodes that were freed after IRQ domain
allocation. Commit e3beca48a4 fixed that dangling pointer issue by only
freeing the firmware node after an IRQ domain allocation failure. That fix
no longer frees the firmware node immediately, but leaves the firmware node
allocated after the domain is removed.

The firmware node must be kept around through irq_domain_remove, but should be
freed it afterwards.

Add the missing free operations after domain removal where where appropriate.

Fixes: e3beca48a4 ("irqdomain/treewide: Keep firmware node unconditionally allocated")
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>	# drivers/pci
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1595363169-7157-1-git-send-email-jonathan.derrick@intel.com
2020-07-23 00:08:52 +02:00
Thomas Gleixner
baedb87d1b genirq/affinity: Handle affinity setting on inactive interrupts correctly
Setting interrupt affinity on inactive interrupts is inconsistent when
hierarchical irq domains are enabled. The core code should just store the
affinity and not call into the irq chip driver for inactive interrupts
because the chip drivers may not be in a state to handle such requests.

X86 has a hacky workaround for that but all other irq chips have not which
causes problems e.g. on GIC V3 ITS.

Instead of adding more ugly hacks all over the place, solve the problem in
the core code. If the affinity is set on an inactive interrupt then:

    - Store it in the irq descriptors affinity mask
    - Update the effective affinity to reflect that so user space has
      a consistent view
    - Don't call into the irq chip driver

This is the core equivalent of the X86 workaround and works correctly
because the affinity setting is established in the irq chip when the
interrupt is activated later on.

Note, that this is only effective when hierarchical irq domains are enabled
by the architecture. Doing it unconditionally would break legacy irq chip
implementations.

For hierarchial irq domains this works correctly as none of the drivers can
have a dependency on affinity setting in inactive state by design.

Remove the X86 workaround as it is not longer required.

Fixes: 02edee152d ("x86/apic/vector: Ignore set_affinity call for inactive interrupts")
Reported-by: Ali Saidi <alisaidi@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ali Saidi <alisaidi@amazon.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com
Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de
2020-07-17 23:30:43 +02:00
steve.wahl@hpe.com
a6b740f173 x86/platform/uv: Remove support for UV1 platform from x2apic_uv_x
UV1 is not longer supported.

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200713212954.846026992@hpe.com
2020-07-17 16:47:44 +02:00
Thomas Gleixner
e3beca48a4 irqdomain/treewide: Keep firmware node unconditionally allocated
Quite some non OF/ACPI users of irqdomains allocate firmware nodes of type
IRQCHIP_FWNODE_NAMED or IRQCHIP_FWNODE_NAMED_ID and free them right after
creating the irqdomain. The only purpose of these FW nodes is to convey
name information. When this was introduced the core code did not store the
pointer to the node in the irqdomain. A recent change stored the firmware
node pointer in irqdomain for other reasons and missed to notice that the
usage sites which do the alloc_fwnode/create_domain/free_fwnode sequence
are broken by this. Storing a dangling pointer is dangerous itself, but in
case that the domain is destroyed later on this leads to a double free.

Remove the freeing of the firmware node after creating the irqdomain from
all affected call sites to cure this.

Fixes: 711419e504 ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode")
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/873661qakd.fsf@nanos.tec.linutronix.de
2020-07-14 17:44:42 +02:00
Linus Torvalds
076f14be7f The X86 entry, exception and interrupt code rework
This all started about 6 month ago with the attempt to move the Posix CPU
 timer heavy lifting out of the timer interrupt code and just have lockless
 quick checks in that code path. Trivial 5 patches.
 
 This unearthed an inconsistency in the KVM handling of task work and the
 review requested to move all of this into generic code so other
 architectures can share.
 
 Valid request and solved with another 25 patches but those unearthed
 inconsistencies vs. RCU and instrumentation.
 
 Digging into this made it obvious that there are quite some inconsistencies
 vs. instrumentation in general. The int3 text poke handling in particular
 was completely unprotected and with the batched update of trace events even
 more likely to expose to endless int3 recursion.
 
 In parallel the RCU implications of instrumenting fragile entry code came
 up in several discussions.
 
 The conclusion of the X86 maintainer team was to go all the way and make
 the protection against any form of instrumentation of fragile and dangerous
 code pathes enforcable and verifiable by tooling.
 
 A first batch of preparatory work hit mainline with commit d5f744f9a2.
 
 The (almost) full solution introduced a new code section '.noinstr.text'
 into which all code which needs to be protected from instrumentation of all
 sorts goes into. Any call into instrumentable code out of this section has
 to be annotated. objtool has support to validate this. Kprobes now excludes
 this section fully which also prevents BPF from fiddling with it and all
 'noinstr' annotated functions also keep ftrace off. The section, kprobes
 and objtool changes are already merged.
 
 The major changes coming with this are:
 
     - Preparatory cleanups
 
     - Annotating of relevant functions to move them into the noinstr.text
       section or enforcing inlining by marking them __always_inline so the
       compiler cannot misplace or instrument them.
 
     - Splitting and simplifying the idtentry macro maze so that it is now
       clearly separated into simple exception entries and the more
       interesting ones which use interrupt stacks and have the paranoid
       handling vs. CR3 and GS.
 
     - Move quite some of the low level ASM functionality into C code:
 
        - enter_from and exit to user space handling. The ASM code now calls
          into C after doing the really necessary ASM handling and the return
 	 path goes back out without bells and whistels in ASM.
 
        - exception entry/exit got the equivivalent treatment
 
        - move all IRQ tracepoints from ASM to C so they can be placed as
          appropriate which is especially important for the int3 recursion
          issue.
 
     - Consolidate the declaration and definition of entry points between 32
       and 64 bit. They share a common header and macros now.
 
     - Remove the extra device interrupt entry maze and just use the regular
       exception entry code.
 
     - All ASM entry points except NMI are now generated from the shared header
       file and the corresponding macros in the 32 and 64 bit entry ASM.
 
     - The C code entry points are consolidated as well with the help of
       DEFINE_IDTENTRY*() macros. This allows to ensure at one central point
       that all corresponding entry points share the same semantics. The
       actual function body for most entry points is in an instrumentable
       and sane state.
 
       There are special macros for the more sensitive entry points,
       e.g. INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF.
       They allow to put the whole entry instrumentation and RCU handling
       into safe places instead of the previous pray that it is correct
       approach.
 
     - The INT3 text poke handling is now completely isolated and the
       recursion issue banned. Aside of the entry rework this required other
       isolation work, e.g. the ability to force inline bsearch.
 
     - Prevent #DB on fragile entry code, entry relevant memory and disable
       it on NMI, #MC entry, which allowed to get rid of the nested #DB IST
       stack shifting hackery.
 
     - A few other cleanups and enhancements which have been made possible
       through this and already merged changes, e.g. consolidating and
       further restricting the IDT code so the IDT table becomes RO after
       init which removes yet another popular attack vector
 
     - About 680 lines of ASM maze are gone.
 
 There are a few open issues:
 
    - An escape out of the noinstr section in the MCE handler which needs
      some more thought but under the aspect that MCE is a complete
      trainwreck by design and the propability to survive it is low, this was
      not high on the priority list.
 
    - Paravirtualization
 
      When PV is enabled then objtool complains about a bunch of indirect
      calls out of the noinstr section. There are a few straight forward
      ways to fix this, but the other issues vs. general correctness were
      more pressing than parawitz.
 
    - KVM
 
      KVM is inconsistent as well. Patches have been posted, but they have
      not yet been commented on or picked up by the KVM folks.
 
    - IDLE
 
      Pretty much the same problems can be found in the low level idle code
      especially the parts where RCU stopped watching. This was beyond the
      scope of the more obvious and exposable problems and is on the todo
      list.
 
 The lesson learned from this brain melting exercise to morph the evolved
 code base into something which can be validated and understood is that once
 again the violation of the most important engineering principle
 "correctness first" has caused quite a few people to spend valuable time on
 problems which could have been avoided in the first place. The "features
 first" tinkering mindset really has to stop.
 
 With that I want to say thanks to everyone involved in contributing to this
 effort. Special thanks go to the following people (alphabetical order):
 
    Alexandre Chartre
    Andy Lutomirski
    Borislav Petkov
    Brian Gerst
    Frederic Weisbecker
    Josh Poimboeuf
    Juergen Gross
    Lai Jiangshan
    Macro Elver
    Paolo Bonzini
    Paul McKenney
    Peter Zijlstra
    Vitaly Kuznetsov
    Will Deacon
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7j510THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoU2WD/4refvaNm08fG7aiVYem3JJzr0+Pq5O
 /opwnI/1D973ApApj5W/Nd53sN5tVqOiXncSKgywRBWZxRCAGjVYypl9rjpvXu4l
 HlMjhEKBmWkDryxxrM98Vr7hl3hnId5laR56oFfH+G4LUsItaV6Uak/HfXZ4Mq1k
 iYVbEtl2CN+KJjvSgZ6Y1l853Ab5mmGvmeGNHHWCj8ZyjF3cOLoelDTQNnsb0wXM
 crKXBcXJSsCWKYyJ5PTvB82crQCET7Su+LgwK06w/ZbW1//2hVIjSCiN5o/V+aRJ
 06BZNMj8v9tfglkN8LEQvRIjTlnEQ2sq3GxbrVtA53zxkzbBCBJQ96w8yYzQX0ux
 yhqQ/aIZJ1wTYEjJzSkftwLNMRHpaOUnKvJndXRKAYi+eGI7syF61qcZSYGKuAQ/
 bK3b/CzU6QWr1235oTADxh4isEwxA0Pg5wtJCfDDOG0MJ9ALMSOGUkhoiz5EqpkU
 mzFAwfG/Uj7hRjlkms7Yj2OjZfnU7iypj63GgpXghLjr5ksRFKEOMw8e1GXltVHs
 zzwghUjqp2EPq0VOOQn3lp9lol5Prc3xfFHczKpO+CJW6Rpa4YVdqJmejBqJy/on
 Hh/T/ST3wa2qBeAw89vZIeWiUJZZCsQ0f//+2hAbzJY45Y6DuR9vbTAPb9agRgOM
 xg+YaCfpQqFc1A==
 =llba
 -----END PGP SIGNATURE-----

Merge tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 entry updates from Thomas Gleixner:
 "The x86 entry, exception and interrupt code rework

  This all started about 6 month ago with the attempt to move the Posix
  CPU timer heavy lifting out of the timer interrupt code and just have
  lockless quick checks in that code path. Trivial 5 patches.

  This unearthed an inconsistency in the KVM handling of task work and
  the review requested to move all of this into generic code so other
  architectures can share.

  Valid request and solved with another 25 patches but those unearthed
  inconsistencies vs. RCU and instrumentation.

  Digging into this made it obvious that there are quite some
  inconsistencies vs. instrumentation in general. The int3 text poke
  handling in particular was completely unprotected and with the batched
  update of trace events even more likely to expose to endless int3
  recursion.

  In parallel the RCU implications of instrumenting fragile entry code
  came up in several discussions.

  The conclusion of the x86 maintainer team was to go all the way and
  make the protection against any form of instrumentation of fragile and
  dangerous code pathes enforcable and verifiable by tooling.

  A first batch of preparatory work hit mainline with commit
  d5f744f9a2 ("Pull x86 entry code updates from Thomas Gleixner")

  That (almost) full solution introduced a new code section
  '.noinstr.text' into which all code which needs to be protected from
  instrumentation of all sorts goes into. Any call into instrumentable
  code out of this section has to be annotated. objtool has support to
  validate this.

  Kprobes now excludes this section fully which also prevents BPF from
  fiddling with it and all 'noinstr' annotated functions also keep
  ftrace off. The section, kprobes and objtool changes are already
  merged.

  The major changes coming with this are:

    - Preparatory cleanups

    - Annotating of relevant functions to move them into the
      noinstr.text section or enforcing inlining by marking them
      __always_inline so the compiler cannot misplace or instrument
      them.

    - Splitting and simplifying the idtentry macro maze so that it is
      now clearly separated into simple exception entries and the more
      interesting ones which use interrupt stacks and have the paranoid
      handling vs. CR3 and GS.

    - Move quite some of the low level ASM functionality into C code:

       - enter_from and exit to user space handling. The ASM code now
         calls into C after doing the really necessary ASM handling and
         the return path goes back out without bells and whistels in
         ASM.

       - exception entry/exit got the equivivalent treatment

       - move all IRQ tracepoints from ASM to C so they can be placed as
         appropriate which is especially important for the int3
         recursion issue.

    - Consolidate the declaration and definition of entry points between
      32 and 64 bit. They share a common header and macros now.

    - Remove the extra device interrupt entry maze and just use the
      regular exception entry code.

    - All ASM entry points except NMI are now generated from the shared
      header file and the corresponding macros in the 32 and 64 bit
      entry ASM.

    - The C code entry points are consolidated as well with the help of
      DEFINE_IDTENTRY*() macros. This allows to ensure at one central
      point that all corresponding entry points share the same
      semantics. The actual function body for most entry points is in an
      instrumentable and sane state.

      There are special macros for the more sensitive entry points, e.g.
      INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF.
      They allow to put the whole entry instrumentation and RCU handling
      into safe places instead of the previous pray that it is correct
      approach.

    - The INT3 text poke handling is now completely isolated and the
      recursion issue banned. Aside of the entry rework this required
      other isolation work, e.g. the ability to force inline bsearch.

    - Prevent #DB on fragile entry code, entry relevant memory and
      disable it on NMI, #MC entry, which allowed to get rid of the
      nested #DB IST stack shifting hackery.

    - A few other cleanups and enhancements which have been made
      possible through this and already merged changes, e.g.
      consolidating and further restricting the IDT code so the IDT
      table becomes RO after init which removes yet another popular
      attack vector

    - About 680 lines of ASM maze are gone.

  There are a few open issues:

   - An escape out of the noinstr section in the MCE handler which needs
     some more thought but under the aspect that MCE is a complete
     trainwreck by design and the propability to survive it is low, this
     was not high on the priority list.

   - Paravirtualization

     When PV is enabled then objtool complains about a bunch of indirect
     calls out of the noinstr section. There are a few straight forward
     ways to fix this, but the other issues vs. general correctness were
     more pressing than parawitz.

   - KVM

     KVM is inconsistent as well. Patches have been posted, but they
     have not yet been commented on or picked up by the KVM folks.

   - IDLE

     Pretty much the same problems can be found in the low level idle
     code especially the parts where RCU stopped watching. This was
     beyond the scope of the more obvious and exposable problems and is
     on the todo list.

  The lesson learned from this brain melting exercise to morph the
  evolved code base into something which can be validated and understood
  is that once again the violation of the most important engineering
  principle "correctness first" has caused quite a few people to spend
  valuable time on problems which could have been avoided in the first
  place. The "features first" tinkering mindset really has to stop.

  With that I want to say thanks to everyone involved in contributing to
  this effort. Special thanks go to the following people (alphabetical
  order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian
  Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai
  Jiangshan, Macro Elver, Paolo Bonzin,i Paul McKenney, Peter Zijlstra,
  Vitaly Kuznetsov, and Will Deacon"

* tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits)
  x86/entry: Force rcu_irq_enter() when in idle task
  x86/entry: Make NMI use IDTENTRY_RAW
  x86/entry: Treat BUG/WARN as NMI-like entries
  x86/entry: Unbreak __irqentry_text_start/end magic
  x86/entry: __always_inline CR2 for noinstr
  lockdep: __always_inline more for noinstr
  x86/entry: Re-order #DB handler to avoid *SAN instrumentation
  x86/entry: __always_inline arch_atomic_* for noinstr
  x86/entry: __always_inline irqflags for noinstr
  x86/entry: __always_inline debugreg for noinstr
  x86/idt: Consolidate idt functionality
  x86/idt: Cleanup trap_init()
  x86/idt: Use proper constants for table size
  x86/idt: Add comments about early #PF handling
  x86/idt: Mark init only functions __init
  x86/entry: Rename trace_hardirqs_off_prepare()
  x86/entry: Clarify irq_{enter,exit}_rcu()
  x86/entry: Remove DBn stacks
  x86/entry: Remove debug IDT frobbing
  x86/entry: Optimize local_db_save() for virt
  ...
2020-06-13 10:05:47 -07:00
Linus Torvalds
6a45a65888 A set of fixes and updates for x86:
- Unbreak paravirt VDSO clocks. While the VDSO code was moved into lib
     for sharing a subtle check for the validity of paravirt clocks got
     replaced. While the replacement works perfectly fine for bare metal as
     the update of the VDSO clock mode is synchronous, it fails for paravirt
     clocks because the hypervisor can invalidate them asynchronous. Bring
     it back as an optional function so it does not inflict this on
     architectures which are free of PV damage.
 
   - Fix the jiffies to jiffies64 mapping on 64bit so it does not trigger
     an ODR violation on newer compilers
 
   - Three fixes for the SSBD and *IB* speculation mitigation maze to ensure
     consistency, not disabling of some *IB* variants wrongly and to prevent
     a rogue cross process shutdown of SSBD. All marked for stable.
 
   - Add yet more CPU models to the splitlock detection capable list !@#%$!
 
   - Bring the pr_info() back which tells that TSC deadline timer is enabled.
 
   - Reboot quirk for MacBook6,1
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7ie1oTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofXrEACDD0mNBU2c4vQiR+n4d41PqW1p15DM
 /wG7dYqYt2RdR6qOAspmNL5ilUP+L+eoT/86U9y0g4j3FtTREqyy6mpWE4MQzqaQ
 eKWVoeYt7l9QbR1kP4eks1CN94OyVBUPo3P78UPruWMB11iyKjyrkEdsDmRSLOdr
 6doqMFGHgowrQRwsLPFUt7b2lls6ssOSYgM/ChHi2Iga431ZuYYcRe2mNVsvqx3n
 0N7QZlJ/LivXdCmdpe3viMBsDaomiXAloKUo+HqgrCLYFXefLtfOq09U7FpddYqH
 ztxbGW/7gFn2HEbmdeaiufux263MdHtnjvdPhQZKHuyQmZzzxDNBFgOILSrBJb5y
 qLYJGhMa0sEwMBM9MMItomNgZnOITQ3WGYAdSCg3mG3jK4EXzr6aQm/Qz5SI+Cte
 bQKB2dgR53Gw/1uc7F5qMGQ2NzeUbKycT0ZbF3vkUPVh1kdU3juIntsovv2lFeBe
 Rog/rZliT1xdHrGAHRbubb2/3v66CSodMoYz0eQtr241Oz0LGwnyFqLN3qcZVLDt
 OtxHQ3bbaxevDEetJXfSh3CfHKNYMToAcszmGDse3MJxC7DL5AA51OegMa/GYOX6
 r5J99MUsEzZQoQYyXFf1MjwgxH4CQK1xBBUXYaVG65AcmhT21YbNWnCbxgf7hW+V
 hqaaUSig4V3NLw==
 =VlBk
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more x86 updates from Thomas Gleixner:
 "A set of fixes and updates for x86:

   - Unbreak paravirt VDSO clocks.

     While the VDSO code was moved into lib for sharing a subtle check
     for the validity of paravirt clocks got replaced. While the
     replacement works perfectly fine for bare metal as the update of
     the VDSO clock mode is synchronous, it fails for paravirt clocks
     because the hypervisor can invalidate them asynchronously.

     Bring it back as an optional function so it does not inflict this
     on architectures which are free of PV damage.

   - Fix the jiffies to jiffies64 mapping on 64bit so it does not
     trigger an ODR violation on newer compilers

   - Three fixes for the SSBD and *IB* speculation mitigation maze to
     ensure consistency, not disabling of some *IB* variants wrongly and
     to prevent a rogue cross process shutdown of SSBD. All marked for
     stable.

   - Add yet more CPU models to the splitlock detection capable list
     !@#%$!

   - Bring the pr_info() back which tells that TSC deadline timer is
     enabled.

   - Reboot quirk for MacBook6,1"

* tag 'x86-urgent-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Unbreak paravirt VDSO clocks
  lib/vdso: Provide sanity check for cycles (again)
  clocksource: Remove obsolete ifdef
  x86_64: Fix jiffies ODR violation
  x86/speculation: PR_SPEC_FORCE_DISABLE enforcement for indirect branches.
  x86/speculation: Prevent rogue cross-process SSBD shutdown
  x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.
  x86/cpu: Add Sapphire Rapids CPU model number
  x86/split_lock: Add Icelake microserver and Tigerlake CPU models
  x86/apic: Make TSC deadline timer detection message visible
  x86/reboot/quirks: Add MacBook6,1 reboot quirk
2020-06-11 15:54:31 -07:00
Thomas Gleixner
582f919123 x86/entry: Convert SMP system vectors to IDTENTRY_SYSVEC
Convert SMP system vectors to IDTENTRY_SYSVEC:

  - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
  - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
  - Remove the ASM idtentries in 64-bit
  - Remove the BUILD_INTERRUPT entries in 32-bit
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.372234635@linutronix.de
2020-06-11 15:15:14 +02:00
Thomas Gleixner
db0338eec5 x86/entry: Convert APIC interrupts to IDTENTRY_SYSVEC
Convert APIC interrupts to IDTENTRY_SYSVEC:

  - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
  - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
  - Remove the ASM idtentries in 64-bit
  - Remove the BUILD_INTERRUPT entries in 32-bit
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.280728850@linutronix.de
2020-06-11 15:15:13 +02:00
Thomas Gleixner
fa5e5c4092 x86/entry: Use idtentry for interrupts
Replace the extra interrupt handling code and reuse the existing idtentry
machinery. This moves the irq stack switching on 64-bit from ASM to C code;
32-bit already does the stack switching in C.

This requires to remove HAVE_IRQ_EXIT_ON_IRQ_STACK as the stack switch is
not longer in the low level entry code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.078690991@linutronix.de
2020-06-11 15:15:12 +02:00
Thomas Gleixner
633260fa14 x86/irq: Convey vector as argument and not in ptregs
Device interrupts which go through do_IRQ() or the spurious interrupt
handler have their separate entry code on 64 bit for no good reason.

Both 32 and 64 bit transport the vector number through ORIG_[RE]AX in
pt_regs. Further the vector number is forced to fit into an u8 and is
complemented and offset by 0x80 so it's in the signed character
range. Otherwise GAS would expand the pushq to a 5 byte instruction for any
vector > 0x7F.

Treat the vector number like an error code and hand it to the C function as
argument. This allows to get rid of the extra entry code in a later step.

Simplify the error code push magic by implementing the pushq imm8 via a
'.byte 0x6a, vector' sequence so GAS is not able to screw it up. As the
pushq imm8 is sign extending the resulting error code needs to be truncated
to 8 bits in C code.

Originally-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.796915981@linutronix.de
2020-06-11 15:15:11 +02:00
Mike Rapoport
65fddcfca8 mm: reorder includes after introduction of linux/pgtable.h
The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include
of the latter in the middle of asm includes.  Fix this up with the aid of
the below script and manual adjustments here and there.

	import sys
	import re

	if len(sys.argv) is not 3:
	    print "USAGE: %s <file> <header>" % (sys.argv[0])
	    sys.exit(1)

	hdr_to_move="#include <linux/%s>" % sys.argv[2]
	moved = False
	in_hdrs = False

	with open(sys.argv[1], "r") as f:
	    lines = f.readlines()
	    for _line in lines:
		line = _line.rstrip('
')
		if line == hdr_to_move:
		    continue
		if line.startswith("#include <linux/"):
		    in_hdrs = True
		elif not moved and in_hdrs:
		    moved = True
		    print hdr_to_move
		print line

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Mike Rapoport
ca5999fde0 mm: introduce include/linux/pgtable.h
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.

Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Linus Torvalds
88bc1de11c This tree cleans up various aspects of the UV platform support code,
it removes unnecessary functions and cleans up the rest.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VNO4RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1i61RAAogLRVi4ga4vmTk5SqUqtR4pupbHJv5IM
 IjkQN0HZ3+Oi6kRxwuOQ9xOOzQWm8GntkZeyN5FA73H7x+bYdU12MIKKTEDcW3xp
 Mg9FtzfeL0V4YmNkmlnIXycyYA3nBdSxnI/OL/58J9CLT15qXYkWjyvkbI2aJ3qL
 U8xM5cTTvhoARjd43o0eAfekTg0XdUAsgvO0vOM5+I1HrQP8SR3ZIFaMSR+MfAQx
 Nbz/UVUSDJ8BNzmS/CfFLFm0F2dkphlLC0r6eAOFZAYSIax0bRVklxV9qdScEQMK
 bkVKXGanCzVTBVM1HXDycLJaILlqcS18tK+VqNIAR5x2BXmaSG8jqwCW4NM0tcaN
 c5zemNsqnAH/VzxeFjE2BcDQnA1nkgj75Vm9O81HMQfyqR16M5pBzRXY/qBqslya
 vX5wLoD962BiVtbELqW6v+Ot29xMYlCLLlTbLHaWQraJS3TjuAvL0/sOFdWgs63F
 N7a+BLvikfYoKCS8IxW87BFBysy9nhv/4UwdaX5RpIQ1wgx/EJLDaowrM+L6Dzmw
 bhQ3AgRZGNZCBDm3uGU/LigTTxN93h5KqKnuUKv3H+tNvEKxPKEAqPyvL8fO8G0U
 BTJiM/XRIzQrkrmwCKqON1iKRjKB2fKklxiq4REIoSqKfeiQ3SVBIHENFnH96Ekf
 G9qptwFEZYI=
 =L8Vy
 -----END PGP SIGNATURE-----

Merge tag 'x86-platform-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 platform updates from Ingo Molnar:
 "This tree cleans up various aspects of the UV platform support code,
  it removes unnecessary functions and cleans up the rest"

* tag 'x86-platform-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/uv: Remove code for unused distributed GRU mode
  x86/platform/uv: Remove the unused _uv_cpu_blade_processor_id() macro
  x86/platform/uv: Unexport uv_apicid_hibits
  x86/platform/uv: Remove _uv_hub_info_check()
  x86/platform/uv: Simplify uv_send_IPI_one()
  x86/platform/uv: Mark uv_min_hub_revision_id static
  x86/platform/uv: Mark is_uv_hubless() static
  x86/platform/uv: Remove the UV*_HUB_IS_SUPPORTED macros
  x86/platform/uv: Unexport symbols only used by x2apic_uv_x.c
  x86/platform/uv: Unexport sn_coherency_id
  x86/platform/uv: Remove the uv_partition_coherence_id() macro
  x86/platform/uv: Mark uv_bios_call() and uv_bios_call_irqsave() static
2020-06-01 14:48:20 -07:00
Linus Torvalds
eff5ddadab Misc updates:
- Extend the x86 family/model macros with a steppings dimension,
    because x86 life isn't complex enough and Intel uses steppings to
    differentiate between different CPUs. :-/
 
  - Convert the TSC deadline timer quirks to the steppings macros.
 
  - Clean up asm mnemonics.
 
  - Fix the handling of an AMD erratum, or in other words, fix a kernel erratum.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VL2wRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hxYQ//dic//qQ+4GOL1wP1Qj8EiGOzaWdynoia
 oDi7Si1I9vy58iCCRgkQKmxEwfnHcM5+NC6/091S4BD2IE6o+iD1YhPsGZK8DT4Y
 FmeD8pgtx5LMJFMBe6KRyek1s0JblP6v0Q0BwUk7YtV6k0oSP+f/2n5BGj2+P7YH
 3Iw438M5JhIrzVp3PnCgJoZkSm9iRnZqbBtR8nd2SO+vx8M75cX27LL6fdaCypRj
 wH9w6+J2NhAZStmEv54LKOdO5RAPJjvatbTZFMEFdceAGFEbHPJIees7paoC+DTP
 3BuhzF/9ghDNKly6Zz3PtyNNDP1vglZ1W9dJkCfTXUWlZKbQV94Yk+JbP5mndxqn
 +f3eD/dInofHiCeAh1Sfj3BCGdOSjgFMBB57CKkCy4LehXwJ9C2eBcbxd4XMfEkd
 h0EywZrp1L10AxDHtq5x82xf1fwfTDyvlYmJrBshXfiitaySn+mPVJMuj3wvqJSP
 WKbJS4HfkekIaf9WoUA+Ay6FJdY7nNirViRrQEZVmDPTV0EDfcaNM5p6Ttkja3Ph
 VoVa8Ms8FRqTfh6xCfckYR+vI44U+AFNLM6YFyetGYc0yVXNzg3vLy2DbqLRolWy
 t1upDdNf1TMJg4BaMrBzZgDg/uI2BM3jeOj69U0cboO2JhJjxjl3qPeiYDKD50MK
 Z1Nho933894=
 =QKjn
 -----END PGP SIGNATURE-----

Merge tag 'x86-cpu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu updates from Ingo Molnar:
 "Misc updates:

   - Extend the x86 family/model macros with a steppings dimension,
     because x86 life isn't complex enough and Intel uses steppings to
     differentiate between different CPUs. :-/

   - Convert the TSC deadline timer quirks to the steppings macros.

   - Clean up asm mnemonics.

   - Fix the handling of an AMD erratum, or in other words, fix a kernel
     erratum"

* tag 'x86-cpu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Use RDRAND and RDSEED mnemonics in archrandom.h
  x86/cpu: Use INVPCID mnemonic in invpcid.h
  x86/cpu/amd: Make erratum #1054 a legacy erratum
  x86/apic: Convert the TSC deadline timer matching to steppings macro
  x86/cpu: Add a X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS() macro
  x86/cpu: Add a steppings field to struct x86_cpu_id
2020-06-01 13:57:51 -07:00
Linus Torvalds
17e0a7cb6a Misc cleanups, with an emphasis on removing obsolete/dead code.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VLcQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iFnhAArGBqco3C2RPQugv7UDDbKEaMvxOGrc5B
 kwnyOS/k/yeIkfhT9u11oBuLcaj/Zgw8YCjFyRfaNsorRqnytLyZzZ6PvdCCE3YU
 X3DVYgulcdAQnM4bS2e3Kt9ciJvFxB27XNm0AfuyLMUxMqCD+iIO4gJ6TuQNBYy3
 dfUMfB1R9OUDW13GCrASe+p1Dw76uaqVngdFWJhnC8Rm49E6gFXq7CLQp5Cka81I
 KZeJ8I6ug9p3gqhOIXdi+S6g5CM5jf86Wkk7dOHwHFH7CceFb3FIz7z0n1je4Wgd
 L5rYX7+PwfNeZ73GIuvEBN+agJH2K0H/KmnlWNWeZHzc+J12MeruSdSMBIkBOEpn
 iSbYAOmDpQLzBjTdZjC8bDqTZf472WrTh4VwN9NxHLucjdC+IqGoTAvnyyEOmZ5o
 R7sv7Q++316CVwRhYVXbzwZcqtiinCDE1EkP5nKTo9z3z0kMF5+ce/k7wn5sgZIk
 zJq3LXtaToiDoDRAPGxcvFPts9MdC0EI1aKTIjaK/n6i2h/SpJfrTKgANWaldYTe
 XJIqlSB43saqf5YAQ3/sY+wnpCRBmmCU+sfKja4C8bH7RuggI3mZS19uhFs0Qctq
 Yx5bIXVSBAIqjJtgzQ0WAAZ5LrCpNNyAzb35ZYefQlGyJlx1URKXVBmxa6S99biU
 KiYX7Dk5uhQ=
 =0ZQd
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups, with an emphasis on removing obsolete/dead code"

* tag 'x86-cleanups-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/spinlock: Remove obsolete ticket spinlock macros and types
  x86/mm: Drop deprecated DISCONTIGMEM support for 32-bit
  x86/apb_timer: Drop unused declaration and macro
  x86/apb_timer: Drop unused TSC calibration
  x86/io_apic: Remove unused function mp_init_irq_at_boot()
  x86/mm: Stop printing BRK addresses
  x86/audit: Fix a -Wmissing-prototypes warning for ia32_classify_syscall()
  x86/nmi: Remove edac.h include leftover
  mm: Remove MPX leftovers
  x86/mm/mmap: Fix -Wmissing-prototypes warnings
  x86/early_printk: Remove unused includes
  crash_dump: Remove no longer used saved_max_pfn
  x86/smpboot: Remove the last ICPU() macro
2020-06-01 13:47:10 -07:00
YueHaibing
fd52a75ca3 x86/io_apic: Remove unused function mp_init_irq_at_boot()
There are no callers in-tree anymore since

  ef9e56d894 ("x86/ioapic: Remove obsolete post hotplug update")

so remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200508140808.49428-1-yuehaibing@huawei.com
2020-05-26 17:01:20 +02:00
Borislav Petkov
de308d1815 x86/apic: Make TSC deadline timer detection message visible
The commit

  c84cb3735f ("x86/apic: Move TSC deadline timer debug printk")

removed the message which said that the deadline timer was enabled.
It added a pr_debug() message which is issued when deadline timer
validation succeeds.

Well, issued only when CONFIG_DYNAMIC_DEBUG is enabled - otherwise
pr_debug() calls get optimized away if DEBUG is not defined in the
compilation unit.

Therefore, make the above message pr_info() so that it is visible in
dmesg.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200525104218.27018-1-bp@alien8.de
2020-05-26 10:54:18 +02:00
Steve Wahl
33649bf449 x86/apic/uv: Remove code for unused distributed GRU mode
Distributed GRU mode appeared in only one generation of UV hardware,
and no version of the BIOS has shipped with this feature enabled, and
we have no plans to ever change that.  The gru.s3.mode check has
always been and will continue to be false.  So remove this dead code.

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dimitri Sivanich <sivanich@hpe.com>
Link: https://lkml.kernel.org/r/20200513221123.GJ3240@raspberrypi
2020-05-23 16:19:57 +02:00
Christoph Hellwig
479d6d9045 x86/platform/uv: Unexport uv_apicid_hibits
This variable is not used by modular code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200504171527.2845224-11-hch@lst.de
2020-05-07 15:32:23 +02:00
Christoph Hellwig
fbe1d37866 x86/platform/uv: Remove _uv_hub_info_check()
Neither this functions nor the helpers used to implement it are used
anywhere in the kernel tree.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-10-hch@lst.de
2020-05-07 15:32:23 +02:00
Christoph Hellwig
8e77554580 x86/platform/uv: Simplify uv_send_IPI_one()
Merge two helpers only used by uv_send_IPI_one() into the main function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-9-hch@lst.de
2020-05-07 15:32:22 +02:00
Christoph Hellwig
8263b05937 x86/platform/uv: Mark uv_min_hub_revision_id static
This variable is only used inside x2apic_uv_x and not even declared
in a header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-8-hch@lst.de
2020-05-07 15:32:22 +02:00
Christoph Hellwig
e4dd8b8351 x86/platform/uv: Mark is_uv_hubless() static
is_uv_hubless() is only used in x2apic_uv_x.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-acked-by:  Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20200504171527.2845224-7-hch@lst.de
2020-05-07 15:32:21 +02:00
Borislav Petkov
66abf23883 x86/apic: Convert the TSC deadline timer matching to steppings macro
... and get rid of the function pointers which would spit out the
microcode revision based on the CPU stepping.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mark Gross <mgross.linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200506071516.25445-4-bp@alien8.de
2020-05-07 13:50:32 +02:00
Thomas Gleixner
c84cb3735f x86/apic: Move TSC deadline timer debug printk
Leon reported that the printk_once() in __setup_APIC_LVTT() triggers a
lockdep splat due to a lock order violation between hrtimer_base::lock and
console_sem, when the 'once' condition is reset via
/sys/kernel/debug/clear_warn_once after boot.

The initial printk cannot trigger this because that happens during boot
when the local APIC timer is set up on the boot CPU.

Prevent it by moving the printk to a place which is guaranteed to be only
called once during boot.

Mark the deadline timer check related functions and data __init while at
it.

Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/87y2qhoshi.fsf@nanos.tec.linutronix.de
2020-05-01 19:15:41 +02:00
Linus Torvalds
2d385336af Updates for the interrupt subsystem:
Treewide:
 
     - Cleanup of setup_irq() which is not longer required because the
       memory allocator is available early. Most cleanup changes come
       through the various maintainer trees, so the final removal of
       setup_irq() is postponed towards the end of the merge window.
 
   Core:
 
     - Protection against unsafe invocation of interrupt handlers and unsafe
       interrupt injection including a fixup of the offending PCI/AER error
       injection mechanism.
 
       Invoking interrupt handlers from arbitrary contexts, i.e. outside of
       an actual interrupt, can cause inconsistent state on the fragile
       x86 interrupt affinity changing hardware trainwreck.
 
   Drivers:
 
     - Second wave of support for the new ARM GICv4.1
     - Multi-instance support for Xilinx and PLIC interrupt controllers
     - CPU-Hotplug support for PLIC
     - The obligatory new driver for X1000 TCU
     - Enhancements, cleanups and fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B888THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoeMJD/9v8GcI/DSY87Fmo7s4odLFVU0J8zZ6
 7QlYjSPm4yWv4pqn1TEnEF2pKz5X9Euhoh8BmdMKtdXBqlS4Ix9N+pH8ModcxyQo
 aX97zuRUxvqfeeVE+yQRwbbMREj9jj9RW8FRtA39+l5H3uC1GDcc+2aAMIaykQ7+
 8lo/6wBd8ZrZ0gsNf4KjlBwMDYAlQSRWxrff38PQ2XRpGKowdp8JFYZuq5Vp0ljJ
 r2cE75ldmFSfmtuhhVroBRY0GAqW4/8v8/syAN3Q9jOEII60qhA0dqR085B9veWa
 DHSqgLmzyUFFXN7Ntzt/fDirJVsIM4BE9qGu3ftCYHMaPB8hG+xqjbZe9E3D2e/d
 +0Pb3TG8EHVOIwzv1t9+6462qYGkBhmBXtbj6GptPYk2Ai4HZlNaSsa8jUNyHvGz
 WDegdRjt7O5RjqDH/VwrQxW/AEp05f/1egweBXbq9aF6j9nqeOur75c/PdxZxAX5
 WUMtouXP2WN+sMW8k1T5cmVMGWxLGBB0wwG4LC/mXzHnkDiN1+2wEUHmhS8Voi3q
 3HXeYBJeukUYbVvMKRvWVAD330TxFjAyd6pPwCdoNY2ZngJnQWlDD9vbYYX2osoW
 kP+KhIANNBVqdK7NqlLoqcr3SdHn01pQYuVHejNzxb7E6/mmpMlaYDJc/rMPi/eM
 0/rzl8fAj/WyBQ==
 =DZ/G
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates for the interrupt subsystem:

  Treewide:

    - Cleanup of setup_irq() which is not longer required because the
      memory allocator is available early.

      Most cleanup changes come through the various maintainer trees, so
      the final removal of setup_irq() is postponed towards the end of
      the merge window.

  Core:

    - Protection against unsafe invocation of interrupt handlers and
      unsafe interrupt injection including a fixup of the offending
      PCI/AER error injection mechanism.

      Invoking interrupt handlers from arbitrary contexts, i.e. outside
      of an actual interrupt, can cause inconsistent state on the
      fragile x86 interrupt affinity changing hardware trainwreck.

  Drivers:

    - Second wave of support for the new ARM GICv4.1

    - Multi-instance support for Xilinx and PLIC interrupt controllers

    - CPU-Hotplug support for PLIC

    - The obligatory new driver for X1000 TCU

    - Enhancements, cleanups and fixes all over the place"

* tag 'irq-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
  unicore32: Replace setup_irq() by request_irq()
  sh: Replace setup_irq() by request_irq()
  hexagon: Replace setup_irq() by request_irq()
  c6x: Replace setup_irq() by request_irq()
  alpha: Replace setup_irq() by request_irq()
  irqchip/gic-v4.1: Eagerly vmap vPEs
  irqchip/gic-v4.1: Add VSGI property setup
  irqchip/gic-v4.1: Add VSGI allocation/teardown
  irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layer
  irqchip/gic-v4.1: Plumb set_vcpu_affinity SGI callbacks
  irqchip/gic-v4.1: Plumb get/set_irqchip_state SGI callbacks
  irqchip/gic-v4.1: Plumb mask/unmask SGI callbacks
  irqchip/gic-v4.1: Add initial SGI configuration
  irqchip/gic-v4.1: Plumb skeletal VSGI irqchip
  irqchip/stm32: Retrigger both in eoi and unmask callbacks
  irqchip/gic-v3: Move irq_domain_update_bus_token to after checking for NULL domain
  irqchip/xilinx: Do not call irq_set_default_host()
  irqchip/xilinx: Enable generic irq multi handler
  irqchip/xilinx: Fill error code when irq domain registration fails
  irqchip/xilinx: Add support for multiple instances
  ...
2020-03-30 17:35:14 -07:00
Ingo Molnar
629b3df7ec Merge branch 'x86/cpu' into perf/core, to resolve conflict
Conflicts:
	arch/x86/events/intel/uncore.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-25 15:20:44 +01:00
Thomas Gleixner
adefe55e72 x86/kernel: Convert to new CPU match macros
The new macro set has a consistent namespace and uses C99 initializers
instead of the grufty C89 ones.

Get rid the of the local macro wrappers for consistency.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131509.250559388@linutronix.de
2020-03-24 21:28:26 +01:00
Peter Xu
469ff207b4 x86/vector: Remove warning on managed interrupt migration
The vector management code assumes that managed interrupts cannot be
migrated away from an online CPU. free_moved_vector() has a WARN_ON_ONCE()
which triggers when a managed interrupt vector association on a online CPU
is cleared. The CPU offline code uses a different mechanism which cannot
trigger this.

This assumption is not longer correct because the new CPU isolation feature
which affects the placement of managed interrupts must be able to move a
managed interrupt away from an online CPU.

There are two reasons why this can happen:

  1) When the interrupt is activated the affinity mask which was
     established in irq_create_affinity_masks() is handed in to
     the vector allocation code. This mask contains all CPUs to which
     the interrupt can be made affine to, but this does not take the
     CPU isolation 'managed_irq' mask into account.

     When the interrupt is finally requested by the device driver then the
     affinity is checked again and the CPU isolation 'managed_irq' mask is
     taken into account, which moves the interrupt to a non-isolated CPU if
     possible.

  2) The interrupt can be affine to an isolated CPU because the
     non-isolated CPUs in the calculated affinity mask are not online.

     Once a non-isolated CPU which is in the mask comes online the
     interrupt is migrated to this non-isolated CPU

In both cases the regular online migration mechanism is used which triggers
the WARN_ON_ONCE() in free_moved_vector().

Case #1 could have been addressed by taking the isolation mask into
account, but that would require a massive code change in the activation
logic and the eventual migration event was accepted as a reasonable
tradeoff when the isolation feature was developed. But even if #1 would be
addressed, #2 would still trigger it.

Of course the warning in free_moved_vector() was overlooked at that time
and the above two cases which have been discussed during patch review have
obviously never been tested before the final submission.

So keep it simple and remove the warning.

[ tglx: Rewrote changelog and added a comment to free_moved_vector() ]

Fixes: 11ea68f553 ("genirq, sched/isolation: Isolate from handling managed interrupts")
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>                                                                                                                                                                       
Link: https://lkml.kernel.org/r/20200312205830.81796-1-peterx@redhat.com
2020-03-13 15:29:26 +01:00
Thomas Gleixner
008f1d60fe x86/apic/vector: Force interupt handler invocation to irq context
Sathyanarayanan reported that the PCI-E AER error injection mechanism
can result in a NULL pointer dereference in apic_ack_edge():

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
 RIP: 0010:apic_ack_edge+0x1e/0x40
 Call Trace:
   handle_edge_irq+0x7d/0x1e0
   generic_handle_irq+0x27/0x30
   aer_inject_write+0x53a/0x720

It crashes in irq_complete_move() which dereferences get_irq_regs() which
is obviously NULL when this is called from non interrupt context.

Of course the pointer could be checked, but that just papers over the real
issue. Invoking the low level interrupt handling mechanism from random code
can wreckage the fragile interrupt affinity mechanism of x86 as interrupts
can only be moved in interrupt context or with special care when a CPU goes
offline and the move has to be enforced.

In the best case this triggers the warning in the MSI affinity setter, but
if the call happens on the correct CPU it just corrupts state and might
prevent further interrupt delivery for the affected device.

Mark the APIC interrupts as unsuitable for being invoked in random contexts.

This prevents the AER injection from proliferating the wreckage, but that's
less broken than the current state of affairs and more correct than just
papering over the problem by sprinkling random checks all over the place
and silently corrupting state.

Reported-by: sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200306130623.684591280@linutronix.de
2020-03-08 11:06:40 +01:00
Linus Torvalds
1a2a76c268 A set of fixes for X86:
- Ensure that the PIT is set up when the local APIC is disable or
    configured in legacy mode. This is caused by an ordering issue
    introduced in the recent changes which skip PIT initialization when the
    TSC and APIC frequencies are already known.
 
  - Handle malformed SRAT tables during early ACPI parsing which caused an
    infinite loop anda boot hang.
 
  - Fix a long standing race in the affinity setting code which affects PCI
    devices with non-maskable MSI interrupts. The problem is caused by the
    non-atomic writes of the MSI address (destination APIC id) and data
    (vector) fields which the device uses to construct the MSI message. The
    non-atomic writes are mandated by PCI.
 
    If both fields change and the device raises an interrupt after writing
    address and before writing data, then the MSI block constructs a
    inconsistent message which causes interrupts to be lost and subsequent
    malfunction of the device.
 
    The fix is to redirect the interrupt to the new vector on the current
    CPU first and then switch it over to the new target CPU. This allows to
    observe an eventually raised interrupt in the transitional stage (old
    CPU, new vector) to be observed in the APIC IRR and retriggered on the
    new target CPU and the new vector. The potential spurious interrupts
    caused by this are harmless and can in the worst case expose a buggy
    driver (all handlers have to be able to deal with spurious interrupts as
    they can and do happen for various reasons).
 
  - Add the missing suspend/resume mechanism for the HYPERV hypercall page
    which prevents resume hibernation on HYPERV guests. This change got
    lost before the merge window.
 
  - Mask the IOAPIC before disabling the local APIC to prevent potentially
    stale IOAPIC remote IRR bits which cause stale interrupt lines after
    resume.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5AEJwTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWY2D/47ur9gsVQGryKzneVAr0SCsq4Un11e
 uifX4ldu4gCEBRTYhpgcpiFKeLvY/QJ6uOD+gQUHyy/s+lCf6yzE6UhXEqSCtcT7
 LkSxD8jAFf6KhMA6iqYBfyxUsPMXBetLjjHWsyc/kf15O/vbYm7qf05timmNZkDS
 S7C+yr3KRqRjLR7G7t4twlgC9aLcNUQihUdsH2qyTvjnlkYHJLDa0/Js7bFYYKVx
 9GdUDLvPFB1mZ76g012De4R3kJsWitiyLlQ38DP5VysKulnszUCdiXlgCEFrgxvQ
 OQhLafQzOAzvxQmP+1alODR0dmJZA8k0zsDeeTB/vTpRvv6+Pe2qUswLSpauBzuq
 TpDsrv8/5pwZh28+91f/Unk+tH8NaVNtGe/Uf+ePxIkn1nbqL84o4NHGplM6R97d
 HAWdZQZ1cGRLf6YRRJ+57oM/5xE3vBbF1Wn0+QDTFwdsk2vcxuQ4eB3M/8E1V7Zk
 upp8ty50bZ5+rxQ8XTq/eb8epSRnfLoBYpi4ux6MIOWRdmKDl40cDeZCzA2kNP7m
 qY1haaRN3ksqvhzc0Yf6cL+CgvC4ur8gRHezfOqmBzVoaLyVEFIVjgjR/ojf0bq8
 /v+L9D5+IdIv4jEZruRRs0gOXNDzoBbvf0qKGaO0tUTWiDsv7c5AGixp8aozniHS
 HXsv1lIpRuC7WQ==
 =WxKD
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for X86:

   - Ensure that the PIT is set up when the local APIC is disable or
     configured in legacy mode. This is caused by an ordering issue
     introduced in the recent changes which skip PIT initialization when
     the TSC and APIC frequencies are already known.

   - Handle malformed SRAT tables during early ACPI parsing which caused
     an infinite loop anda boot hang.

   - Fix a long standing race in the affinity setting code which affects
     PCI devices with non-maskable MSI interrupts. The problem is caused
     by the non-atomic writes of the MSI address (destination APIC id)
     and data (vector) fields which the device uses to construct the MSI
     message. The non-atomic writes are mandated by PCI.

     If both fields change and the device raises an interrupt after
     writing address and before writing data, then the MSI block
     constructs a inconsistent message which causes interrupts to be
     lost and subsequent malfunction of the device.

     The fix is to redirect the interrupt to the new vector on the
     current CPU first and then switch it over to the new target CPU.
     This allows to observe an eventually raised interrupt in the
     transitional stage (old CPU, new vector) to be observed in the APIC
     IRR and retriggered on the new target CPU and the new vector.

     The potential spurious interrupts caused by this are harmless and
     can in the worst case expose a buggy driver (all handlers have to
     be able to deal with spurious interrupts as they can and do happen
     for various reasons).

   - Add the missing suspend/resume mechanism for the HYPERV hypercall
     page which prevents resume hibernation on HYPERV guests. This
     change got lost before the merge window.

   - Mask the IOAPIC before disabling the local APIC to prevent
     potentially stale IOAPIC remote IRR bits which cause stale
     interrupt lines after resume"

* tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Mask IOAPIC entries when disabling the local APIC
  x86/hyperv: Suspend/resume the hypercall page for hibernation
  x86/apic/msi: Plug non-maskable MSI affinity race
  x86/boot: Handle malformed SRAT tables during early ACPI parsing
  x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
2020-02-09 12:11:12 -08:00
Tony W Wang-oc
0f378d73d4 x86/apic: Mask IOAPIC entries when disabling the local APIC
When a system suspends, the local APIC is disabled in the suspend sequence,
but the IOAPIC is left in the current state. This means unmasked interrupt
lines stay unmasked. This is usually the case for IOAPIC pin 9 to which the
ACPI interrupt is connected.

That means that in suspended state the IOAPIC can respond to an external
interrupt, e.g. the wakeup via keyboard/RTC/ACPI, but the interrupt message
cannot be handled by the disabled local APIC. As a consequence the Remote
IRR bit is set, but the local APIC does not send an EOI to acknowledge
it. This causes the affected interrupt line to become stale and the stale
Remote IRR bit will cause a hang when __synchronize_hardirq() is invoked
for that interrupt line.

To prevent this, mask all IOAPIC entries before disabling the local
APIC. The resume code already has the unmask operation inside.

[ tglx: Massaged changelog ]

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1579076539-7267-1-git-send-email-TonyWWang-oc@zhaoxin.com
2020-02-07 15:32:16 +01:00
Thomas Gleixner
6f1a4891a5 x86/apic/msi: Plug non-maskable MSI affinity race
Evan tracked down a subtle race between the update of the MSI message and
the device raising an interrupt internally on PCI devices which do not
support MSI masking. The update of the MSI message is non-atomic and
consists of either 2 or 3 sequential 32bit wide writes to the PCI config
space.

   - Write address low 32bits
   - Write address high 32bits (If supported by device)
   - Write data

When an interrupt is migrated then both address and data might change, so
the kernel attempts to mask the MSI interrupt first. But for MSI masking is
optional, so there exist devices which do not provide it. That means that
if the device raises an interrupt internally between the writes then a MSI
message is sent built from half updated state.

On x86 this can lead to spurious interrupts on the wrong interrupt
vector when the affinity setting changes both address and data. As a
consequence the device interrupt can be lost causing the device to
become stuck or malfunctioning.

Evan tried to handle that by disabling MSI accross an MSI message
update. That's not feasible because disabling MSI has issues on its own:

 If MSI is disabled the PCI device is routing an interrupt to the legacy
 INTx mechanism. The INTx delivery can be disabled, but the disablement is
 not working on all devices.

 Some devices lose interrupts when both MSI and INTx delivery are disabled.

Another way to solve this would be to enforce the allocation of the same
vector on all CPUs in the system for this kind of screwed devices. That
could be done, but it would bring back the vector space exhaustion problems
which got solved a few years ago.

Fortunately the high address (if supported by the device) is only relevant
when X2APIC is enabled which implies interrupt remapping. In the interrupt
remapping case the affinity setting is happening at the interrupt remapping
unit and the PCI MSI message is programmed only once when the PCI device is
initialized.

That makes it possible to solve it with a two step update:

  1) Target the MSI msg to the new vector on the current target CPU

  2) Target the MSI msg to the new vector on the new target CPU

In both cases writing the MSI message is only changing a single 32bit word
which prevents the issue of inconsistency.

After writing the final destination it is necessary to check whether the
device issued an interrupt while the intermediate state #1 (new vector,
current CPU) was in effect.

This is possible because the affinity change is always happening on the
current target CPU. The code runs with interrupts disabled, so the
interrupt can be detected by checking the IRR of the local APIC. If the
vector is pending in the IRR then the interrupt is retriggered on the new
target CPU by sending an IPI for the associated vector on the target CPU.

This can cause spurious interrupts on both the local and the new target
CPU.

 1) If the new vector is not in use on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then interrupt entry code will
    ignore that spurious interrupt. The vector is marked so that the
    'No irq handler for vector' warning is supressed once.

 2) If the new vector is in use already on the local CPU then the IRR check
    might see an pending interrupt from the device which is using this
    vector. The IPI to the new target CPU will then invoke the handler of
    the device, which got the affinity change, even if that device did not
    issue an interrupt

 3) If the new vector is in use already on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then the handler of the device which
    uses that vector on the local CPU will be invoked.

expose issues in device driver interrupt handlers which are not prepared to
handle a spurious interrupt correctly. This not a regression, it's just
exposing something which was already broken as spurious interrupts can
happen for a lot of reasons and all driver handlers need to be able to deal
with them.

Reported-by: Evan Green <evgreen@chromium.org>
Debugged-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Evan Green <evgreen@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de
2020-02-01 09:31:47 +01:00
Thomas Gleixner
979923871f x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
Tony reported a boot regression caused by the recent workaround for systems
which have a disabled (clock gate off) PIT.

On his machine the kernel fails to initialize the PIT because
apic_needs_pit() does not take into account whether the local APIC
interrupt delivery mode will actually allow to setup and use the local
APIC timer. This should be easy to reproduce with acpi=off on the
command line which also disables HPET.

Due to the way the PIT/HPET and APIC setup ordering works (APIC setup can
require working PIT/HPET) the information is not available at the point
where apic_needs_pit() makes this decision.

To address this, split out the interrupt mode selection from
apic_intr_mode_init(), invoke the selection before making the decision
whether PIT is required or not, and add the missing checks into
apic_needs_pit().

Fixes: c8c4076723 ("x86/timer: Skip PIT initialization on modern chipsets")
Reported-by: Anthony Buckley <tony.buckley000@gmail.com>
Tested-by: Anthony Buckley <tony.buckley000@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Daniel Drake <drake@endlessm.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206125
Link: https://lore.kernel.org/r/87sgk6tmk2.fsf@nanos.tec.linutronix.de
2020-01-29 12:50:12 +01:00
Arnd Bergmann
d0b7788804 x86/apic/uv: Avoid unused variable warning
When CONFIG_PROC_FS is disabled, the compiler warns about an unused
variable:

arch/x86/kernel/apic/x2apic_uv_x.c: In function 'uv_setup_proc_files':
arch/x86/kernel/apic/x2apic_uv_x.c:1546:8: error: unused variable 'name' [-Werror=unused-variable]
  char *name = hubless ? "hubless" : "hubbed";

Simplify the code so this variable is no longer needed.

Fixes: 8785968bce ("x86/platform/uv: Add UV Hubbed/Hubless Proc FS Files")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191212140419.315264-1-arnd@arndb.de
2020-01-17 14:34:41 +01:00
Linus Torvalds
da42761df5 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "UV platform updates (with a 'hubless' variant) and Jailhouse updates
  for better UART support"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/jailhouse: Only enable platform UARTs if available
  x86/jailhouse: Improve setup data version comparison
  x86/platform/uv: Account for UV Hubless in is_uvX_hub Ops
  x86/platform/uv: Check EFI Boot to set reboot type
  x86/platform/uv: Decode UVsystab Info
  x86/platform/uv: Add UV Hubbed/Hubless Proc FS Files
  x86/platform/uv: Setup UV functions for Hubless UV Systems
  x86/platform/uv: Add return code to UV BIOS Init function
  x86/platform/uv: Return UV Hubless System Type
  x86/platform/uv: Save OEM_ID from ACPI MADT probe
2019-11-26 09:52:37 -08:00
Linus Torvalds
fd2615908d Merge branches 'core-objtool-for-linus', 'x86-cleanups-for-linus' and 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 objtool, cleanup, and apic updates from Ingo Molnar:
 "Objtool:

   - Fix a gawk 5.0 incompatibility in gen-insn-attr-x86.awk. Most
     distros are still on gawk 4.2.x.

  Cleanup:

   - Misc cleanups, plus the removal of obsolete code such as Calgary
     IOMMU support, which code hasn't seen any real testing in a long
     time and there's no known users left.

  apic:

   - Two changes: a cleanup and a fix for an (old) race for oneshot
     threaded IRQ handlers"

* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/insn: Fix awk regexp warnings

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Remove unused asm/rio.h
  x86: Fix typos in comments
  x86/pci: Remove #ifdef __KERNEL__ guard from <asm/pci.h>
  x86/pci: Remove pci_64.h
  x86: Remove the calgary IOMMU driver
  x86/apic, x86/uprobes: Correct parameter names in kernel-doc comments
  x86/kdump: Remove the unused crash_copy_backup_region()
  x86/nmi: Remove stale EDAC include leftover

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioapic: Rename misnamed functions
  x86/ioapic: Prevent inconsistent state when moving an interrupt
2019-11-26 08:21:54 -08:00
Linus Torvalds
436b2a8039 Printk changes for 5.5
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAl3bpjoACgkQUqAMR0iA
 lPJJDA/+IJT4YCRp2TwV2jvIs0QzvXZrzEsxgCLibLE85mYTJgoQBD3W1bH2eyjp
 T/9U0Zh5PGr/84cHd4qiMxzo+5Olz930weG59NcO4RJBSr671aRYs5tJqwaQAZDR
 wlwaob5S28vUmjPxKulvxv6V3FdI79ZE9xrCOCSTQvz4iCLsGOu+Dn/qtF64pImX
 M/EXzPMBrByiQ8RTM4Ege8JoBqiCZPDG9GR3KPXIXQwEeQgIoeYxwRYakxSmSzz8
 W8NduFCbWavg/yHhghHikMiyOZeQzAt+V9k9WjOBTle3TGJegRhvjgI7508q3tXe
 jQTMGATBOPkIgFaZz7eEn/iBa3jZUIIOzDY93RYBmd26aBvwKLOma/Vkg5oGYl0u
 ZK+CMe+/xXl7brQxQ6JNsQhbSTjT+746LvLJlCvPbbPK9R0HeKNhsdKpGY3ugnmz
 VAnOFIAvWUHO7qx+J+EnOo5iiPpcwXZj4AjrwVrs/x5zVhzwQ+4DSU6rbNn0O1Ak
 ELrBqCQkQzh5kqK93jgMHeWQ9EOUp1Lj6PJhTeVnOx2x8tCOi6iTQFFrfdUPlZ6K
 2DajgrFhti4LvwVsohZlzZuKRm5EuwReLRSOn7PU5qoSm5rcouqMkdlYG/viwyhf
 mTVzEfrfemrIQOqWmzPrWEXlMj2mq8oJm4JkC+jJ/+HsfK4UU8I=
 =QCEy
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk

Pull printk updates from Petr Mladek:

 - Allow to print symbolic error names via new %pe modifier.

 - Use pr_warn() instead of the remaining pr_warning() calls. Fix
   formatting of the related lines.

 - Add VSPRINTF entry to MAINTAINERS.

* tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (32 commits)
  checkpatch: don't warn about new vsprintf pointer extension '%pe'
  MAINTAINERS: Add VSPRINTF
  tools lib api: Renaming pr_warning to pr_warn
  ASoC: samsung: Use pr_warn instead of pr_warning
  lib: cpu_rmap: Use pr_warn instead of pr_warning
  trace: Use pr_warn instead of pr_warning
  dma-debug: Use pr_warn instead of pr_warning
  vgacon: Use pr_warn instead of pr_warning
  fs: afs: Use pr_warn instead of pr_warning
  sh/intc: Use pr_warn instead of pr_warning
  scsi: Use pr_warn instead of pr_warning
  platform/x86: intel_oaktrail: Use pr_warn instead of pr_warning
  platform/x86: asus-laptop: Use pr_warn instead of pr_warning
  platform/x86: eeepc-laptop: Use pr_warn instead of pr_warning
  oprofile: Use pr_warn instead of pr_warning
  of: Use pr_warn instead of pr_warning
  macintosh: Use pr_warn instead of pr_warning
  idsn: Use pr_warn instead of pr_warning
  ide: Use pr_warn instead of pr_warning
  crypto: n2: Use pr_warn instead of pr_warning
  ...
2019-11-25 19:40:40 -08:00
Jan Beulich
fe6f85ca12 x86/apic/32: Avoid bogus LDR warnings
The removal of the LDR initialization in the bigsmp_32 APIC code unearthed
a problem in setup_local_APIC().

The code checks unconditionally for a mismatch of the logical APIC id by
comparing the early APIC id which was initialized in get_smp_config() with
the actual LDR value in the APIC.

Due to the removal of the bogus LDR initialization the check now can
trigger on bigsmp_32 APIC systems emitting a warning for every booting
CPU. This is of course a false positive because the APIC is not using
logical destination mode.

Restrict the check and the possibly resulting fixup to systems which are
actually using the APIC in logical destination mode.

[ tglx: Massaged changelog and added Cc stable ]

Fixes: bae3a8d330 ("x86/apic: Do not initialize LDR and DFR for bigsmp")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/666d8f91-b5a8-1afd-7add-821e72a35f03@suse.com
2019-11-05 00:11:00 +01:00
Yi Wang
44eb5a7e5d x86/apic, x86/uprobes: Correct parameter names in kernel-doc comments
Rename parameter names to the correct ones used in the function. No
functional changes.

 [ bp: Merge two patches into a single one. ]

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1571816442-22494-1-git-send-email-wang.yi59@zte.com.cn
2019-10-27 09:00:28 +01:00
Thomas Gleixner
2579a4eefc x86/ioapic: Rename misnamed functions
ioapic_irqd_[un]mask() are misnomers as both functions do way more than
masking and unmasking the interrupt line. Both deal with the moving the
affinity of the interrupt within interrupt context. The mask/unmask is just
a tiny part of the functionality.

Rename them to ioapic_prepare/finish_move(), fixup the call sites and
rename the related variables in the code to reflect what this is about.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20191017101938.412489856@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-24 12:09:21 +02:00
Thomas Gleixner
df4393424a x86/ioapic: Prevent inconsistent state when moving an interrupt
There is an issue with threaded interrupts which are marked ONESHOT
and using the fasteoi handler:

  if (IS_ONESHOT())
    mask_irq();
  ....
  cond_unmask_eoi_irq()
    chip->irq_eoi();
      if (setaffinity_pending) {
         mask_ioapic();
         ...
	 move_affinity();
	 unmask_ioapic();
      }

So if setaffinity is pending the interrupt will be moved and then
unconditionally unmasked at the ioapic level, which is wrong in two
aspects:

 1) It should be kept masked up to the point where the threaded handler
    finished.

 2) The physical chip state and the software masked state are inconsistent

Guard both the mask and the unmask with a check for the software masked
state. If the line is marked masked then the ioapic line is also masked, so
both mask_ioapic() and unmask_ioapic() can be skipped safely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Fixes: 3aa551c9b4 ("genirq: add threaded interrupt handler support")
Link: https://lkml.kernel.org/r/20191017101938.321393687@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-24 12:09:21 +02:00
Kefeng Wang
8d3bcc441e x86: Use pr_warn instead of pr_warning
As said in commit f2c2cbcc35 ("powerpc: Use pr_warn instead of
pr_warning"), removing pr_warning so all logging messages use a
consistent <prefix>_warn style. Let's do it.

Link: http://lkml.kernel.org/r/20191018031850.48498-7-wangkefeng.wang@huawei.com
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Robert Richter <rric@kernel.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-10-18 15:00:18 +02:00
Sean Christopherson
7a22e03b0c x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu
Check that the per-cpu cluster mask pointer has been set prior to
clearing a dying cpu's bit.  The per-cpu pointer is not set until the
target cpu reaches smp_callin() during CPUHP_BRINGUP_CPU, whereas the
teardown function, x2apic_dead_cpu(), is associated with the earlier
CPUHP_X2APIC_PREPARE.  If an error occurs before the cpu is awakened,
e.g. if do_boot_cpu() itself fails, x2apic_dead_cpu() will dereference
the NULL pointer and cause a panic.

  smpboot: do_boot_cpu failed(-22) to wakeup CPU#1
  BUG: kernel NULL pointer dereference, address: 0000000000000008
  RIP: 0010:x2apic_dead_cpu+0x1a/0x30
  Call Trace:
   cpuhp_invoke_callback+0x9a/0x580
   _cpu_up+0x10d/0x140
   do_cpu_up+0x69/0xb0
   smp_init+0x63/0xa9
   kernel_init_freeable+0xd7/0x229
   ? rest_init+0xa0/0xa0
   kernel_init+0xa/0x100
   ret_from_fork+0x35/0x40

Fixes: 023a611748 ("x86/apic/x2apic: Simplify cluster management")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191001205019.5789-1-sean.j.christopherson@intel.com
2019-10-15 10:57:09 +02:00
Mike Travis
df55029f7e x86/platform/uv: Check EFI Boot to set reboot type
Change to checking for EFI Boot type from previous check on if this
is a KDUMP kernel.  This allows for KDUMP kernels that can handle
EFI reboots.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hedi Berriche <hedi.berriche@hpe.com>
Cc: Justin Ernst <justin.ernst@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190910145840.215091717@stormcage.eag.rdlabs.hpecorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07 13:42:11 +02:00
Mike Travis
f5a8f0ecb4 x86/platform/uv: Decode UVsystab Info
Decode the hubless UVsystab passed from BIOS to the kernel saving
pertinent info in a similar manner that hubbed UVsystabs are decoded.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hedi Berriche <hedi.berriche@hpe.com>
Cc: Justin Ernst <justin.ernst@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190910145840.135325066@stormcage.eag.rdlabs.hpecorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07 13:42:11 +02:00
Mike Travis
8785968bce x86/platform/uv: Add UV Hubbed/Hubless Proc FS Files
Indicate to UV user utilities that UV hubless support is available on
this system via the existing /proc infterface.  The current interface is
maintained with the addition of new /proc leaves ("hubbed", "hubless",
and "oemid") that contain the specific type of UV arch this one is.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hedi Berriche <hedi.berriche@hpe.com>
Cc: Justin Ernst <justin.ernst@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190910145840.055590900@stormcage.eag.rdlabs.hpecorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07 13:42:10 +02:00
Mike Travis
2bcf265287 x86/platform/uv: Setup UV functions for Hubless UV Systems
Add more support for UV systems that do not contain a UV Hub (AKA
"hubless").  This update adds support for additional functions required:

    Use PCH NMI handler instead of a UV Hub NMI handler.

    Initialize the UV BIOS callback interface used to support specific
    UV functions.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hedi Berriche <hedi.berriche@hpe.com>
Cc: Justin Ernst <justin.ernst@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190910145839.975787119@stormcage.eag.rdlabs.hpecorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07 13:42:10 +02:00
Mike Travis
0959f8256a x86/platform/uv: Return UV Hubless System Type
Return the type of UV hubless system for UV specific code that depends
on that.  Add a function to convert UV system type to bit pattern needed
for is_uv_hubless().

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hedi Berriche <hedi.berriche@hpe.com>
Cc: Justin Ernst <justin.ernst@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190910145839.814880843@stormcage.eag.rdlabs.hpecorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07 13:42:10 +02:00
Mike Travis
61e5ddca9c x86/platform/uv: Save OEM_ID from ACPI MADT probe
Save the OEM_ID and OEM_TABLE_ID passed to the apic driver probe function
for later use.  Also, convert the char list arg passed from the kernel
to a true null-terminated string.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hedi Berriche <hedi.berriche@hpe.com>
Cc: Justin Ernst <justin.ernst@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190910145839.732237241@stormcage.eag.rdlabs.hpecorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07 13:42:09 +02:00
Linus Torvalds
c5f12fdb8b Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Thomas Gleixner:

 - Cleanup the apic IPI implementation by removing duplicated code and
   consolidating the functions into the APIC core.

 - Implement a safe variant of the IPI broadcast mode. Contrary to
   earlier attempts this uses the core tracking of which CPUs have been
   brought online at least once so that a broadcast does not end up in
   some dead end in BIOS/SMM code when the CPU is still waiting for
   init. Once all CPUs have been brought up once, IPI broadcasting is
   enabled. Before that regular one by one IPIs are issued.

 - Drop the paravirt CR8 related functions as they have no user anymore

 - Initialize the APIC TPR to block interrupt 16-31 as they are reserved
   for CPU exceptions and should never be raised by any well behaving
   device.

 - Emit a warning when vector space exhaustion breaks the admin set
   affinity of an interrupt.

 - Make sure to use the NMI fallback when shutdown via reboot vector IPI
   fails. The original code had conditions which prevent the code path
   to be reached.

 - Annotate various APIC config variables as RO after init.

[ The ipi broadcase change came in earlier through the cpu hotplug
  branch, but I left the explanation in the commit message since it was
  shared between the two different branches    - Linus ]

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
  x86/apic/vector: Warn when vector space exhaustion breaks affinity
  x86/apic: Annotate global config variables as "read-only after init"
  x86/apic/x2apic: Implement IPI shorthands support
  x86/apic/flat64: Remove the IPI shorthand decision logic
  x86/apic: Share common IPI helpers
  x86/apic: Remove the shorthand decision logic
  x86/smp: Enhance native_send_call_func_ipi()
  x86/smp: Move smp_function_call implementations into IPI code
  x86/apic: Provide and use helper for send_IPI_allbutself()
  x86/apic: Add static key to Control IPI shorthands
  x86/apic: Move no_ipi_broadcast() out of 32bit
  x86/apic: Add NMI_VECTOR wait to IPI shorthand
  x86/apic: Remove dest argument from __default_send_IPI_shortcut()
  x86/hotplug: Silence APIC and NMI when CPU is dead
  x86/cpu: Move arch_smt_update() to a neutral place
  x86/apic/uv: Make x2apic_extra_bits static
  x86/apic: Consolidate the apic local headers
  x86/apic: Move apic_flat_64 header into apic directory
  x86/apic: Move ipi header into apic directory
  x86/apic: Cleanup the include maze
  ...
2019-09-17 12:04:39 -07:00
Linus Torvalds
22331f8952 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu-feature updates from Ingo Molnar:

 - Rework the Intel model names symbols/macros, which were decades of
   ad-hoc extensions and added random noise. It's now a coherent, easy
   to follow nomenclature.

 - Add new Intel CPU model IDs:
    - "Tiger Lake" desktop and mobile models
    - "Elkhart Lake" model ID
    - and the "Lightning Mountain" variant of Airmont, plus support code

 - Add the new AVX512_VP2INTERSECT instruction to cpufeatures

 - Remove Intel MPX user-visible APIs and the self-tests, because the
   toolchain (gcc) is not supporting it going forward. This is the
   first, lowest-risk phase of MPX removal.

 - Remove X86_FEATURE_MFENCE_RDTSC

 - Various smaller cleanups and fixes

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  x86/cpu: Update init data for new Airmont CPU model
  x86/cpu: Add new Airmont variant to Intel family
  x86/cpu: Add Elkhart Lake to Intel family
  x86/cpu: Add Tiger Lake to Intel family
  x86: Correct misc typos
  x86/intel: Add common OPTDIFFs
  x86/intel: Aggregate microserver naming
  x86/intel: Aggregate big core graphics naming
  x86/intel: Aggregate big core mobile naming
  x86/intel: Aggregate big core client naming
  x86/cpufeature: Explain the macro duplication
  x86/ftrace: Remove mcount() declaration
  x86/PCI: Remove superfluous returns from void functions
  x86/msr-index: Move AMD MSRs where they belong
  x86/cpu: Use constant definitions for CPU models
  lib: Remove redundant ftrace flag removal
  x86/crash: Remove unnecessary comparison
  x86/bitops: Use __builtin_constant_p() directly instead of IS_IMMEDIATE()
  x86: Remove X86_FEATURE_MFENCE_RDTSC
  x86/mpx: Remove MPX APIs
  ...
2019-09-16 18:47:53 -07:00
Linus Torvalds
95217783b7 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "A KVM guest fix, and a kdump kernel relocation errors fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/timer: Force PIT initialization when !X86_FEATURE_ARAT
  x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors
2019-09-12 14:47:35 +01:00
Jan Stancek
afa8b475c1 x86/timer: Force PIT initialization when !X86_FEATURE_ARAT
KVM guests with commit c8c4076723 ("x86/timer: Skip PIT initialization on
modern chipsets") applied to guest kernel have been observed to have
unusually higher CPU usage with symptoms of increase in vm exits for HLT
and MSW_WRITE (MSR_IA32_TSCDEADLINE).

This is caused by older QEMUs lacking support for X86_FEATURE_ARAT.  lapic
clock retains CLOCK_EVT_FEAT_C3STOP and nohz stays inactive.  There's no
usable broadcast device either.

Do the PIT initialization if guest CPU lacks X86_FEATURE_ARAT.  On real
hardware it shouldn't matter as ARAT and DEADLINE come together.

Fixes: c8c4076723 ("x86/timer: Skip PIT initialization on modern chipsets")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-09-08 09:01:15 +02:00
Linus Torvalds
950b07c14e Revert "x86/apic: Include the LDR when clearing out APIC registers"
This reverts commit 558682b529.

Chris Wilson reports that it breaks his CPU hotplug test scripts.  In
particular, it breaks offlining and then re-onlining the boot CPU, which
we treat specially (and the BIOS does too).

The symptoms are that we can offline the CPU, but it then does not come
back online again:

    smpboot: CPU 0 is now offline
    smpboot: Booting Node 0 Processor 0 APIC 0x0
    smpboot: do_boot_cpu failed(-1) to wakeup CPU#0

Thomas says he knows why it's broken (my personal suspicion: our magic
handling of the "cpu0_logical_apicid" thing), but for 5.3 the right fix
is to just revert it, since we've never touched the LDR bits before, and
it's not worth the risk to do anything else at this stage.

[ Hotpluging of the boot CPU is special anyway, and should be off by
  default. See the "BOOTPARAM_HOTPLUG_CPU0" config option and the
  cpu0_hotplug kernel parameter.

  In general you should not do it, and it has various known limitations
  (hibernate and suspend require the boot CPU, for example).

  But it should work, even if the boot CPU is special and needs careful
  treatment       - Linus ]

Link: https://lore.kernel.org/lkml/156785100521.13300.14461504732265570003@skylake-alporthouse-com/
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bandan Das <bsd@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-07 14:25:54 -07:00
Ingo Molnar
77e5517cb5 Merge branch 'linus' into x86/cpu, to resolve conflicts
Conflicts:
	tools/power/x86/turbostat/turbostat.c

Recent turbostat changes conflicted with a pending rename of x86 model names in tip:x86/cpu,
sort it out.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-02 09:10:07 +02:00
Neil Horman
743dac494d x86/apic/vector: Warn when vector space exhaustion breaks affinity
On x86, CPUs are limited in the number of interrupts they can have affined
to them as they only support 256 interrupt vectors per CPU. 32 vectors are
reserved for the CPU and the kernel reserves another 22 for internal
purposes. That leaves 202 vectors for assignement to devices.

When an interrupt is set up or the affinity is changed by the kernel or the
administrator, the vector assignment code attempts to honor the requested
affinity mask. If the vector space on the CPUs in that affinity mask is
exhausted the code falls back to a wider set of CPUs and assigns a vector
on a CPU outside of the requested affinity mask silently.

While the effective affinity is reflected in the corresponding
/proc/irq/$N/effective_affinity* files the silent breakage of the requested
affinity can lead to unexpected behaviour for administrators.

Add a pr_warn() when this happens so that adminstrators get at least
informed about it in the syslog.

[ tglx: Massaged changelog and made the pr_warn() more informative ]

Reported-by: djuran@redhat.com
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: djuran@redhat.com
Link: https://lkml.kernel.org/r/20190822143421.9535-1-nhorman@tuxdriver.com
2019-08-28 14:44:08 +02:00
Peter Zijlstra
5ebb34edbe x86/intel: Aggregate microserver naming
Currently big microservers have _XEON_D while small microservers have
_X, Make it uniformly: _D.

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(X\|XEON_D\)"`
do
	sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*ATOM.*\)_X/\1_D/g' \
	       -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_XEON_D/\1_D/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.677152989@infradead.org
2019-08-28 11:29:32 +02:00
Peter Zijlstra
5e741407ea x86/intel: Aggregate big core graphics naming
Currently big core clients with extra graphics on have:

 - _G
 - _GT3E

Make it uniformly: _G

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_GT3E"`
do
	sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_GT3E/\1_G/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.622802314@infradead.org
2019-08-28 11:29:31 +02:00
Peter Zijlstra
af239c44e3 x86/intel: Aggregate big core mobile naming
Currently big core mobile chips have either:

 - _L
 - _ULT
 - _MOBILE

Make it uniformly: _L.

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(MOBILE\|ULT\)"`
do
	sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(MOBILE\|ULT\)/\1_L/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190827195122.568978530@infradead.org
2019-08-28 11:29:31 +02:00
Peter Zijlstra
c66f78a6de x86/intel: Aggregate big core client naming
Currently the big core client models either have:

 - no OPTDIFF
 - _CORE
 - _DESKTOP

Make it uniformly: 'no OPTDIFF'.

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(CORE\|DESKTOP\)"`
do
	sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(CORE\|DESKTOP\)/\1/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190827195122.513945586@infradead.org
2019-08-28 11:29:31 +02:00
Bandan Das
558682b529 x86/apic: Include the LDR when clearing out APIC registers
Although APIC initialization will typically clear out the LDR before
setting it, the APIC cleanup code should reset the LDR.

This was discovered with a 32-bit KVM guest jumping into a kdump
kernel. The stale bits in the LDR triggered a bug in the KVM APIC
implementation which caused the destination mapping for VCPUs to be
corrupted.

Note that this isn't intended to paper over the KVM APIC bug. The kernel
has to clear the LDR when resetting the APIC registers except when X2APIC
is enabled.

This lacks a Fixes tag because missing to clear LDR goes way back into pre
git history.

[ tglx: Made x2apic_enabled a function call as required ]

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190826101513.5080-3-bsd@redhat.com
2019-08-26 20:00:57 +02:00
Bandan Das
bae3a8d330 x86/apic: Do not initialize LDR and DFR for bigsmp
Legacy apic init uses bigsmp for smp systems with 8 and more CPUs. The
bigsmp APIC implementation uses physical destination mode, but it
nevertheless initializes LDR and DFR. The LDR even ends up incorrectly with
multiple bit being set.

This does not cause a functional problem because LDR and DFR are ignored
when physical destination mode is active, but it triggered a problem on a
32-bit KVM guest which jumps into a kdump kernel.

The multiple bits set unearthed a bug in the KVM APIC implementation. The
code which creates the logical destination map for VCPUs ignores the
disabled state of the APIC and ends up overwriting an existing valid entry
and as a result, APIC calibration hangs in the guest during kdump
initialization.

Remove the bogus LDR/DFR initialization.

This is not intended to work around the KVM APIC bug. The LDR/DFR
ininitalization is wrong on its own.

The issue goes back into the pre git history. The fixes tag is the commit
in the bitkeeper import which introduced bigsmp support in 2003.

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

Fixes: db7b9e9f26b8 ("[PATCH] Clustered APIC setup for >8 CPU systems")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190826101513.5080-2-bsd@redhat.com
2019-08-26 20:00:56 +02:00
Thomas Gleixner
3e5bedc2c2 x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines
Rahul Tanwar reported the following bug on DT systems:

> 'ioapic_dynirq_base' contains the virtual IRQ base number. Presently, it is
> updated to the end of hardware IRQ numbers but this is done only when IOAPIC
> configuration type is IOAPIC_DOMAIN_LEGACY or IOAPIC_DOMAIN_STRICT. There is
> a third type IOAPIC_DOMAIN_DYNAMIC which applies when IOAPIC configuration
> comes from devicetree.
>
> See dtb_add_ioapic() in arch/x86/kernel/devicetree.c
>
> In case of IOAPIC_DOMAIN_DYNAMIC (DT/OF based system), 'ioapic_dynirq_base'
> remains to zero initialized value. This means that for OF based systems,
> virtual IRQ base will get set to zero.

Such systems will very likely not even boot.

For DT enabled machines ioapic_dynirq_base is irrelevant and not
updated, so simply map the IRQ base 1:1 instead.

Reported-by: Rahul Tanwar <rahul.tanwar@linux.intel.com>
Tested-by: Rahul Tanwar <rahul.tanwar@linux.intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: alan@linux.intel.com
Cc: bp@alien8.de
Cc: cheol.yong.kim@intel.com
Cc: qi-ming.wu@intel.com
Cc: rahul.tanwar@intel.com
Cc: rppt@linux.ibm.com
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/20190821081330.1187-1-rahul.tanwar@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26 12:11:23 +02:00
Thomas Gleixner
f897e60a12 x86/apic: Handle missing global clockevent gracefully
Some newer machines do not advertise legacy timers. The kernel can handle
that situation if the TSC and the CPU frequency are enumerated by CPUID or
MSRs and the CPU supports TSC deadline timer. If the CPU does not support
TSC deadline timer the local APIC timer frequency has to be known as well.

Some Ryzens machines do not advertize legacy timers, but there is no
reliable way to determine the bus frequency which feeds the local APIC
timer when the machine allows overclocking of that frequency.

As there is no legacy timer the local APIC timer calibration crashes due to
a NULL pointer dereference when accessing the not installed global clock
event device.

Switch the calibration loop to a non interrupt based one, which polls
either TSC (if frequency is known) or jiffies. The latter requires a global
clockevent. As the machines which do not have a global clockevent installed
have a known TSC frequency this is a non issue. For older machines where
TSC frequency is not known, there is no known case where the legacy timers
do not exist as that would have been reported long ago.

Reported-by: Daniel Drake <drake@endlessm.com>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908091443030.21433@nanos.tec.linutronix.de
Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1142926#c12
2019-08-19 12:34:07 +02:00
Borislav Petkov
5785675dfe x86/apic/32: Fix yet another implicit fallthrough warning
Fix

  arch/x86/kernel/apic/probe_32.c: In function ‘default_setup_apic_routing’:
  arch/x86/kernel/apic/probe_32.c:146:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      if (!APIC_XAPIC(version)) {
         ^
  arch/x86/kernel/apic/probe_32.c:151:3: note: here
   case X86_VENDOR_HYGON:
   ^~~~

for 32-bit builds.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190811154036.29805-1-bp@alien8.de
2019-08-12 20:35:04 +02:00
Sean Christopherson
6444b40eed x86/apic: Annotate global config variables as "read-only after init"
Mark the APIC's global config variables that are constant after boot as
__ro_after_init to help document that the majority of the APIC config is
not changed at runtime, and to harden the kernel a smidge.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190805212134.12001-1-sean.j.christopherson@intel.com
2019-08-07 15:24:21 +02:00
Thomas Gleixner
43931d350f x86/apic/x2apic: Implement IPI shorthands support
All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() contain
the decision logic for shorthand invocation already and invoke
send_IPI_mask() if the prereqisites are not satisfied.

Implement shorthand support for x2apic.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105221.134696837@linutronix.de
2019-07-25 16:12:02 +02:00
Thomas Gleixner
2510d09e9d x86/apic/flat64: Remove the IPI shorthand decision logic
All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() contain
the decision logic for shorthand invocation already and invoke
send_IPI_mask() if the prereqisites are not satisfied.

Remove the now redundant decision logic in the APIC code and the duplicate
helper in probe_64.c.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105221.042964120@linutronix.de
2019-07-25 16:12:02 +02:00
Thomas Gleixner
dea978632e x86/apic: Share common IPI helpers
The 64bit implementations need the same wrappers around
__default_send_IPI_shortcut() as 32bit.

Move them out of the 32bit section.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105220.951534451@linutronix.de
2019-07-25 16:12:02 +02:00
Thomas Gleixner
1f0ad66048 x86/apic: Remove the shorthand decision logic
All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() contain
the decision logic for shorthand invocation already and invoke
send_IPI_mask() if the prereqisites are not satisfied.

Remove the now redundant decision logic in the 32bit implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105220.860244707@linutronix.de
2019-07-25 16:12:02 +02:00
Thomas Gleixner
832df3d47b x86/smp: Enhance native_send_call_func_ipi()
Nadav noticed that the cpumask allocations in native_send_call_func_ipi()
are noticeable in microbenchmarks.

Use the new cpumask_or_equal() function to simplify the decision whether
the supplied target CPU mask is either equal to cpu_online_mask or equal to
cpu_online_mask except for the CPU on which the function is invoked.

cpumask_or_equal() or's the target mask and the cpumask of the current CPU
together and compares it to cpu_online_mask.

If the result is false, use the mask based IPI function, otherwise check
whether the current CPU is set in the target mask and invoke either the
send_IPI_all() or the send_IPI_allbutselt() APIC callback.

Make the shorthand decision also depend on the static key which enables
shorthand mode. That allows to remove the extra cpumask comparison with
cpu_callout_mask.

Reported-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105220.768238809@linutronix.de
2019-07-25 16:12:01 +02:00
Thomas Gleixner
d0a7166bc7 x86/smp: Move smp_function_call implementations into IPI code
Move it where it belongs. That allows to keep all the shorthand logic in
one place.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105220.677835995@linutronix.de
2019-07-25 16:12:01 +02:00
Thomas Gleixner
22ca7ee933 x86/apic: Provide and use helper for send_IPI_allbutself()
To support IPI shorthands wrap invocations of apic->send_IPI_allbutself()
in a helper function, so the static key controlling the shorthand mode is
only in one place.

Fixup all callers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105220.492691679@linutronix.de
2019-07-25 16:12:00 +02:00
Thomas Gleixner
6a1cb5f5c6 x86/apic: Add static key to Control IPI shorthands
The IPI shorthand functionality delivers IPI/NMI broadcasts to all CPUs in
the system. This can have similar side effects as the MCE broadcasting when
CPUs are waiting in the BIOS or are offlined.

The kernel tracks already the state of offlined CPUs whether they have been
brought up at least once so that the CR4 MCE bit is set to make sure that
MCE broadcasts can't brick the machine.

Utilize that information and compare it to the cpu_present_mask. If all
present CPUs have been brought up at least once then the broadcast side
effect is mitigated by disabling regular interrupt/IPI delivery in the APIC
itself and by the cpu offline check at the begin of the NMI handler.

Use a static key to switch between broadcasting via shorthands or sending
the IPI/NMI one by one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105220.386410643@linutronix.de
2019-07-25 16:12:00 +02:00
Thomas Gleixner
bdda3b93e6 x86/apic: Move no_ipi_broadcast() out of 32bit
For the upcoming shorthand support for all APIC incarnations the command
line option needs to be available for 64 bit as well.

While at it, rename the control variable, make it static and mark it
__ro_after_init.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105220.278327940@linutronix.de
2019-07-25 16:12:00 +02:00
Thomas Gleixner
bd82dba2fa x86/apic: Add NMI_VECTOR wait to IPI shorthand
To support NMI shorthand broadcasts add the safe wait for ICR idle for NMI
vector delivery.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105220.185838026@linutronix.de
2019-07-25 16:11:59 +02:00
Thomas Gleixner
3994ff90ac x86/apic: Remove dest argument from __default_send_IPI_shortcut()
The SDM states:

  "The destination shorthand field of the ICR allows the delivery mode to be
   by-passed in favor of broadcasting the IPI to all the processors on the
   system bus and/or back to itself (see Section 10.6.1, Interrupt Command
   Register (ICR)). Three destination shorthands are supported: self, all
   excluding self, and all including self. The destination mode is ignored
   when a destination shorthand is used."

So there is no point to supply the destination mode to the shorthand
delivery function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105220.094613426@linutronix.de
2019-07-25 16:11:59 +02:00
Thomas Gleixner
60dcaad573 x86/hotplug: Silence APIC and NMI when CPU is dead
In order to support IPI/NMI broadcasting via the shorthand mechanism side
effects of shorthands need to be mitigated:

 Shorthand IPIs and NMIs hit all CPUs including unplugged CPUs

Neither of those can be handled on unplugged CPUs for obvious reasons.

It would be trivial to just fully disable the APIC via the enable bit in
MSR_APICBASE. But that's not possible because clearing that bit on systems
based on the 3 wire APIC bus would require a hardware reset to bring it
back as the APIC would lose track of bus arbitration. On systems with FSB
delivery APICBASE could be disabled, but it has to be guaranteed that no
interrupt is sent to the APIC while in that state and it's not clear from
the SDM whether it still responds to INIT/SIPI messages.

Therefore stay on the safe side and switch the APIC into soft disabled mode
so it won't deliver any regular vector to the CPU.

NMIs are still propagated to the 'dead' CPUs. To mitigate that add a check
for the CPU being offline on early nmi entry and if so bail.

Note, this cannot use the stop/restart_nmi() magic which is used in the
alternatives code. A dead CPU cannot invoke nmi_enter() or anything else
due to RCU and other reasons.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907241723290.1791@nanos.tec.linutronix.de
2019-07-25 16:11:59 +02:00
Thomas Gleixner
82e5747823 x86/apic/uv: Make x2apic_extra_bits static
Not used outside of the UV apic source.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105219.725264153@linutronix.de
2019-07-25 16:11:58 +02:00
Thomas Gleixner
c94f0718fb x86/apic: Consolidate the apic local headers
Now there are three small local headers. Some contain functions which are
only used in one source file.

Move all the inlines and declarations into a single local header and the
inlines which are only used in one source file into that.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105219.618612624@linutronix.de
2019-07-25 16:11:58 +02:00
Thomas Gleixner
ba77b2a02e x86/apic: Move apic_flat_64 header into apic directory
Only used locally.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105219.526508168@linutronix.de
2019-07-25 16:11:58 +02:00