Commit Graph

18085 Commits

Author SHA1 Message Date
Juergen Gross
8aa83e6395 x86/setup: Call early_reserve_memory() earlier
Commit in Fixes introduced early_reserve_memory() to do all needed
initial memblock_reserve() calls in one function. Unfortunately, the call
of early_reserve_memory() is done too late for Xen dom0, as in some
cases a Xen hook called by e820__memory_setup() will need those memory
reservations to have happened already.

Move the call of early_reserve_memory() before the call of
e820__memory_setup() in order to avoid such problems.

Fixes: a799c2bd29 ("x86/setup: Consolidate early memory reservations")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210920120421.29276-1-jgross@suse.com
2021-09-21 09:52:08 +02:00
Tony Luck
a6e3cf70b7 x86/mce: Change to not send SIGBUS error during copy from user
Sending a SIGBUS for a copy from user is not the correct semantic.
System calls should return -EFAULT (or a short count for write(2)).

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210818002942.1607544-3-tony.luck@intel.com
2021-09-20 10:55:41 +02:00
Linus Torvalds
20621d2f27 Merge tag 'x86_urgent_for_v5.15_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Prevent a infinite loop in the MCE recovery on return to user space,
   which was caused by a second MCE queueing work for the same page and
   thereby creating a circular work list.

 - Make kern_addr_valid() handle existing PMD entries, which are marked
   not present in the higher level page table, correctly instead of
   blindly dereferencing them.

 - Pass a valid address to sanitize_phys(). This was caused by the
   mixture of inclusive and exclusive ranges. memtype_reserve() expect
   'end' being exclusive, but sanitize_phys() wants it inclusive. This
   worked so far, but with end being the end of the physical address
   space the fail is exposed.

 - Increase the maximum supported GPIO numbers for 64bit. Newer SoCs
   exceed the previous maximum.

* tag 'x86_urgent_for_v5.15_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Avoid infinite loop for copy from user recovery
  x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
  x86/platform: Increase maximum GPIO number for X86_64
  x86/pat: Pass valid address to sanitize_phys()
2021-09-19 13:29:36 -07:00
Tim Gardner
85784470ef x86/smp: Remove unnecessary assignment to local var freq_scale
Coverity warns of an unused value in arch_scale_freq_tick():

  CID 100778 (#1 of 1): Unused value (UNUSED_VALUE)
  assigned_value: Assigning value 1024ULL to freq_scale here, but that stored
  value is overwritten before it can be used.

It was introduced by commit:

  e2b0d619b4 ("x86, sched: check for counters overflow in frequency invariant accounting")

Remove the variable initializer.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Link: https://lkml.kernel.org/r/20210910184405.24422-1-tim.gardner@canonical.com
2021-09-17 21:20:34 +02:00
Peter Zijlstra
09c413071e x86/xen: Make irq_disable() noinstr
vmlinux.o: warning: objtool: pv_ops[31]: native_irq_disable
vmlinux.o: warning: objtool: pv_ops[31]: __raw_callee_save_xen_irq_disable
vmlinux.o: warning: objtool: pv_ops[31]: xen_irq_disable_direct
vmlinux.o: warning: objtool: lock_is_held_type()+0x5b: call to pv_ops[31]() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.933869441@infradead.org
2021-09-17 13:20:23 +02:00
Peter Zijlstra
d7bfc7d57c x86/xen: Make irq_enable() noinstr
vmlinux.o: warning: objtool: pv_ops[32]: native_irq_enable
vmlinux.o: warning: objtool: pv_ops[32]: __raw_callee_save_xen_irq_enable
vmlinux.o: warning: objtool: pv_ops[32]: xen_irq_enable_direct
vmlinux.o: warning: objtool: lock_is_held_type()+0xfe: call to pv_ops[32]() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.872254932@infradead.org
2021-09-17 13:17:12 +02:00
Peter Zijlstra
20125c872a x86/xen: Make save_fl() noinstr
vmlinux.o: warning: objtool: pv_ops[30]: native_save_fl
vmlinux.o: warning: objtool: pv_ops[30]: __raw_callee_save_xen_save_fl
vmlinux.o: warning: objtool: pv_ops[30]: xen_save_fl_direct
vmlinux.o: warning: objtool: lockdep_hardirqs_off()+0x73: call to pv_ops[30]() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.749712274@infradead.org
2021-09-17 13:14:44 +02:00
Peter Zijlstra
7361fac046 x86/xen: Make set_debugreg() noinstr
vmlinux.o: warning: objtool: pv_ops[2]: xen_set_debugreg
vmlinux.o: warning: objtool: pv_ops[2]: native_set_debugreg
vmlinux.o: warning: objtool: exc_debug()+0x3b: call to pv_ops[2]() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.687755639@infradead.org
2021-09-17 13:14:39 +02:00
Peter Zijlstra
f4afb713e5 x86/xen: Make get_debugreg() noinstr
vmlinux.o: warning: objtool: pv_ops[1]: xen_get_debugreg
vmlinux.o: warning: objtool: pv_ops[1]: native_get_debugreg
vmlinux.o: warning: objtool: exc_debug()+0x25: call to pv_ops[1]() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.625523645@infradead.org
2021-09-17 13:12:34 +02:00
Peter Zijlstra
209cfd0cbb x86/xen: Make write_cr2() noinstr
vmlinux.o: warning: objtool: pv_ops[42]: native_write_cr2
vmlinux.o: warning: objtool: pv_ops[42]: xen_write_cr2
vmlinux.o: warning: objtool: exc_nmi()+0x127: call to pv_ops[42]() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.563524913@infradead.org
2021-09-17 13:12:16 +02:00
Peter Zijlstra
0a53c9acf4 x86/xen: Make read_cr2() noinstr
vmlinux.o: warning: objtool: pv_ops[41]: native_read_cr2
vmlinux.o: warning: objtool: pv_ops[41]: xen_read_cr2
vmlinux.o: warning: objtool: pv_ops[41]: xen_read_cr2_direct
vmlinux.o: warning: objtool: exc_double_fault()+0x15: call to pv_ops[41]() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.500331616@infradead.org
2021-09-17 13:11:50 +02:00
Peter Zijlstra
2c36d87be4 x86/sev: Fix noinstr for vc_ghcb_invalidate()
vmlinux.o: warning: objtool: __sev_put_ghcb()+0x88: call to __memset() leaves .noinstr.text section
vmlinux.o: warning: objtool: __sev_es_nmi_complete()+0x39: call to __memset() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210624095148.250770465@infradead.org
2021-09-15 15:51:47 +02:00
Linus Torvalds
77e02cf57b memblock: introduce saner 'memblock_free_ptr()' interface
The boot-time allocation interface for memblock is a mess, with
'memblock_alloc()' returning a virtual pointer, but then you are
supposed to free it with 'memblock_free()' that takes a _physical_
address.

Not only is that all kinds of strange and illogical, but it actually
causes bugs, when people then use it like a normal allocation function,
and it fails spectacularly on a NULL pointer:

   https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/

or just random memory corruption if the debug checks don't catch it:

   https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/

I really don't want to apply patches that treat the symptoms, when the
fundamental cause is this horribly confusing interface.

I started out looking at just automating a sane replacement sequence,
but because of this mix or virtual and physical addresses, and because
people have used the "__pa()" macro that can take either a regular
kernel pointer, or just the raw "unsigned long" address, it's all quite
messy.

So this just introduces a new saner interface for freeing a virtual
address that was allocated using 'memblock_alloc()', and that was kept
as a regular kernel pointer.  And then it converts a couple of users
that are obvious and easy to test, including the 'xbc_nodes' case in
lib/bootconfig.c that caused problems.

Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: 40caa127f3 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-14 13:23:22 -07:00
Thomas Gleixner
a2a8fd9a3e x86/fpu/signal: Change return code of restore_fpregs_from_user() to boolean
__fpu_sig_restore() only needs information about success or fail and no
real error code.

This cleans up the confusing conversion of the trap number, which is
returned by the *RSTOR() exception fixups, to an error code.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132526.084109938@linutronix.de
2021-09-14 21:10:04 +02:00
Thomas Gleixner
be00401441 x86/fpu/signal: Change return code of check_xstate_in_sigframe() to boolean
__fpu_sig_restore() only needs success/fail information and no detailed
error code.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132526.024024598@linutronix.de
2021-09-14 21:10:04 +02:00
Thomas Gleixner
1193f408cd x86/fpu/signal: Change return type of __fpu_restore_sig() to boolean
Now that fpu__restore_sig() returns a boolean get rid of the individual
error codes in __fpu_restore_sig() as well.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132525.966197097@linutronix.de
2021-09-14 21:10:03 +02:00
Thomas Gleixner
f3305be5fe x86/fpu/signal: Change return type of fpu__restore_sig() to boolean
None of the call sites cares about the error code. All they need to know is
whether the function succeeded or not.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132525.909065931@linutronix.de
2021-09-14 21:10:03 +02:00
Thomas Gleixner
ee4ecdfbd2 x86/signal: Change return type of restore_sigcontext() to boolean
None of the call sites cares about the return code. All they are interested
in is success or fail.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132525.851280949@linutronix.de
2021-09-14 21:10:03 +02:00
Thomas Gleixner
2af07f3a6e x86/fpu/signal: Change return type of copy_fpregs_to_sigframe() helpers to boolean
Now that copy_fpregs_to_sigframe() returns boolean the individual return
codes in the related helper functions do not make sense anymore. Change
them to return boolean success/fail.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132525.794334915@linutronix.de
2021-09-14 21:10:03 +02:00
Thomas Gleixner
052adee668 x86/fpu/signal: Change return type of copy_fpstate_to_sigframe() to boolean
None of the call sites cares about the actual return code. Change the
return type to boolean and return 'true' on success.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132525.736773588@linutronix.de
2021-09-14 21:10:03 +02:00
Thomas Gleixner
fcfb716332 x86/fpu/signal: Move xstate clearing out of copy_fpregs_to_sigframe()
When the direct saving of the FPU registers to the user space sigframe
fails, copy_fpregs_to_sigframe() attempts to clear the user buffer.

The most likely reason for such a fail is a page fault. As
copy_fpregs_to_sigframe() is invoked with pagefaults disabled the chance
that __clear_user() succeeds is minuscule.

Move the clearing out into the caller which replaces the
fault_in_pages_writeable() in that error handling path.

The return value confusion will be cleaned up separately.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132525.679356300@linutronix.de
2021-09-14 21:10:03 +02:00
Thomas Gleixner
4164a482a5 x86/fpu/signal: Move header zeroing out of xsave_to_user_sigframe()
There is no reason to have the header zeroing in the pagefault disabled
region. Do it upfront once.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132525.621674721@linutronix.de
2021-09-14 21:10:03 +02:00
Tony Luck
81065b35e2 x86/mce: Avoid infinite loop for copy from user recovery
There are two cases for machine check recovery:

1) The machine check was triggered by ring3 (application) code.
   This is the simpler case. The machine check handler simply queues
   work to be executed on return to user. That code unmaps the page
   from all users and arranges to send a SIGBUS to the task that
   triggered the poison.

2) The machine check was triggered in kernel code that is covered by
   an exception table entry. In this case the machine check handler
   still queues a work entry to unmap the page, etc. but this will
   not be called right away because the #MC handler returns to the
   fix up code address in the exception table entry.

Problems occur if the kernel triggers another machine check before the
return to user processes the first queued work item.

Specifically, the work is queued using the ->mce_kill_me callback
structure in the task struct for the current thread. Attempting to queue
a second work item using this same callback results in a loop in the
linked list of work functions to call. So when the kernel does return to
user, it enters an infinite loop processing the same entry for ever.

There are some legitimate scenarios where the kernel may take a second
machine check before returning to the user.

1) Some code (e.g. futex) first tries a get_user() with page faults
   disabled. If this fails, the code retries with page faults enabled
   expecting that this will resolve the page fault.

2) Copy from user code retries a copy in byte-at-time mode to check
   whether any additional bytes can be copied.

On the other side of the fence are some bad drivers that do not check
the return value from individual get_user() calls and may access
multiple user addresses without noticing that some/all calls have
failed.

Fix by adding a counter (current->mce_count) to keep track of repeated
machine checks before task_work() is called. First machine check saves
the address information and calls task_work_add(). Subsequent machine
checks before that task_work call back is executed check that the address
is in the same page as the first machine check (since the callback will
offline exactly one page).

Expected worst case is four machine checks before moving on (e.g. one
user access with page faults disabled, then a repeat to the same address
with page faults enabled ... repeat in copy tail bytes). Just in case
there is some code that loops forever enforce a limit of 10.

 [ bp: Massage commit message, drop noinstr, fix typo, extend panic
   messages. ]

Fixes: 5567d11c21 ("x86/mce: Send #MC singal from task work")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com
2021-09-14 10:27:03 +02:00
Mario Limonciello
aa06e20f1b x86/ACPI: Don't add CPUs that are not online capable
A number of systems are showing "hotplug capable" CPUs when they
are not really hotpluggable.  This is because the MADT has extra
CPU entries to support different CPUs that may be inserted into
the socket with different numbers of cores.

Starting with ACPI 6.3 the spec has an Online Capable bit in the
MADT used to determine whether or not a CPU is hotplug capable
when the enabled bit is not set.

Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html?#local-apic-flags
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-13 19:19:02 +02:00
Thomas Gleixner
4339d0c63c x86/fpu/signal: Clarify exception handling in restore_fpregs_from_user()
FPU restore from a signal frame can trigger various exceptions. The
exceptions are caught with an exception table entry. The handler of this
entry stores the trap number in EAX. The FPU specific fixup negates that
trap number to convert it into an negative error code.

Any other exception than #PF is fatal and recovery is not possible. This
relies on the fact that the #PF exception number is the same as EFAULT, but
that's not really obvious.

Remove the negation from the exception fixup as it really has no value and
check for X86_TRAP_PF at the call site.

There is still confusion due to the return code conversion for the error
case which will be cleaned up separately.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132525.506192488@linutronix.de
2021-09-13 18:26:05 +02:00
Thomas Gleixner
0c2e62ba04 x86/extable: Remove EX_TYPE_FAULT from MCE safe fixups
Now that the MC safe copy and FPU have been converted to use the MCE safe
fixup types remove EX_TYPE_FAULT from the list of types which MCE considers
to be safe to be recovered in kernel.

This removes the SGX exception handling of ENCLS from the #MC safe
handling, but according to the SGX wizards the current SGX implementations
cannot survive #MC on ENCLS:

  https://lore.kernel.org/r/YS+upEmTfpZub3s9@google.com

The code relies on the trap number being stored if ENCLS raised an
exception. That's still working, but it does no longer trick the MCE code
into assuming that #MC is handled correctly for ENCLS.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132525.445255957@linutronix.de
2021-09-13 18:23:00 +02:00
Thomas Gleixner
2cadf5248b x86/extable: Provide EX_TYPE_DEFAULT_MCE_SAFE and EX_TYPE_FAULT_MCE_SAFE
Provide exception fixup types which can be used to identify fixups which
allow in kernel #MC recovery and make them invoke the existing handlers.

These will be used at places where #MC recovery is handled correctly by the
caller.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132525.269689153@linutronix.de
2021-09-13 17:56:56 +02:00
Thomas Gleixner
46d28947d9 x86/extable: Rework the exception table mechanics
The exception table entries contain the instruction address, the fixup
address and the handler address. All addresses are relative. Storing the
handler address has a few downsides:

 1) Most handlers need to be exported

 2) Handlers can be defined everywhere and there is no overview about the
    handler types

 3) MCE needs to check the handler type to decide whether an in kernel #MC
    can be recovered. The functionality of the handler itself is not in any
    way special, but for these checks there need to be separate functions
    which in the worst case have to be exported.

    Some of these 'recoverable' exception fixups are pretty obscure and
    just reuse some other handler to spare code. That obfuscates e.g. the
    #MC safe copy functions. Cleaning that up would require more handlers
    and exports

Rework the exception fixup mechanics by storing a fixup type number instead
of the handler address and invoke the proper handler for each fixup
type. Also teach the extable sort to leave the type field alone.

This makes most handlers static except for special cases like the MCE
MSR fixup and the BPF fixup. This allows to add more types for cleaning up
the obscure places without adding more handler code and exports.

There is a marginal code size reduction for a production config and it
removes _eight_ exported symbols.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lkml.kernel.org/r/20210908132525.211958725@linutronix.de
2021-09-13 17:51:47 +02:00
Thomas Gleixner
083b32d6f4 x86/mce: Get rid of stray semicolons
and the random number of tabs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132525.154428878@linutronix.de
2021-09-13 17:42:51 +02:00
Thomas Gleixner
e42404afc4 x86/mce: Deduplicate exception handling
Prepare code for further simplification. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132525.096452100@linutronix.de
2021-09-13 17:00:23 +02:00
Thomas Gleixner
c2f4954c2d Merge branch 'linus' into smp/urgent
Ensure that all usage sites of get/put_online_cpus() except for the
struggler in drivers/thermal are gone. So the last user and the deprecated
inlines can be removed.
2021-09-11 00:38:47 +02:00
Linus Torvalds
192ad3c27a Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "ARM:
   - Page ownership tracking between host EL1 and EL2
   - Rely on userspace page tables to create large stage-2 mappings
   - Fix incompatibility between pKVM and kmemleak
   - Fix the PMU reset state, and improve the performance of the virtual
     PMU
   - Move over to the generic KVM entry code
   - Address PSCI reset issues w.r.t. save/restore
   - Preliminary rework for the upcoming pKVM fixed feature
   - A bunch of MM cleanups
   - a vGIC fix for timer spurious interrupts
   - Various cleanups

  s390:
   - enable interpretation of specification exceptions
   - fix a vcpu_idx vs vcpu_id mixup

  x86:
   - fast (lockless) page fault support for the new MMU
   - new MMU now the default
   - increased maximum allowed VCPU count
   - allow inhibit IRQs on KVM_RUN while debugging guests
   - let Hyper-V-enabled guests run with virtualized LAPIC as long as
     they do not enable the Hyper-V "AutoEOI" feature
   - fixes and optimizations for the toggling of AMD AVIC (virtualized
     LAPIC)
   - tuning for the case when two-dimensional paging (EPT/NPT) is
     disabled
   - bugfixes and cleanups, especially with respect to vCPU reset and
     choosing a paging mode based on CR0/CR4/EFER
   - support for 5-level page table on AMD processors

  Generic:
   - MMU notifier invalidation callbacks do not take mmu_lock unless
     necessary
   - improved caching of LRU kvm_memory_slot
   - support for histogram statistics
   - add statistics for halt polling and remote TLB flush requests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits)
  KVM: Drop unused kvm_dirty_gfn_invalid()
  KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
  KVM: MMU: mark role_regs and role accessors as maybe unused
  KVM: MIPS: Remove a "set but not used" variable
  x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait
  KVM: stats: Add VM stat for remote tlb flush requests
  KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count()
  KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page
  KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality
  Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
  KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page
  kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710
  kvm: x86: Increase MAX_VCPUS to 1024
  kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS
  KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
  KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host
  KVM: s390: index kvm->arch.idle_mask by vcpu_idx
  KVM: s390: Enable specification exception interpretation
  KVM: arm64: Trim guest debug exception handling
  KVM: SVM: Add 5-level page table support for SVM
  ...
2021-09-07 13:40:51 -07:00
Paolo Bonzini
e99314a340 Merge tag 'kvmarm-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 5.15

- Page ownership tracking between host EL1 and EL2

- Rely on userspace page tables to create large stage-2 mappings

- Fix incompatibility between pKVM and kmemleak

- Fix the PMU reset state, and improve the performance of the virtual PMU

- Move over to the generic KVM entry code

- Address PSCI reset issues w.r.t. save/restore

- Preliminary rework for the upcoming pKVM fixed feature

- A bunch of MM cleanups

- a vGIC fix for timer spurious interrupts

- Various cleanups
2021-09-06 06:34:48 -04:00
Lai Jiangshan
a40b2fd064 x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait
Commit f4e61f0c9a ("x86/kvm: Fix broken irq restoration in kvm_wait")
replaced "local_irq_restore() when IRQ enabled" with "local_irq_enable()
when IRQ enabled" to suppress a warnning.

Although there is no similar debugging warnning for doing local_irq_enable()
when IRQ enabled as doing local_irq_restore() in the same IRQ situation.  But
doing local_irq_enable() when IRQ enabled is no less broken as doing
local_irq_restore() and we'd better avoid it.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210814035129.154242-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06 06:32:35 -04:00
Linus Torvalds
14726903c8 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "173 patches.

  Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
  pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
  bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
  hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
  oom-kill, migration, ksm, percpu, vmstat, and madvise)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits)
  mm/madvise: add MADV_WILLNEED to process_madvise()
  mm/vmstat: remove unneeded return value
  mm/vmstat: simplify the array size calculation
  mm/vmstat: correct some wrong comments
  mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
  selftests: vm: add COW time test for KSM pages
  selftests: vm: add KSM merging time test
  mm: KSM: fix data type
  selftests: vm: add KSM merging across nodes test
  selftests: vm: add KSM zero page merging test
  selftests: vm: add KSM unmerge test
  selftests: vm: add KSM merge test
  mm/migrate: correct kernel-doc notation
  mm: wire up syscall process_mrelease
  mm: introduce process_mrelease system call
  memblock: make memblock_find_in_range method private
  mm/mempolicy.c: use in_task() in mempolicy_slab_node()
  mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
  mm/mempolicy: advertise new MPOL_PREFERRED_MANY
  mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
  ...
2021-09-03 10:08:28 -07:00
Mike Rapoport
a7259df767 memblock: make memblock_find_in_range method private
There are a lot of uses of memblock_find_in_range() along with
memblock_reserve() from the times memblock allocation APIs did not exist.

memblock_find_in_range() is the very core of memblock allocations, so any
future changes to its internal behaviour would mandate updates of all the
users outside memblock.

Replace the calls to memblock_find_in_range() with an equivalent calls to
memblock_phys_alloc() and memblock_phys_alloc_range() and make
memblock_find_in_range() private method of memblock.

This simplifies the callers, ensures that (unlikely) errors in
memblock_reserve() are handled and improves maintainability of
memblock_find_in_range().

Link: https://lkml.kernel.org/r/20210816122622.30279-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>		[arm64]
Acked-by: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[ACPI]
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Nick Kossifidis <mick@ics.forth.gr>			[riscv]
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03 09:58:17 -07:00
Vasily Averin
ec403e2ae0 memcg: enable accounting for ldt_struct objects
Each task can request own LDT and force the kernel to allocate up to 64Kb
memory per-mm.

There are legitimate workloads with hundreds of processes and there can be
hundreds of workloads running on large machines.  The unaccounted memory
can cause isolation issues between the workloads particularly on highly
utilized machines.

It makes sense to account for this objects to restrict the host's memory
consumption from inside the memcg-limited container.

Link: https://lkml.kernel.org/r/38010594-50fe-c06d-7cb0-d1f77ca422f3@virtuozzo.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yutian Yang <nglaive@gmail.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03 09:58:13 -07:00
Linus Torvalds
4a3bb4200a Merge tag 'dma-mapping-5.15' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:

 - fix debugfs initialization order (Anthony Iliopoulos)

 - use memory_intersects() directly (Kefeng Wang)

 - allow to return specific errors from ->map_sg (Logan Gunthorpe,
   Martin Oliveira)

 - turn the dma_map_sg return value into an unsigned int (me)

 - provide a common global coherent pool іmplementation (me)

* tag 'dma-mapping-5.15' of git://git.infradead.org/users/hch/dma-mapping: (31 commits)
  hexagon: use the generic global coherent pool
  dma-mapping: make the global coherent pool conditional
  dma-mapping: add a dma_init_global_coherent helper
  dma-mapping: simplify dma_init_coherent_memory
  dma-mapping: allow using the global coherent pool for !ARM
  ARM/nommu: use the generic dma-direct code for non-coherent devices
  dma-direct: add support for dma_coherent_default_memory
  dma-mapping: return an unsigned int from dma_map_sg{,_attrs}
  dma-mapping: disallow .map_sg operations from returning zero on error
  dma-mapping: return error code from dma_dummy_map_sg()
  x86/amd_gart: don't set failed sg dma_address to DMA_MAPPING_ERROR
  x86/amd_gart: return error code from gart_map_sg()
  xen: swiotlb: return error code from xen_swiotlb_map_sg()
  parisc: return error code from .map_sg() ops
  sparc/iommu: don't set failed sg dma_address to DMA_MAPPING_ERROR
  sparc/iommu: return error codes from .map_sg() ops
  s390/pci: don't set failed sg dma_address to DMA_MAPPING_ERROR
  s390/pci: return error code from s390_dma_map_sg()
  powerpc/iommu: don't set failed sg dma_address to DMA_MAPPING_ERROR
  powerpc/iommu: return error code from .map_sg() ops
  ...
2021-09-02 10:32:06 -07:00
Linus Torvalds
df43d90382 Merge tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:

 - Optionally, provide an index of possible printk messages via
   <debugfs>/printk/index/. It can be used when monitoring important
   kernel messages on a farm of various hosts. The monitor has to be
   updated when some messages has changed or are not longer available by
   a newly deployed kernel.

 - Add printk.console_no_auto_verbose boot parameter. It allows to
   generate crash dump even with slow consoles in a reasonable time
   frame.

 - Remove printk_safe buffers. The messages are always stored directly
   to the main logbuffer, even in NMI or recursive context. Also it
   allows to serialize syslog operations by a mutex instead of a spin
   lock.

 - Misc clean up and build fixes.

* tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk/index: Fix -Wunused-function warning
  lib/nmi_backtrace: Serialize even messages about idle CPUs
  printk: Add printk.console_no_auto_verbose boot parameter
  printk: Remove console_silent()
  lib/test_scanf: Handle n_bits == 0 in random tests
  printk: syslog: close window between wait and read
  printk: convert @syslog_lock to mutex
  printk: remove NMI tracking
  printk: remove safe buffers
  printk: track/limit recursion
  lib/nmi_backtrace: explicitly serialize banner and regs
  printk: Move the printk() kerneldoc comment to its new home
  printk/index: Fix warning about missing prototypes
  MIPS/asm/printk: Fix build failure caused by printk
  printk: index: Add indexing support to dev_printk
  printk: Userspace format indexing support
  printk: Rework parse_prefix into printk_parse_prefix
  printk: Straighten out log_flags into printk_info_flags
  string_helpers: Escape double quotes in escape_special
  printk/console: Check consistent sequence number when handling race in console_unlock()
2021-09-01 18:41:13 -07:00
Linus Torvalds
c07f191907 Merge tag 'hyperv-next-signed-20210831' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:

 - make Hyper-V code arch-agnostic (Michael Kelley)

 - fix sched_clock behaviour on Hyper-V (Ani Sinha)

 - fix a fault when Linux runs as the root partition on MSHV (Praveen
   Kumar)

 - fix VSS driver (Vitaly Kuznetsov)

 - cleanup (Sonia Sharma)

* tag 'hyperv-next-signed-20210831' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  hv_utils: Set the maximum packet size for VSS driver to the length of the receive buffer
  Drivers: hv: Enable Hyper-V code to be built on ARM64
  arm64: efi: Export screen_info
  arm64: hyperv: Initialize hypervisor on boot
  arm64: hyperv: Add panic handler
  arm64: hyperv: Add Hyper-V hypercall and register access utilities
  x86/hyperv: fix root partition faults when writing to VP assist page MSR
  hv: hyperv.h: Remove unused inline functions
  drivers: hv: Decouple Hyper-V clock/timer code from VMbus drivers
  x86/hyperv: add comment describing TSC_INVARIANT_CONTROL MSR setting bit 0
  Drivers: hv: Move Hyper-V misc functionality to arch-neutral code
  Drivers: hv: Add arch independent default functions for some Hyper-V handlers
  Drivers: hv: Make portions of Hyper-V init code be arch neutral
  x86/hyperv: fix for unwanted manipulation of sched_clock when TSC marked unstable
  asm-generic/hyperv: Add missing #include of nmi.h
2021-09-01 18:25:20 -07:00
Linus Torvalds
48983701a1 Merge branch 'siginfo-si_trapno-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo si_trapno updates from Eric Biederman:
 "The full set of si_trapno changes was not appropriate as a fix for the
  newly added SIGTRAP TRAP_PERF, and so I postponed the rest of the
  related cleanups.

  This is the rest of the cleanups for si_trapno that reduces it from
  being a really weird arch special case that is expect to be always
  present (but isn't) on the architectures that support it to being yet
  another field in the _sigfault union of struct siginfo.

  The changes have been reviewed and marinated in linux-next. With the
  removal of this awkward special case new code (like SIGTRAP TRAP_PERF)
  that works across architectures should be easier to write and
  maintain"

* 'siginfo-si_trapno-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal: Rename SIL_PERF_EVENT SIL_FAULT_PERF_EVENT for consistency
  signal: Verify the alignment and size of siginfo_t
  signal: Remove the generic __ARCH_SI_TRAPNO support
  signal/alpha: si_trapno is only used with SIGFPE and SIGTRAP TRAP_UNK
  signal/sparc: si_trapno is only used with SIGILL ILL_ILLTRP
  arm64: Add compile-time asserts for siginfo_t offsets
  arm: Add compile-time asserts for siginfo_t offsets
  sparc64: Add compile-time asserts for siginfo_t offsets
2021-09-01 14:42:36 -07:00
Linus Torvalds
477f70cd2a Merge tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
 "Highlights:

   - i915 has seen a lot of refactoring and uAPI cleanups due to a
     change in the upstream direction going forward

     This has all been audited with known userspace, but there may be
     some pitfalls that were missed.

   - i915 now uses common TTM to enable discrete memory on DG1/2 GPUs

   - i915 enables Jasper and Elkhart Lake by default and has preliminary
     XeHP/DG2 support

   - amdgpu adds support for Cyan Skillfish

   - lots of implicit fencing rules documented and fixed up in drivers

   - msm now uses the core scheduler

   - the irq midlayer has been removed for non-legacy drivers

   - the sysfb code now works on more than x86.

  Otherwise the usual smattering of stuff everywhere, panels, bridges,
  refactorings.

  Detailed summary:

  core:
   - extract i915 eDP backlight into core
   - DP aux bus support
   - drm_device.irq_enabled removed
   - port drivers to native irq interfaces
   - export gem shadow plane handling for vgem
   - print proper driver name in framebuffer registration
   - driver fixes for implicit fencing rules
   - ARM fixed rate compression modifier added
   - updated fb damage handling
   - rmfb ioctl logging/docs
   - drop drm_gem_object_put_locked
   - define DRM_FORMAT_MAX_PLANES
   - add gem fb vmap/vunmap helpers
   - add lockdep_assert(once) helpers
   - mark drm irq midlayer as legacy
   - use offset adjusted bo mapping conversion

  vgaarb:
   - cleanups

  fbdev:
   - extend efifb handling to all arches
   - div by 0 fixes for multiple drivers

  udmabuf:
   - add hugepage mapping support

  dma-buf:
   - non-dynamic exporter fixups
   - document implicit fencing rules

  amdgpu:
   - Initial Cyan Skillfish support
   - switch virtual DCE over to vkms based atomic
   - VCN/JPEG power down fixes
   - NAVI PCIE link handling fixes
   - AMD HDMI freesync fixes
   - Yellow Carp + Beige Goby fixes
   - Clockgating/S0ix/SMU/EEPROM fixes
   - embed hw fence in job
   - rework dma-resv handling
   - ensure eviction to system ram

  amdkfd:
   - uapi: SVM address range query added
   - sysfs leak fix
   - GPUVM TLB optimizations
   - vmfault/migration counters

  i915:
   - Enable JSL and EHL by default
   - preliminary XeHP/DG2 support
   - remove all CNL support (never shipped)
   - move to TTM for discrete memory support
   - allow mixed object mmap handling
   - GEM uAPI spring cleaning
       - add I915_MMAP_OBJECT_FIXED
       - reinstate ADL-P mmap ioctls
       - drop a bunch of unused by userspace features
       - disable and remove GPU relocations
   - revert some i915 misfeatures
   - major refactoring of GuC for Gen11+
   - execbuffer object locking separate step
   - reject caching/set-domain on discrete
   - Enable pipe DMC loading on XE-LPD and ADL-P
   - add PSF GV point support
   - Refactor and fix DDI buffer translations
   - Clean up FBC CFB allocation code
   - Finish INTEL_GEN() and friends macro conversions

  nouveau:
   - add eDP backlight support
   - implicit fence fix

  msm:
   - a680/7c3 support
   - drm/scheduler conversion

  panfrost:
   - rework GPU reset

  virtio:
   - fix fencing for planes

  ast:
   - add detect support

  bochs:
   - move to tiny GPU driver

  vc4:
   - use hotplug irqs
   - HDMI codec support

  vmwgfx:
   - use internal vmware device headers

  ingenic:
   - demidlayering irq

  rcar-du:
   - shutdown fixes
   - convert to bridge connector helpers

  zynqmp-dsub:
   - misc fixes

  mgag200:
   - convert PLL handling to atomic

  mediatek:
   - MT8133 AAL support
   - gem mmap object support
   - MT8167 support

  etnaviv:
   - NXP Layerscape LS1028A SoC support
   - GEM mmap cleanups

  tegra:
   - new user API

  exynos:
   - missing unlock fix
   - build warning fix
   - use refcount_t"

* tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm: (1318 commits)
  drm/amd/display: Move AllowDRAMSelfRefreshOrDRAMClockChangeInVblank to bounding box
  drm/amd/display: Remove duplicate dml init
  drm/amd/display: Update bounding box states (v2)
  drm/amd/display: Update number of DCN3 clock states
  drm/amdgpu: disable GFX CGCG in aldebaran
  drm/amdgpu: Clear RAS interrupt status on aldebaran
  drm/amdgpu: Add support for RAS XGMI err query
  drm/amdkfd: Account for SH/SE count when setting up cu masks.
  drm/amdgpu: rename amdgpu_bo_get_preferred_pin_domain
  drm/amdgpu: drop redundant cancel_delayed_work_sync call
  drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend
  drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend
  drm/amdkfd: map SVM range with correct access permission
  drm/amdkfd: check access permisson to restore retry fault
  drm/amdgpu: Update RAS XGMI Error Query
  drm/amdgpu: Add driver infrastructure for MCA RAS
  drm/amd/display: Add Logging for HDMI color depth information
  drm/amd/amdgpu: consolidate PSP TA init shared buf functions
  drm/amd/amdgpu: add name field back to ras_common_if
  drm/amdgpu: Fix build with missing pm_suspend_target_state module export
  ...
2021-09-01 11:26:46 -07:00
Nathan Chancellor
ea7b4244b3 x86/setup: Explicitly include acpi.h
After commit 342f43af70 ("iscsi_ibft: fix crash due to KASLR physical
memory remapping") x86_64_defconfig shows the following errors:

  arch/x86/kernel/setup.c: In function ‘setup_arch’:
  arch/x86/kernel/setup.c:916:13: error: implicit declaration of function ‘acpi_mps_check’ [-Werror=implicit-function-declaration]
    916 |         if (acpi_mps_check()) {
        |             ^~~~~~~~~~~~~~
  arch/x86/kernel/setup.c:1110:9: error: implicit declaration of function ‘acpi_table_upgrade’ [-Werror=implicit-function-declaration]
   1110 |         acpi_table_upgrade();
        |         ^~~~~~~~~~~~~~~~~~
  [... more acpi noise ...]

acpi.h was being implicitly included from iscsi_ibft.h in this
configuration so the removal of that header means these functions have
no definition or declaration.

In most other configurations, <linux/acpi.h> continued to be included
through at least <linux/tboot.h> if CONFIG_INTEL_TXT was enabled, and
there were probably other implicit include paths too.

Add acpi.h explicitly so there is no more error, and so that we don't
continue to depend on these unreliable implicit include paths.

Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Cc: Maurizio Lombardi <mlombard@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-01 10:17:05 -07:00
Thomas Gleixner
4b92d4add5 drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()
DEFINE_SMP_CALL_CACHE_FUNCTION() was usefel before the CPU hotplug rework
to ensure that the cache related functions are called on the upcoming CPU
because the notifier itself could run on any online CPU.

The hotplug state machine guarantees that the callbacks are invoked on the
upcoming CPU. So there is no need to have this SMP function call
obfuscation. That indirection was missed when the hotplug notifiers were
converted.

This also solves the problem of ARM64 init_cache_level() invoking ACPI
functions which take a semaphore in that context. That's invalid as SMP
function calls run with interrupts disabled. Running it just from the
callback in context of the CPU hotplug thread solves this.

Fixes: 8571890e15 ("arm64: Add support for ACPI based firmware tables")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/871r69ersb.ffs@tglx
2021-09-01 10:29:10 +02:00
Linus Torvalds
81b0b29bf7 Merge branch 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft
Pull ibft updates from Konrad Rzeszutek Wilk:
 "A fix for iBFT parsing code badly interfacing when KASLR is enabled"

* 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft:
  iscsi_ibft: fix warning in reserve_ibft_region()
  iscsi_ibft: fix crash due to KASLR physical memory remapping
2021-08-31 15:28:21 -07:00
Linus Torvalds
e7c1bbcf0c Merge tag 'hwmon-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
 "New drivers for:

   - Aquacomputer D5 Next

   - SB-RMI power module

  Added chip support to existing drivers:

   - Support for various Zen2 and Zen3 APUs and for Yellow Carp (SMU
     v13) added to k10temp driver

   - Support for Silicom n5010 PAC added to intel-m10-bmc driver

   - Support for BPD-RS600 added to pmbus/bpa-rs600 driver

  Other notable changes:

   - In k10temp, do not display Tdie on Zen CPUs if there is no
     difference between Tdie and Tctl

   - Converted adt7470 and dell-smm drivers to use
     devm_hwmon_device_register_with_info API

   - Support for temperature/pwm tables added to axi-fan-control driver

   - Enabled fan control for Dell Precision 7510 in dell-smm driver

  Various other minor improvements and fixes in several drivers"

* tag 'hwmon-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (41 commits)
  hwmon: add driver for Aquacomputer D5 Next
  hwmon: (adt7470) Convert to devm_hwmon_device_register_with_info API
  hwmon: (adt7470) Convert to use regmap
  hwmon: (adt7470) Fix some style issues
  hwmon: (k10temp) Add support for yellow carp
  hwmon: (k10temp) Rework the temperature offset calculation
  hwmon: (k10temp) Don't show Tdie for all Zen/Zen2/Zen3 CPU/APU
  hwmon: (k10temp) Add additional missing Zen2 and Zen3 APUs
  hwmon: remove amd_energy driver in Makefile
  hwmon: (dell-smm) Rework SMM function debugging
  hwmon: (dell-smm) Mark i8k_get_fan_nominal_speed as __init
  hwmon: (dell-smm) Mark tables as __initconst
  hwmon: (pmbus/bpa-rs600) Add workaround for incorrect Pin max
  hwmon: (pmbus/bpa-rs600) Don't use rated limits as warn limits
  hwmon: (axi-fan-control) Support temperature vs pwm points
  hwmon: (axi-fan-control) Handle irqs in natural order
  hwmon: (axi-fan-control) Make sure the clock is enabled
  hwmon: (pmbus/ibm-cffps) Fix write bits for LED control
  hwmon: (w83781d) Match on device tree compatibles
  dt-bindings: hwmon: Add bindings for Winbond W83781D
  ...
2021-08-31 14:37:41 -07:00
Linus Torvalds
bed9166741 Merge tag 'x86-misc-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Thomas Gleixner:
 "A set of updates for the x86 reboot code:

   - Limit the Dell Optiplex 990 quirk to early BIOS versions to avoid
     the full 'power cycle' alike reboot which is required for the buggy
     BIOSes.

   - Update documentation for the reboot=pci command line option and
     document how DMI platform quirks can be overridden"

* tag 'x86-misc-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/reboot: Limit Dell Optiplex 990 quirk to early BIOS versions
  x86/reboot: Document how to override DMI platform quirks
  x86/reboot: Document the "reboot=pci" option
2021-08-30 15:27:15 -07:00
Linus Torvalds
ccd8ec4a3f Merge tag 'x86-irq-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 PIRQ updates from Thomas Gleixner:
 "A set of updates to support port 0x22/0x23 based PCI configuration
  space which can be found on various ALi chipsets and is also available
  on older Intel systems which expose a PIRQ router.

  While the Intel support is more or less nostalgia, the ALi chips are
  still in use on popular embedded boards used for routers"

* tag 'x86-irq-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix typo s/ECLR/ELCR/ for the PIC register
  x86: Avoid magic number with ELCR register accesses
  x86/PCI: Add support for the Intel 82426EX PIRQ router
  x86/PCI: Add support for the Intel 82374EB/82374SB (ESC) PIRQ router
  x86/PCI: Add support for the ALi M1487 (IBC) PIRQ router
  x86: Add support for 0x22/0x23 port I/O configuration space
2021-08-30 15:20:05 -07:00
Linus Torvalds
0a096f240a Merge tag 'x86-cpu-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache flush updates from Thomas Gleixner:
 "A reworked version of the opt-in L1D flush mechanism.

  This is a stop gap for potential future speculation related hardware
  vulnerabilities and a mechanism for truly security paranoid
  applications.

  It allows a task to request that the L1D cache is flushed when the
  kernel switches to a different mm. This can be requested via prctl().

  Changes vs the previous versions:

   - Get rid of the software flush fallback

   - Make the handling consistent with other mitigations

   - Kill the task when it ends up on a SMT enabled core which defeats
     the purpose of L1D flushing obviously"

* tag 'x86-cpu-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation: Add L1D flushing Documentation
  x86, prctl: Hook L1D flushing in via prctl
  x86/mm: Prepare for opt-in based L1D flush in switch_mm()
  x86/process: Make room for TIF_SPEC_L1D_FLUSH
  sched: Add task_work callback for paranoid L1D flush
  x86/mm: Refactor cond_ibpb() to support other use cases
  x86/smp: Add a per-cpu view of SMT state
2021-08-30 15:00:33 -07:00
Linus Torvalds
4a2b88eb02 Merge tag 'perf-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf event updates from Ingo Molnar:

 - Add support for Intel Sapphire Rapids server CPU uncore events

 - Allow the AMD uncore driver to be built as a module

 - Misc cleanups and fixes

* tag 'perf-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  perf/x86/amd/ibs: Add bitfield definitions in new <asm/amd-ibs.h> header
  perf/amd/uncore: Allow the driver to be built as a module
  x86/cpu: Add get_llc_id() helper function
  perf/amd/uncore: Clean up header use, use <linux/ include paths instead of <asm/
  perf/amd/uncore: Simplify code, use free_percpu()'s built-in check for NULL
  perf/hw_breakpoint: Replace deprecated CPU-hotplug functions
  perf/x86/intel: Replace deprecated CPU-hotplug functions
  perf/x86: Remove unused assignment to pointer 'e'
  perf/x86/intel/uncore: Fix IIO cleanup mapping procedure for SNR/ICX
  perf/x86/intel/uncore: Support IMC free-running counters on Sapphire Rapids server
  perf/x86/intel/uncore: Support IIO free-running counters on Sapphire Rapids server
  perf/x86/intel/uncore: Factor out snr_uncore_mmio_map()
  perf/x86/intel/uncore: Add alias PMU name
  perf/x86/intel/uncore: Add Sapphire Rapids server MDF support
  perf/x86/intel/uncore: Add Sapphire Rapids server M3UPI support
  perf/x86/intel/uncore: Add Sapphire Rapids server UPI support
  perf/x86/intel/uncore: Add Sapphire Rapids server M2M support
  perf/x86/intel/uncore: Add Sapphire Rapids server IMC support
  perf/x86/intel/uncore: Add Sapphire Rapids server PCU support
  perf/x86/intel/uncore: Add Sapphire Rapids server M2PCIe support
  ...
2021-08-30 13:50:20 -07:00