linux

Author	SHA1	Message	Date
Borislav Petkov	1e66e2b862	x86/apic: Use dead_cpu instead of current CPU when cleaning up x2apic_dead_cpu() cleans up the leftovers of a CPU which got unplugged, but instead of clearing the dead cpu bit in the cluster mask it clears the current (alive) cpu bit. Noticed because smp_processor_id() is called in preemptible code and triggers a debug warning. [ tglx: Rewrote changelog ] Fixes: `023a611748` ("x86/apic/x2apic: Simplify cluster management") Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20170926170845.13955-1-bp@alien8.de	2017-09-27 09:37:41 +02:00
Ingo Molnar	8474c532b5	Merge branch 'WIP.x86/fpu' into x86/fpu, because it's ready Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 10:17:43 +02:00
Eric Biggers	738f48cb5f	x86/fpu: Use using_compacted_format() instead of open coded X86_FEATURE_XSAVES This is the canonical method to use. Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Hao <haokexin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/20170924105913.9157-11-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:48 +02:00
Eric Biggers	98c0fad9d6	x86/fpu: Use validate_xstate_header() to validate the xstate_header in copy_user_to_xstate() Tighten the checks in copy_user_to_xstate(). Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Hao <haokexin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/20170924105913.9157-10-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:48 +02:00
Eric Biggers	3d703477bc	x86/fpu: Eliminate the 'xfeatures' local variable in copy_user_to_xstate() We now have this field in hdr.xfeatures. Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Hao <haokexin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/20170924105913.9157-9-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:48 +02:00
Eric Biggers	af2c4322d9	x86/fpu: Copy the full header in copy_user_to_xstate() This is in preparation to verify the full xstate header as supplied by user-space. Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Hao <haokexin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/20170924105913.9157-8-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:47 +02:00
Eric Biggers	af95774b3c	x86/fpu: Use validate_xstate_header() to validate the xstate_header in copy_kernel_to_xstate() Tighten the checks in copy_kernel_to_xstate(). Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Hao <haokexin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/20170924105913.9157-7-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:47 +02:00
Eric Biggers	b89eda482d	x86/fpu: Eliminate the 'xfeatures' local variable in copy_kernel_to_xstate() We have this information in the xstate_header. Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Hao <haokexin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/20170924105913.9157-6-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:46 +02:00
Eric Biggers	80d8ae86b3	x86/fpu: Copy the full state_header in copy_kernel_to_xstate() This is in preparation to verify the full xstate header as supplied by user-space. Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Hao <haokexin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/20170924105913.9157-5-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:46 +02:00
Eric Biggers	b11e2e18a7	x86/fpu: Use validate_xstate_header() to validate the xstate_header in __fpu__restore_sig() Tighten the checks in __fpu__restore_sig() and update comments. Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Hao <haokexin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/20170924105913.9157-4-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:46 +02:00
Eric Biggers	cf9df81b13	x86/fpu: Use validate_xstate_header() to validate the xstate_header in xstateregs_set() Tighten the checks in xstateregs_set(). Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Hao <haokexin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/20170924105913.9157-3-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:45 +02:00
Eric Biggers	e63e5d5c15	x86/fpu: Introduce validate_xstate_header() Move validation of user-supplied xstate_header into a helper function, in preparation of calling it from both the ptrace and sigreturn syscall paths. The new function also considers it to be an error if any reserved bits are set, whereas before we were just clearing most of them silently. This should reduce the chance of bugs that fail to correctly validate user-supplied XSAVE areas. It also will expose any broken userspace programs that set the other reserved bits; this is desirable because such programs will lose compatibility with future CPUs and kernels if those bits are ever used for anything. (There shouldn't be any such programs, and in fact in the case where the compacted format is in use we were already validating xfeatures. But you never know...) Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Hao <haokexin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/20170924105913.9157-2-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:45 +02:00
Ingo Molnar	369a036de2	x86/fpu: Rename fpu__activate_fpstate_read/write() to fpu__prepare_[read\|write]() As per the new nomenclature we don't 'activate' the FPU state anymore, we initialize it. So drop the _activate_fpstate name from these functions, which were a bit of a mouthful anyway, and name them: fpu__prepare_read() fpu__prepare_write() Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:44 +02:00
Ingo Molnar	2ce03d850b	x86/fpu: Rename fpu__activate_curr() to fpu__initialize() Rename this function to better express that it's all about initializing the FPU state of a task which goes hand in hand with the fpu::initialized field. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-33-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:44 +02:00
Ingo Molnar	e10078eba6	x86/fpu: Simplify and speed up fpu__copy() fpu__copy() has a preempt_disable()/enable() pair, which it had to do to be able to atomically unlazy the current task when doing an FNSAVE. But we don't unlazy tasks anymore, we always do direct saves/restores of FPU context. So remove both the unnecessary critical section, and update the comments. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-32-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:44 +02:00
Ingo Molnar	7f1487c59b	x86/fpu: Fix stale comments about lazy FPU logic We don't do any lazy restore anymore, what we have are two pieces of optimization: - no-FPU tasks that don't save/restore the FPU context (kernel threads are such) - cached FPU registers maintained via the fpu->last_cpu field. This means that if an FPU task context switches to a non-FPU task then we can maintain the FPU registers as an in-FPU copies (cache), and skip the restoration of them once we switch back to the original FPU-using task. Update all the comments that still referred to old 'lazy' and 'unlazy' concepts. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-31-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:43 +02:00
Ingo Molnar	e4a81bfcaa	x86/fpu: Rename fpu::fpstate_active to fpu::initialized The x86 FPU code used to have a complex state machine where both the FPU registers and the FPU state context could be 'active' (or inactive) independently of each other - which enabled features like lazy FPU restore. Much of this complexity is gone in the current code: now we basically can have FPU-less tasks (kernel threads) that don't use (and save/restore) FPU state at all, plus full FPU users that save/restore directly with no laziness whatsoever. But the fpu::fpstate_active still carries bits of the old complexity - meanwhile this flag has become a simple flag that shows whether the FPU context saving area in the thread struct is initialized and used, or not. Rename it to fpu::initialized to express this simplicity in the name as well. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-30-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:43:36 +02:00
Ingo Molnar	685c930d6e	x86/fpu: Remove fpu__current_fpstate_write_begin/end() These functions are not used anymore, so remove them. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-29-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:42:20 +02:00
Ingo Molnar	4618e90965	x86/fpu: Fix fpu__activate_fpstate_read() and update comments fpu__activate_fpstate_read() can be called for the current task when coredumping - or for stopped tasks when ptrace-ing. Implement this properly in the code and update the comments. This also fixes an incorrect (but harmless) warning introduced by one of the earlier patches. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-28-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-26 09:41:09 +02:00
Thomas Gleixner	d6ffc6ac83	x86/vector: Respect affinity mask in irq descriptor The interrupt descriptor has a preset affinity mask at allocation time, which is usually the default affinity mask. The current code does not respect that mask and places the vector at some random CPU, which gets corrected later by a set_affinity() call. That's silly because the vector allocation can respect the mask upfront and place the interrupt on a CPU which is in the mask. If that fails, then the affinity is broken and a interrupt assigned on any online CPU. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213156.431670325@linutronix.de	2017-09-25 20:52:03 +02:00
Thomas Gleixner	2cffad7bad	x86/irq: Simplify hotplug vector accounting Before a CPU is taken offline the number of active interrupt vectors on the outgoing CPU and the number of vectors which are available on the other online CPUs are counted and compared. If the active vectors are more than the available vectors on the other CPUs then the CPU hot-unplug operation is aborted. This again uses loop based search and is inaccurate. The bitmap matrix allocator has accurate accounting information and can tell exactly whether the vector space is sufficient or not. Emit a message when the number of globaly reserved (unallocated) vectors is larger than the number of available vectors after offlining a CPU because after that point request_irq() might fail. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213156.351193962@linutronix.de	2017-09-25 20:52:02 +02:00
Thomas Gleixner	464d12309e	x86/vector: Switch IOAPIC to global reservation mode IOAPICs install and allocate vectors for inactive interrupts. This results in problems on CPU offline and wastes vector resources for nothing. Handle inactive IOAPIC interrupts in the same way as inactive MSI interrupts and switch them to the global reservation mode. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213156.273454591@linutronix.de	2017-09-25 20:52:02 +02:00
Thomas Gleixner	4900be8360	x86/vector/msi: Switch to global reservation mode Devices with many queues allocate a huge number of interrupts and get assigned a vector for each of them, even if the queues are not active and the interrupts never requested. This causes problems with the decision whether the global vector space is sufficient for CPU hot unplug operations. Change it to a reservation scheme, which allows overcommitment. When the interrupt is allocated and initialized the vector assignment merily updates the reservation request counter in the matrix allocator. This counter is used to emit warnings when the reservation exceeds the available vector space, but does not affect CPU offline operations. Like the managed interrupts the corresponding MSI/DMAR/IOAPIC entries are directed to the special shutdown vector. When the interrupt is requested, then the activation code tries to assign a real vector. If that succeeds the interrupt is started up and functional. If that fails, then subsequently request_irq() fails with -ENOSPC. This allows a clear separation of inactive and active modes and simplifies the final decisions whether the global vector space is sufficient for CPU offline operations. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213156.184211133@linutronix.de	2017-09-25 20:52:02 +02:00
Thomas Gleixner	2db1f959d9	x86/vector: Handle managed interrupts proper Managed interrupts need to reserve interrupt vectors permanently, but as long as the interrupt is deactivated, the vector should not be active. Reserve a new system vector, which can be used to initially initialize MSI/DMAR/IOAPIC entries. In that situation the interrupts are disabled in the corresponding MSI/DMAR/IOAPIC devices. So the vector should never be sent to any CPU. When the managed interrupt is started up, a real vector is assigned from the managed vector space and configured in MSI/DMAR/IOAPIC. This allows a clear separation of inactive and active modes and simplifies the final decisions whether the global vector space is sufficient for CPU offline operations. The vector space can be reserved even on offline CPUs and will survive CPU offline/online operations. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213156.104616625@linutronix.de	2017-09-25 20:52:01 +02:00
Thomas Gleixner	90ad9e2d91	x86/io_apic: Reevaluate vector configuration on activate() With the upcoming reservation/management scheme, early activation will assign a special vector. The final activation at request_irq() assigns a real vector, which needs to be updated in the ioapic. Split out the reconfiguration code in set_affinity and use it for reactivation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213156.025232175@linutronix.de	2017-09-25 20:52:01 +02:00
Thomas Gleixner	2a85386a73	x86/apic/msi: Force reactivation of interrupts at startup time MSI(X) interrupts need a valid vector configuration early at allocation time, i.e. before the PCI core enables MSI(X). With managed interrupts and the new global reservation scheme, the early configuration will not assign a real device vector, but a special shutdown vector. When the irq is started up, then the interrupt must be reconfigured. Tell the MSI irqdomain core about it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213155.774066582@linutronix.de	2017-09-25 20:52:00 +02:00
Thomas Gleixner	ba224feac8	x86/vector: Untangle internal state from irq_cfg The vector management state is not required to live in irq_cfg. irq_cfg is only relevant for the depending irq domains (IOAPIC, DMAR, MSI ...). The seperation of the vector management status allows to direct a shut down interrupt to a special shutdown vector w/o confusing the internal state of the vector management. Preparatory change for the rework of managed interrupts and the global vector reservation scheme. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213155.683712356@linutronix.de	2017-09-25 20:51:59 +02:00
Thomas Gleixner	ba801640b1	x86/vector: Compile SMP only code conditionally No point in compiling this for UP. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213155.603191841@linutronix.de	2017-09-25 20:51:59 +02:00
Thomas Gleixner	baab1e84b1	x86/apic: Remove unused callbacks Now that the old allocator is gone, these apic functions are unused. Remove them. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213155.524662349@linutronix.de	2017-09-25 20:51:58 +02:00
Thomas Gleixner	69cde0004a	x86/vector: Use matrix allocator for vector assignment Replace the magic vector allocation code by a simple bitmap matrix allocator. This avoids loops and hoops over CPUs and vector arrays, so in case of densly used vector spaces it's way faster. This also gets rid of the magic 'spread the vectors accross priority levels' heuristics in the current allocator: The comment in __asign_irq_vector says: * NOTE! The local APIC isn't very good at handling * multiple interrupts at the same interrupt level. * As the interrupt level is determined by taking the * vector number and shifting that right by 4, we * want to spread these out a bit so that they don't * all fall in the same interrupt level. After doing some palaeontological research the following was found the following in the PPro Developer Manual Volume 3: "7.4.2. Valid Interrupts The local and I/O APICs support 240 distinct vectors in the range of 16 to 255. Interrupt priority is implied by its vector, according to the following relationship: priority = vector / 16 One is the lowest priority and 15 is the highest. Vectors 16 through 31 are reserved for exclusive use by the processor. The remaining vectors are for general use. The processor's local APIC includes an in-service entry and a holding entry for each priority level. To avoid losing inter- rupts, software should allocate no more than 2 interrupt vectors per priority." The current SDM tells nothing about that, instead it states: "If more than one interrupt is generated with the same vector number, the local APIC can set the bit for the vector both in the IRR and the ISR. This means that for the Pentium 4 and Intel Xeon processors, the IRR and ISR can queue two interrupts for each interrupt vector: one in the IRR and one in the ISR. Any additional interrupts issued for the same interrupt vector are collapsed into the single bit in the IRR. For the P6 family and Pentium processors, the IRR and ISR registers can queue no more than two interrupts per interrupt vector and will reject other interrupts that are received within the same vector." Which means, that on P6/Pentium the APIC will reject a new message and tell the sender to retry, which increases the load on the APIC bus and nothing more. There is no affirmative answer from Intel on that, but it's a sane approach to remove that for the following reasons: 1) No other (relevant Open Source) operating systems bothers to implement this or mentiones this at all. 2) The current allocator has no enforcement for this and especially the legacy interrupts, which are the main source of interrupts on these P6 and older systmes, are allocated linearly in the same priority level and just work. 3) The current machines have no problem with that at all as verified with some experiments. 4) AMD at least confirmed that such an issue is unknown. 5) P6 and older are dinosaurs almost 20 years EOL, so there is really no reason to worry about that too much. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213155.443678104@linutronix.de	2017-09-25 20:51:58 +02:00
Thomas Gleixner	8d1e3dca7d	x86/vector: Add tracepoints for vector management Add tracepoints for analysing the new vector management Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213155.357986795@linutronix.de	2017-09-25 20:51:58 +02:00
Thomas Gleixner	8ed4f3e666	x86/smpboot: Set online before setting up vectors There is no reason to set the CPU online after establishing the vectors on the upcoming CPU. The vector space is protected by the vector lock so no changes can happen. Marking the CPU online before setting up the vector space makes tracing work in the early vector management cpu online code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213155.264311994@linutronix.de	2017-09-25 20:51:57 +02:00
Thomas Gleixner	65d7ed57bd	x86/vector: Add vector domain debugfs support Add the debug callback for the vector domain, which gives a detailed information about vector usage if invoked for the domain by using rhe matrix allocator debug function and vector/target information when invoked for a particular interrupt. Extra information foir the Vector domain: Online bitmaps: 32 Global available: 6352 Global reserved: 5 Total allocated: 20 System: 41: 0-19,32,50,128,238-255 \| CPU \| avl \| man \| act \| vectors 0 183 4 19 33-48,51-53 1 199 4 1 33 2 199 4 0 Extra information for interrupts: Vector: 42 Target: 4 This allows a detailed analysis of the vector usage and the association to interrupts and devices. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213155.188137174@linutronix.de	2017-09-25 20:51:57 +02:00
Thomas Gleixner	0fa115da40	x86/irq/vector: Initialize matrix allocator Initialize the matrix allocator and add the proper accounting points to the code. No functional change, just preparation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213155.108410660@linutronix.de	2017-09-25 20:51:56 +02:00
Thomas Gleixner	9f9e3bb1cf	x86/apic: Add replacement for cpu_mask_to_apicid() As preparation for replacing the vector allocator, provide a new function which takes a cpu number instead of a cpu mask to calculate/lookup the resulting APIC destination id. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org>	2017-09-25 20:51:56 +02:00
Thomas Gleixner	99a1482d8a	x86/vector: Move helper functions around Move the helper functions to a different place as they would end up in the middle of management functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213154.949581934@linutronix.de	2017-09-25 20:51:56 +02:00
Thomas Gleixner	258d86eef9	x86/vector: Remove pointless pointer checks The info pointer checks in assign_irq_vector_policy() are pointless because the pointer cannot be NULL, otherwise the calling code would already crash. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213154.859484148@linutronix.de	2017-09-25 20:51:55 +02:00
Thomas Gleixner	4ef76eb6de	x86/apic: Get rid of the legacy irq data storage Now that the legacy PIC takeover by the IOAPIC is marked accordingly the early boot allocation of APIC data is not longer necessary. Use the regular allocation mechansim as it is used by non legacy interrupts and fill in the known information (vector and affinity) so the allocator reuses the vector, This is important as the timer check might move the timer interrupt 0 back to the PIC in case the delivery through the IOAPIC fails. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213154.780521549@linutronix.de	2017-09-25 20:51:55 +02:00
Thomas Gleixner	3534be05e4	x86/ioapic: Mark legacy vectors at reallocation time When the legacy PIC vectors are taken over by the IO APIC the current vector assignement code is tricked to reuse the vector by allocating the apic data in the early boot process. This can be avoided by marking the allocation as legacy PIC take over. Preparatory patch for further cleanups. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213154.700501979@linutronix.de	2017-09-25 20:51:54 +02:00
Thomas Gleixner	dccfe3147b	x86/vector: Simplify vector move cleanup The vector move cleanup needs to walk the vector space and do a lot of sanity checks to find a vector to cleanup. With single CPU affinities this can be simplified and made more robust by queueing the vector configuration which needs to be cleaned up in a hlist on the CPU which was the previous target. That removes all the race conditions because the cleanup either finds a valid list entry or not. The latter happens when the interrupt was torn down before the cleanup handler was able to run. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213154.622727892@linutronix.de	2017-09-25 20:51:54 +02:00
Thomas Gleixner	029c6e1c9d	x86/vector: Store the single CPU targets in apic data Now that the interrupt affinities are targeted at single CPUs storing them in a cpumask is overkill. Store them in a dedicated variable. This does not yet remove the domain cpumasks because the current allocator relies on them. Preparatory change for the allocator rework. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213154.544867277@linutronix.de	2017-09-25 20:51:54 +02:00
Thomas Gleixner	86ba65514f	x86/vector: Cleanup variable names The naming convention of variables with the types irq_data and apic_chip_data are inconsistent and confusing. Before reworking the whole vector management make them consistent so irq_data pointers are named 'irqd' and apic_chip_data are named 'apicd' all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213154.465731667@linutronix.de	2017-09-25 20:51:53 +02:00
Thomas Gleixner	f0cc6ccaf7	x86/vector: Simplify the CPU hotplug vector update With single CPU affinities it's not longer required to scan all interrupts for potential destination masks which contain the newly booting CPU. Reduce it to install the active legacy PIC vectors on the newly booting CPU as those cannot be affinity controlled by the kernel and potentially end up at any CPU in the system. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213154.388040204@linutronix.de	2017-09-25 20:51:53 +02:00
Thomas Gleixner	ef9e56d894	x86/ioapic: Remove obsolete post hotplug update With single CPU affinities the post SMP boot vector update is pointless as it will just leave the affinities on the same vectors and the same CPUs. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213154.308697243@linutronix.de	2017-09-25 20:51:52 +02:00
Thomas Gleixner	fdba46ffb4	x86/apic: Get rid of multi CPU affinity Setting the interrupt affinity of a single interrupt to multiple CPUs has a dubious value. 1) This only works on machines where the APIC uses logical destination mode. If the APIC uses physical destination mode then it is already restricted to a single CPU 2) Experiments have shown, that the benefit of multi CPU affinity is close to zero and in some test even worse than setting the affinity to a single CPU. The reason for this is that the delivery targets the APIC with the lowest ID first and only if that APIC is busy (servicing an interrupt, i.e. ISR is not empty) it hands it over to the next APIC. In the conducted tests the vast majority of interrupts ends up on the APIC with the lowest ID anyway, so there is no natural spreading of the interrupts possible. Supporting multi CPU affinities adds a lot of complexity to the code, which can turn the allocation search into a worst case of nr_vectors * nr_online_cpus * nr_bits_in_target_mask As a first step disable it by restricting the vector search to a single CPU. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213154.228824430@linutronix.de	2017-09-25 20:51:52 +02:00
Thomas Gleixner	7854f82293	x86/vector: Rename used_vectors to system_vectors used_vectors is a nisnomer as it only has the system vectors which are excluded from the regular vector allocation marked. It's not what the name suggests storage for the actually used vectors. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213154.150209009@linutronix.de	2017-09-25 20:51:52 +02:00
Thomas Gleixner	c1d1ee9ac1	x86/apic: Get rid of apic->target_cpus The target_cpus() callback of the apic struct is not really useful. Some APICs return cpu_online_mask and others cpus_all_mask. The latter is bogus as it does not take holes in the cpus_possible_mask into account. Replace it with cpus_online_mask which makes the most sense and remove the callback. The usage sites will be removed in a later step anyway, so get rid of it now to have incremental changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213154.070850916@linutronix.de	2017-09-25 20:51:51 +02:00
Thomas Gleixner	023a611748	x86/apic/x2apic: Simplify cluster management The cluster management code creates a cluster mask per cpu, which requires that on cpu on/offline all cluster masks have to be iterated and updated. Other information about the cluster is in different per cpu variables. Create a data structure which holds all information about a cluster and fill it in when the first CPU of a cluster comes online. If another CPU of a cluster comes online it just finds the pointer to the existing cluster structure and reuses it. That simplifies all usage sites and gets rid of quite some pointless iterations over the online cpus to find the cpus which belong to the cluster. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213153.992629420@linutronix.de	2017-09-25 20:51:51 +02:00
Thomas Gleixner	83a105229c	x86/apic: Move common APIC callbacks Move more apic struct specific functions out of the header and the apic management code into the common source file. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213153.834421893@linutronix.de	2017-09-25 20:51:50 +02:00
Thomas Gleixner	6406350583	x86/apic: Sanitize 32/64bit APIC callbacks The 32bit and the 64bit implementation of default_cpu_present_to_apicid() and default_check_phys_apicid_present() are exactly the same, but implemented and located differently. Move them to common apic code and get rid of the pointless difference. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213153.757329991@linutronix.de	2017-09-25 20:51:50 +02:00
Thomas Gleixner	1da91779e1	x86/apic: Move APIC noop specific functions Move more inlines to the place where they belong. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213153.677743545@linutronix.de	2017-09-25 20:51:49 +02:00
Thomas Gleixner	0801bbaac0	x86/apic: Move probe32 specific APIC functions The apic functions which are used in probe_32.c are implemented as inlines or in apic.c. There is no reason to have them at random places. Move them to the actual usage site and make them static. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213153.596768194@linutronix.de	2017-09-25 20:51:49 +02:00
Thomas Gleixner	57e0aa4461	x86/apic: Sanitize return value of check_apicid_used() The check is boolean, but the function returns unsigned long for no value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213153.516730518@linutronix.de	2017-09-25 20:51:49 +02:00
Thomas Gleixner	727657e620	x86/apic: Sanitize return value of apic.set_apic_id() The set_apic_id() callback returns an unsigned long value which is handed in to apic_write() as the value argument u32. Adjust the return value so it returns u32 right away. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213153.437208268@linutronix.de	2017-09-25 20:51:48 +02:00
Thomas Gleixner	981c2eac1c	x86/apic: Deinline x2apic functions These inline functions are used in both the cluster and the physical x2apic code to fill in the function pointers of the apic structure. That means the code is generated twice for no reason. Move it to a C code and reuse it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213153.358954066@linutronix.de	2017-09-25 20:51:48 +02:00
Thomas Gleixner	e4ae4c8ea7	Merge branch 'irq/core' into x86/apic Pick up the dependencies for the vector management rework series.	2017-09-25 20:39:01 +02:00
Thomas Gleixner	42e1cc2dc5	genirq/irqdomain: Propagate early activation Propagate the early activation mode to the irqdomain activate() callbacks. This is required for the upcoming reservation, late vector assignment scheme, so that the early activation call can act accordingly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213153.028353660@linutronix.de	2017-09-25 20:38:25 +02:00
Thomas Gleixner	7249164346	genirq/irqdomain: Update irq_domain_ops.activate() signature The irq_domain_ops.activate() callback has no return value and no way to tell the function that the activation is early. The upcoming changes to support a reservation scheme which allows to assign interrupt vectors on x86 only when the interrupt is actually requested requires: - A return value, so activation can fail at request_irq() time - Information that the activate invocation is early, i.e. before request_irq(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Yu Chen <yu.c.chen@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Len Brown <lenb@kernel.org> Link: https://lkml.kernel.org/r/20170913213152.848490816@linutronix.de	2017-09-25 20:38:24 +02:00
Boris Ostrovsky	ccb64941f3	x86/timers: Move simple_udelay_calibration() past kvmclock_init() simple_udelay_calibration() relies on x86_platform's calibration ops. For KVM these ops are set late in setup_arch() and so simple_udelay_calibration() ends up using native version. Besides being possibly incorrect, this significantly increases kernel boot time. For example, on my laptop executing start_kernel() by a guest takes ~10 times more than when KVM's ops are used. Since early_xdbc_setup_hardware() relies on calibration having been performed move it too. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: baolu.lu@linux.intel.com Link: https://lkml.kernel.org/r/20170911185111.20636-1-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-09-25 15:22:45 +02:00
Dou Liyang	af5768507c	x86/timers: Make recalibrate_cpu_khz() void recalibrate_cpu_khz() is called from powernow K7 and Pentium 4/Xeon CPU freq driver. It recalibrates cpu frequency in case of SMP = n and doesn't need to return anything. Mark it void, also remove the #else branch. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1500003247-17368-2-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:22:44 +02:00
Dou Liyang	eb496063c9	x86/timers: Move the simple udelay calibration to tsc.h Commit `dd759d93f4` ("x86/timers: Add simple udelay calibration") adds an static function in x86 boot-time initializations. But, this function is actually related to TSC, so it should be maintained in tsc.c, not in setup.c. Move simple_udelay_calibration() from setup.c to tsc.c and rename it to tsc_early_delay_calibrate for more readability. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1500003247-17368-1-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:22:44 +02:00
Dou Liyang	ae41a2a40e	x86/apic: Use lapic_is_integrated() consistently lapic_is_integrated() is a wrapper around APIC_INTEGRATED(), but not used consistently. Replace the direct usage of APIC_INTEGRATED() and fixup a hard to read tail comment. No functional change. [ tglx: Made it compile and work .... ] Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/1504774161-7137-2-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:19:43 +02:00
Dou Liyang	e3cccbce14	x86/apic: Remove duplicate X86_64 conditional in lapic_is_integrated() The macro APIC_INTEGRATED(x) is already wrapped by CONFIG_X86_32. So it can be invoked unconditionally. Remove the extra "#ifdef CONFIG_X86_64...". No functional change. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/1504774161-7137-1-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:19:43 +02:00
Dou Liyang	b371ae0d4a	x86/apic: Remove init_bsp_APIC() init_bsp_APIC() which works for the virtual wire mode is used in ISA irq initialization at boot time. With the new APIC interrupt delivery mode scheme, which initializes the APIC before the first interrupt is expected, init_bsp_APIC() is not longer required and can be removed. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: yinghai@kernel.org Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/1505293975-26005-13-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:12:37 +02:00
Dou Liyang	935356cecd	x86/apic: Initialize interrupt mode after timer init A cold or warm boot through BIOS sets the APIC in default interrupt delivery mode. A dump-capture kernel will not go through a BIOS reset and leave the interrupt delivery mode in the state which was active on the crashed kernel, but the dump kernel startup code assumes default delivery mode which can result in interrupt delivery/handling to fail. To solve this problem, it's required to set up the final interrupt delivery mode as soon as possible. As IOAPIC setup needs the timer initialized for verifying the timer interrupt delivery mode, the earliest point is right after timer setup in late_time_init(). That results in the following init order: 1) Set up the legacy timer, if applicable on the platform 2) Set up APIC/IOAPIC which includes the verification of the legacy timer interrupt delivery. 3) TSC calibration 4) Local APIC timer setup Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: yinghai@kernel.org Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/1505293975-26005-12-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:03:17 +02:00
Dou Liyang	34fba3e6b1	x86/init: Add intr_mode_init to x86_init_ops X86 and XEN initialize interrupt delivery mode in different way. To avoid conditionals, add a new x86_init_ops function which defaults to the standard function and can be overridden by the early XEN platform code. [ tglx: Folded the XEN part which was a separate patch to preserve bisectability ] Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: yinghai@kernel.org Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/1505293975-26005-10-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:03:17 +02:00
Dou Liyang	ca7c6076ba	x86/ioapic: Refactor the delay logic in timer_irq_works() timer_irq_works() is used to detects the timer IRQs. It calls mdelay(10) to delay ten ticks and check whether the timer IRQ work or not. mdelay() depends on the loops_per_jiffy which is set up in calibrate_delay(), but the delay calibration depends on a working timer interrupt, which causes a chicken and egg problem. The correct solution is to set up the interrupt mode and making sure that the timer interrupt is delivered correctly before invoking calibrate_delay(). That means that mdelay() cannot be used in timer_irq_works(). Provide helper functions to make a rough delay estimate which is good enough to prove that the timer interrupt is working. Either use TSC or a simple delay loop and assume that 4GHz is the maximum CPU frequency to base the delay calculation on. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: yinghai@kernel.org Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/1505293975-26005-9-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:03:16 +02:00
Dou Liyang	0c759131ae	x86/apic: Unify interrupt mode setup for UP system In UniProcessor kernel with UP_LATE_INIT=y, the interrupt delivery mode is initialized in up_late_init(). Use the new unified apic_intr_mode_init() function and remove APIC_init_uniprocessor(). Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: yinghai@kernel.org Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/1505293975-26005-8-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:03:16 +02:00
Dou Liyang	4f45ed9f84	x86/apic: Mark the apic_intr_mode extern for sanity check cleanup Calling native_smp_prepare_cpus() to prepare for SMP bootup, does some sanity checking, enables APIC mode and disables SMP feature. Now, APIC mode setup has been unified to apic_intr_mode_init(), some sanity checks are redundant and need to be cleanup. Mark the apic_intr_mode extern to refine the switch and remove the redundant sanity check. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: yinghai@kernel.org Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/1505293975-26005-7-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:03:16 +02:00
Dou Liyang	3e730dad3b	x86/apic: Unify interrupt mode setup for SMP-capable system On a SMP-capable system, the kernel enables and sets up the APIC interrupt delivery mode in native_smp_prepare_cpus(). The decision how to setup the APIC is intermingled with the decision of setting up SMP or not. Split the initialization of the APIC interrupt mode independent from other decisions and have a separate apic_intr_mode_init() function for it. The invocation time stays the same for now. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: yinghai@kernel.org Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/1505293975-26005-6-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:03:15 +02:00
Dou Liyang	4b1244b45c	x86/apic: Move logical APIC ID away from apic_bsp_setup() apic_bsp_setup() sets and returns logical APIC ID for initializing cpu0_logical_apicid in a SMP-capable system. The id has nothing to do with the initialization of local APIC and I/O APIC. And apic_bsp_setup() should be called for interrupt mode setup only. Move the id setup into a separate helper function for cleanup and mark apic_bsp_setup() void. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: yinghai@kernel.org Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/1505293975-26005-5-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:03:15 +02:00
Dou Liyang	a2510d156e	x86/apic: Split local APIC timer setup from the APIC setup apic_bsp_setup() sets up the local APIC, I/O APIC and APIC timer. The local APIC and I/O APIC setup belongs to interrupt delivery mode setup. Setting up the local APIC timer for booting CPU is another job and has nothing to do with interrupt delivery mode setup. Split local APIC timer setup from the APIC setup, keep it in the original position for SMP and UP kernel for now. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: yinghai@kernel.org Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/1505293975-26005-4-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:03:14 +02:00
Dou Liyang	4b1669e8d1	x86/apic: Prepare for unifying the interrupt delivery modes setup There are three places which initialize the interrupt delivery modes: 1) init_bsp_APIC() which is called early might setup the through-local-APIC virtual wire mode on non SMP systems. 2) In an SMP-capable system, native_smp_prepare_cpus() tries to switch to symmetric I/O model. 3) In UP system with UP_LATE_INIT=y, the local APIC and I/O APIC are set up in smp_init(). There is no technical reason to make these initializations at random places and run the kernel with the potentially wrong mode through the early boot stage, but it has a problematic side effect: The late switch to symmetric I/O mode causes dump-capture kernel to hang when the kernel command line option 'notsc' is active. Provide a new function to unify that three positions. Preparatory patch to initialize an interrupt mode directly. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: yinghai@kernel.org Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/1505293975-26005-3-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:03:14 +02:00
Dou Liyang	0114a8e877	x86/apic: Construct a selector for the interrupt delivery mode There are quite some switches which are used to determine the final interrupt delivery mode, as shown below: 1) Kconfig: CONFIG_X86_64; CONFIG_X86_LOCAL_APIC; CONFIG_x86_IO_APIC 2) Command line options: disable_apic; skip_ioapic_setup 3) CPU Capability: boot_cpu_has(X86_FEATURE_APIC) 4) MP table: smp_found_config 5) ACPI: acpi_lapic; acpi_ioapic; nr_ioapic These switches are disordered and scattered and there are also some dependencies between them. These make the code difficult to maintain and read. Construct a selector to unify them into a single function, then, Use this selector to get an interrupt delivery mode directly. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: yinghai@kernel.org Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/1505293975-26005-2-git-send-email-douly.fnst@cn.fujitsu.com	2017-09-25 15:03:14 +02:00
Sean Fu	7d7099433d	x86/sysfs: Fix off-by-one error in loop termination An off-by-one error in loop terminantion conditions in create_setup_data_nodes() will lead to memory leak when create_setup_data_node() failed. Signed-off-by: Sean Fu <fxinrong@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1505090001-1157-1-git-send-email-fxinrong@gmail.com	2017-09-25 09:36:16 +02:00
Eric Biggers	814fb7bb7d	x86/fpu: Don't let userspace set bogus xcomp_bv On x86, userspace can use the ptrace() or rt_sigreturn() system calls to set a task's extended state (xstate) or "FPU" registers. ptrace() can set them for another task using the PTRACE_SETREGSET request with NT_X86_XSTATE, while rt_sigreturn() can set them for the current task. In either case, registers can be set to any value, but the kernel assumes that the XSAVE area itself remains valid in the sense that the CPU can restore it. However, in the case where the kernel is using the uncompacted xstate format (which it does whenever the XSAVES instruction is unavailable), it was possible for userspace to set the xcomp_bv field in the xstate_header to an arbitrary value. However, all bits in that field are reserved in the uncompacted case, so when switching to a task with nonzero xcomp_bv, the XRSTOR instruction failed with a #GP fault. This caused the WARN_ON_FPU(err) in copy_kernel_to_xregs() to be hit. In addition, since the error is otherwise ignored, the FPU registers from the task previously executing on the CPU were leaked. Fix the bug by checking that the user-supplied value of xcomp_bv is 0 in the uncompacted case, and returning an error otherwise. The reason for validating xcomp_bv rather than simply overwriting it with 0 is that we want userspace to see an error if it (incorrectly) provides an XSAVE area in compacted format rather than in uncompacted format. Note that as before, in case of error we clear the task's FPU state. This is perhaps non-ideal, especially for PTRACE_SETREGSET; it might be better to return an error before changing anything. But it seems the "clear on error" behavior is fine for now, and it's a little tricky to do otherwise because it would mean we couldn't simply copy the full userspace state into kernel memory in one __copy_from_user(). This bug was found by syzkaller, which hit the above-mentioned WARN_ON_FPU(): WARNING: CPU: 1 PID: 0 at ./arch/x86/include/asm/fpu/internal.h:373 __switch_to+0x5b5/0x5d0 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0 #453 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff9ba2bc8e42c0 task.stack: ffffa78cc036c000 RIP: 0010:__switch_to+0x5b5/0x5d0 RSP: 0000:ffffa78cc08bbb88 EFLAGS: 00010082 RAX: 00000000fffffffe RBX: ffff9ba2b8bf2180 RCX: 00000000c0000100 RDX: 00000000ffffffff RSI: 000000005cb10700 RDI: ffff9ba2b8bf36c0 RBP: ffffa78cc08bbbd0 R08: 00000000929fdf46 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ba2bc8e42c0 R13: 0000000000000000 R14: ffff9ba2b8bf3680 R15: ffff9ba2bf5d7b40 FS: 00007f7e5cb10700(0000) GS:ffff9ba2bf400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004005cc CR3: 0000000079fd5000 CR4: 00000000001406e0 Call Trace: Code: 84 00 00 00 00 00 e9 11 fd ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 e7 fa ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 c2 fa ff ff <0f> ff 66 0f 1f 84 00 00 00 00 00 e9 d4 fc ff ff 66 66 2e 0f 1f Here is a C reproducer. The expected behavior is that the program spin forever with no output. However, on a buggy kernel running on a processor with the "xsave" feature but without the "xsaves" feature (e.g. Sandy Bridge through Broadwell for Intel), within a second or two the program reports that the xmm registers were corrupted, i.e. were not restored correctly. With CONFIG_X86_DEBUG_FPU=y it also hits the above kernel warning. #define _GNU_SOURCE #include <stdbool.h> #include <inttypes.h> #include <linux/elf.h> #include <stdio.h> #include <sys/ptrace.h> #include <sys/uio.h> #include <sys/wait.h> #include <unistd.h> int main(void) { int pid = fork(); uint64_t xstate[512]; struct iovec iov = { .iov_base = xstate, .iov_len = sizeof(xstate) }; if (pid == 0) { bool tracee = true; for (int i = 0; i < sysconf(_SC_NPROCESSORS_ONLN) && tracee; i++) tracee = (fork() != 0); uint32_t xmm0[4] = { [0 ... 3] = tracee ? 0x00000000 : 0xDEADBEEF }; asm volatile(" movdqu %0, %%xmm0\n" " mov %0, %%rbx\n" "1: movdqu %%xmm0, %0\n" " mov %0, %%rax\n" " cmp %%rax, %%rbx\n" " je 1b\n" : "+m" (xmm0) : : "rax", "rbx", "xmm0"); printf("BUG: xmm registers corrupted! tracee=%d, xmm0=%08X%08X%08X%08X\n", tracee, xmm0[0], xmm0[1], xmm0[2], xmm0[3]); } else { usleep(100000); ptrace(PTRACE_ATTACH, pid, 0, 0); wait(NULL); ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, &iov); xstate[65] = -1; ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE, &iov); ptrace(PTRACE_CONT, pid, 0, 0); wait(NULL); } return 1; } Note: the program only tests for the bug using the ptrace() system call. The bug can also be reproduced using the rt_sigreturn() system call, but only when called from a 32-bit program, since for 64-bit programs the kernel restores the FPU state from the signal frame by doing XRSTOR directly from userspace memory (with proper error checking). Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> [v3.17+] Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Kevin Hao <haokexin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: kernel-hardening@lists.openwall.com Fixes: `0b29643a58` ("x86/xsaves: Change compacted format xsave area header") Link: http://lkml.kernel.org/r/20170922174156.16780-2-ebiggers3@gmail.com Link: http://lkml.kernel.org/r/20170923130016.21448-25-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-25 09:26:32 +02:00
kbuild test robot	4f8cef59ba	x86/fpu: Fix boolreturn.cocci warnings arch/x86/kernel/fpu/xstate.c:931:9-10: WARNING: return of 0/1 in function 'xfeatures_mxcsr_quirk' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Generated by: scripts/coccinelle/misc/boolreturn.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: kbuild-all@01.org Cc: tipbuild@zytor.com Link: http://lkml.kernel.org/r/20170306004553.GA25764@lkp-wsm-ep1 Link: http://lkml.kernel.org/r/20170923130016.21448-23-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:35 +02:00
Rik van Riel	0852b37417	x86/fpu: Add FPU state copying quirk to handle XRSTOR failure on Intel Skylake CPUs On Skylake CPUs I noticed that XRSTOR is unable to deal with states created by copyout_from_xsaves() if the xstate has only SSE/YMM state, and no FP state. That is, xfeatures had XFEATURE_MASK_SSE set, but not XFEATURE_MASK_FP. The reason is that part of the SSE/YMM state lives in the MXCSR and MXCSR_FLAGS fields of the FP state. Ensure that whenever we copy SSE or YMM state around, the MXCSR and MXCSR_FLAGS fields are also copied around. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170210085445.0f1cc708@annuminas.surriel.com Link: http://lkml.kernel.org/r/20170923130016.21448-22-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:34 +02:00
Ingo Molnar	99dc26bda2	x86/fpu: Remove struct fpu::fpregs_active The previous changes paved the way for the removal of the fpu::fpregs_active state flag - we now only have the fpu::fpstate_active and fpu::last_cpu fields left. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-21-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:34 +02:00
Ingo Molnar	6cf4edbe05	x86/fpu: Decouple fpregs_activate()/fpregs_deactivate() from fpu->fpregs_active The fpregs_activate()/fpregs_deactivate() are currently called in such a pattern: if (!fpu->fpregs_active) fpregs_activate(fpu); ... if (fpu->fpregs_active) fpregs_deactivate(fpu); But note that it's actually safe to call them without checking the flag first. This further decouples the fpu->fpregs_active flag from actual FPU logic. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-20-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:34 +02:00
Ingo Molnar	f1c8cd0176	x86/fpu: Change fpu->fpregs_active users to fpu->fpstate_active We want to simplify the FPU state machine by eliminating fpu->fpregs_active, and we can do that because the two state flags (::fpregs_active and ::fpstate_active) are set essentially together. The old lazy FPU switching code used to make a distinction - but there's no lazy switching code anymore, we always switch in an 'eager' fashion. Do this by first changing all substantial uses of fpu->fpregs_active to fpu->fpstate_active and adding a few debug checks to double check our assumption is correct. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-19-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:34 +02:00
Ingo Molnar	b6aa85558d	x86/fpu: Split the state handling in fpu__drop() Prepare fpu__drop() to use fpu->fpregs_active. There are two distinct usecases for fpu__drop() in this context: exit_thread() when called for 'current' in exit(), and when called for another task in fork(). This patch does not change behavior, it only adds a couple of debug checks and structures the code to make the ->fpregs_active change more obviously correct. All the complications will be removed later on. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-18-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:34 +02:00
Ingo Molnar	a10b6a16cd	x86/fpu: Make the fpu state change in fpu__clear() scheduler-atomic Do this temporarily only, to make it easier to change the FPU state machine, in particular this change couples the fpu->fpregs_active and fpu->fpstate_active states: they are only set/cleared together (as far as the scheduler sees them). This will be removed by later patches. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-17-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:33 +02:00
Ingo Molnar	b3a163081c	x86/fpu: Simplify fpu->fpregs_active use The fpregs_active() inline function is pretty pointless - in almost all the callsites it can be replaced with a direct fpu->fpregs_active access. Do so and eliminate the extra layer of obfuscation. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-16-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:33 +02:00
Ingo Molnar	6d7f7da553	x86/fpu: Flip the parameter order in copy_*_to_xstate() Make it more consistent with regular memcpy() semantics, where the destination argument comes first. No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-15-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:33 +02:00
Ingo Molnar	7b9094c688	x86/fpu: Remove 'kbuf' parameter from the copy_user_to_xstate() API No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-14-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:33 +02:00
Ingo Molnar	59dffa4edb	x86/fpu: Remove 'ubuf' parameter from the copy_kernel_to_xstate() API No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-13-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:33 +02:00
Ingo Molnar	79fecc2b75	x86/fpu: Split copy_user_to_xstate() into copy_kernel_to_xstate() & copy_user_to_xstate() Similar to: x86/fpu: Split copy_xstate_to_user() into copy_xstate_to_kernel() & copy_xstate_to_user() No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-12-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:32 +02:00
Ingo Molnar	8c0817f4a3	x86/fpu: Simplify __copy_xstate_to_kernel() return values __copy_xstate_to_kernel() can only return 0 (because kernel copies cannot fail), simplify the code throughout. No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-11-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:32 +02:00
Ingo Molnar	6ff15f8db7	x86/fpu: Change 'size_total' parameter to unsigned and standardize the size checks in copy_xstate_to_*() 'size_total' is derived from an unsigned input parameter - and then converted to 'int' and checked for negative ranges: if (size_total < 0 \|\| offset < size_total) { This conversion and the checks are unnecessary obfuscation, reject overly large requested copy sizes outright and simplify the underlying code. Reported-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-10-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:32 +02:00
Ingo Molnar	56583c9a14	x86/fpu: Clarify parameter names in the copy_xstate_to_() methods Right now there's a confusing mixture of 'offset' and 'size' parameters: - __copy_xstate_to_() input parameter 'end_pos' not not really an offset, but the full size of the copy to be performed. - input parameter 'count' to copy_xstate_to_() shadows that of __copy_xstate_to_()'s 'count' parameter name - but the roles are different: the first one is the total number of bytes to be copied, while the second one is a partial copy size. To unconfuse all this, use a consistent set of parameter names: - 'size' is the partial copy size within a single xstate component - 'size_total' is the total copy requested - 'offset_start' is the requested starting offset. - 'offset' is the offset within an xstate component. No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-9-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:32 +02:00
Ingo Molnar	8a5b731889	x86/fpu: Remove the 'start_pos' parameter from the __copy_xstate_to_*() functions 'start_pos' is always 0, so remove it and remove the pointless check of 'pos < 0' which can not ever be true as 'pos' is unsigned ... No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-8-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:31 +02:00
Ingo Molnar	becb2bb72f	x86/fpu: Clean up the parameter definitions of copy_xstate_to_*() Remove pointless 'const' of non-pointer input parameter. Remove unnecessary parenthesis that shows uncertainty about arithmetic operator precedence. Clarify copy_xstate_to_user() description. No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-7-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:31 +02:00
Ingo Molnar	d7eda6c99c	x86/fpu: Clean up parameter order in the copy_xstate_to_() APIs Parameter ordering is weird: int copy_xstate_to_kernel(unsigned int pos, unsigned int count, void kbuf, struct xregs_state xsave); int copy_xstate_to_user(unsigned int pos, unsigned int count, void __user ubuf, struct xregs_state *xsave); 'pos' and 'count', which are attributes of the destination buffer, are listed before the destination buffer itself ... List them after the primary arguments instead. This makes the code more similar to regular memcpy() variant APIs. No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-6-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:31 +02:00
Ingo Molnar	a69c158fb3	x86/fpu: Remove 'kbuf' parameter from the copy_xstate_to_user() APIs The 'kbuf' parameter is unused in the _user() side of the API, remove it. This simplifies the code and makes it easier to think about. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-5-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:31 +02:00
Ingo Molnar	4d981cf2d9	x86/fpu: Remove 'ubuf' parameter from the copy_xstate_to_kernel() APIs The 'ubuf' parameter is unused in the _kernel() side of the API, remove it. This simplifies the code and makes it easier to think about. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-4-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:31 +02:00
Ingo Molnar	f0d4f30a7f	x86/fpu: Split copy_xstate_to_user() into copy_xstate_to_kernel() & copy_xstate_to_user() copy_xstate_to_user() is a weird API - in part due to a bad API inherited from the regset APIs. But don't propagate that bad API choice into the FPU code - so as a first step split the API into kernel and user buffer handling routines. (Also split the xstate_copyout() internal helper.) The split API is a dumb duplication that should be obviously correct, the real splitting will be done in the next patch. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-3-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:30 +02:00
Ingo Molnar	656f083116	x86/fpu: Rename copyin_to_xsaves()/copyout_from_xsaves() to copy_user_to_xstate()/copy_xstate_to_user() The 'copyin/copyout' nomenclature needlessly departs from what the modern FPU code uses, which is: copy_fpregs_to_fpstate() copy_fpstate_to_sigframe() copy_fregs_to_user() copy_fxregs_to_kernel() copy_fxregs_to_user() copy_kernel_to_fpregs() copy_kernel_to_fregs() copy_kernel_to_fxregs() copy_kernel_to_xregs() copy_user_to_fregs() copy_user_to_fxregs() copy_user_to_xregs() copy_xregs_to_kernel() copy_xregs_to_user() I.e. according to this pattern, the following rename should be done: copyin_to_xsaves() -> copy_user_to_xstate() copyout_from_xsaves() -> copy_xstate_to_user() or, if we want to be pedantic, denote that that the user-space format is ptrace: copyin_to_xsaves() -> copy_user_ptrace_to_xstate() copyout_from_xsaves() -> copy_xstate_to_user_ptrace() But I'd suggest the shorter, non-pedantic name. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-2-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-24 13:04:30 +02:00
Andy Lutomirski	4ba55e65f4	x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs For unknown historical reasons (i.e. Borislav doesn't recall), 32-bit kernels invoke cpu_init() on secondary CPUs with initial_page_table loaded into CR3. Then they set current->active_mm to &init_mm and call enter_lazy_tlb() before fixing CR3. This means that the x86 TLB code gets invoked while CR3 is inconsistent, and, with the improved PCID sanity checks I added, we warn. Fix it by loading swapper_pg_dir (i.e. init_mm.pgd) earlier. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `72c0098d92` ("x86/mm: Reinitialize TLB state on hotplug and resume") Link: http://lkml.kernel.org/r/30cdfea504682ba3b9012e77717800a91c22097f.1505663533.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-17 18:59:09 +02:00
Andy Lutomirski	b8b7abaed7	x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier Otherwise we might have the PCID feature bit set during cpu_init(). This is just for robustness. I haven't seen any actual bugs here. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `cba4671af7` ("x86/mm: Disable PCID on 32-bit kernels") Link: http://lkml.kernel.org/r/b16dae9d6b0db5d9801ddbebbfd83384097c61f3.1505663533.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-17 18:59:08 +02:00
Linus Torvalds	c44d1ac0c3	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "A single fix addressing the missing CP8 feature bit in CPUID for a range of AMD ZEN models/mask revisions" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/AMD: Fix erratum 1076 (CPB bit)	2017-09-17 08:05:54 -07:00
Linus Torvalds	9db59599ae	* PPC bugfixes * RCU splat fix * swait races fix * pointless userspace-triggerable BUG() fix * misc fixes for KVM_RUN corner cases * nested virt correctness fixes + one host DoS * some cleanups * clang build fix * fix AMD AVIC with default QEMU command line options * x86 bugfixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJZvANtAAoJEL/70l94x66DtcIH/0i4fenYamxdq2xiWtsZdbcy yfk7mWKEzWGZhP+2X8SSOeetd5mqnIcf2cc4m68UCXpt0zoPEjY0i0D4xrYJHZ03 R3ifqvtpHByodfT7dOKQPEisO8PdJ5tvecaCMnK3u6SNaNLjAZfhobuLppQHOwQO eBvpm0jROpA7ENlDgXtsti8MEdsoWtnmGGrRBY77EGW+t24OpNuGB1EMC0nvcs65 eChwZ3u8xeU5Ws3Y/DiC8tK8t628znknd8ay02LTZjA303Ftoe192jPpS33V4v15 kqS6vUFy2lpr9L6wicZtcnnSLtKv+LqecK6o8cxNjzlkOeaZuo9D8UMYsWQfj6w= =Ma23 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull more KVM updates from Paolo Bonzini: - PPC bugfixes - RCU splat fix - swait races fix - pointless userspace-triggerable BUG() fix - misc fixes for KVM_RUN corner cases - nested virt correctness fixes + one host DoS - some cleanups - clang build fix - fix AMD AVIC with default QEMU command line options - x86 bugfixes * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits) kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry kvm,mips: Fix potential swait_active() races kvm,powerpc: Serialize wq active checks in ops->vcpu_kick kvm: Serialize wq active checks in kvm_vcpu_wake_up() kvm,x86: Fix apf_task_wake_one() wq serialization kvm,lapic: Justify use of swait_active() kvm,async_pf: Use swq_has_sleeper() sched/wait: Add swq_has_sleeper() KVM: VMX: Do not BUG() on out-of-bounds guest IRQ KVM: Don't accept obviously wrong gsi values via KVM_IRQFD kvm: nVMX: Don't allow L2 to access the hardware CR8 KVM: trace events: update list of exit reasons KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously KVM: X86: Don't block vCPU if there is pending exception KVM: SVM: Add irqchip_split() checks before enabling AVIC KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv() KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu() KVM: x86: fix clang build ...	2017-09-15 15:43:55 -07:00
Davidlohr Bueso	a0cff57bb2	kvm,x86: Fix apf_task_wake_one() wq serialization During code inspection, the following potential race was seen: CPU0 CPU1 kvm_async_pf_task_wait apf_task_wake_one [L] swait_active(&n->wq) [S] prepare_to_swait(&n.wq) [L] if (!hlist_unhahed(&n.link)) schedule() [S] hlist_del_init(&n->link); Properly serialize swait_active() checks such that a wakeup is not missed. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-15 16:57:12 +02:00
Borislav Petkov	f7f3dc00f6	x86/cpu/AMD: Fix erratum 1076 (CPB bit) CPUID Fn8000_0007_EDX[CPB] is wrongly 0 on models up to B1. But they do support CPB (AMD's Core Performance Boosting cpufreq CPU feature), so fix that. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170907170821.16021-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-15 11:30:53 +02:00
Christoph Hellwig	6faadbbb7f	dmi: Mark all struct dmi_system_id instances const ... and __initconst if applicable. Based on similar work for an older kernel in the Grsecurity patch. [JD: fix toshiba-wmi build] [JD: add htcpen] [JD: move __initconst where checkscript wants it] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jean Delvare <jdelvare@suse.de>	2017-09-14 11:59:30 +02:00
K. Y. Srinivasan	213ff44ae4	x86/hyper-V: Allocate the IDT entry early in boot Allocate the hypervisor callback IDT entry early in the boot sequence. The previous code would allocate the entry as part of registering the handler when the vmbus driver loaded, and this caused a problem for the IDT cleanup that Thomas is working on for v4.15. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: apw@canonical.com Cc: devel@linuxdriverproject.org Cc: gregkh@linuxfoundation.org Cc: jasowang@redhat.com Cc: olaf@aepfle.de Link: http://lkml.kernel.org/r/20170908231557.2419-1-kys@exchange.microsoft.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-13 11:02:26 +02:00
Juergen Gross	87930019c7	x86/paravirt: Remove no longer used paravirt functions With removal of lguest some of the paravirt functions are no longer needed: ->read_cr4() ->store_idt() ->set_pmd_at() ->set_pud_at() ->pte_update() Remove them. Signed-off-by: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Cc: boris.ostrovsky@oracle.com Cc: chrisw@sous-sol.org Cc: jeremy@goop.org Cc: rusty@rustcorp.com.au Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20170904102527.25409-1-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-13 10:55:15 +02:00
Andy Lutomirski	c7ad5ad297	x86/mm/64: Initialize CR4.PCIDE early cpu_init() is weird: it's called rather late (after early identification and after most MMU state is initialized) on the boot CPU but is called extremely early (before identification) on secondary CPUs. It's called just late enough on the boot CPU that its CR4 value isn't propagated to mmu_cr4_features. Even if we put CR4.PCIDE into mmu_cr4_features, we'd hit two problems. First, we'd crash in the trampoline code. That's fixable, and I tried that. It turns out that mmu_cr4_features is totally ignored by secondary_start_64(), though, so even with the trampoline code fixed, it wouldn't help. This means that we don't currently have CR4.PCIDE reliably initialized before we start playing with cpu_tlbstate. This is very fragile and tends to cause boot failures if I make even small changes to the TLB handling code. Make it more robust: initialize CR4.PCIDE earlier on the boot CPU and propagate it to secondary CPUs in start_secondary(). ( Yes, this is ugly. I think we should have improved mmu_cr4_features to actually control CR4 during secondary bootup, but that would be fairly intrusive at this stage. ) Signed-off-by: Andy Lutomirski <luto@kernel.org> Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: `660da7c922` ("x86/mm: Enable CR4.PCIDE on supported systems") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-13 09:54:43 +02:00
Linus Torvalds	680352bda5	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two fixes: dead code removal, plus a SME memory encryption fix on 32-bit kernels that crashed Xen guests" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Remove unused and undefined __generic_processor_info() declaration x86/mm: Make the SME mask a u64	2017-09-12 11:34:39 -07:00
Linus Torvalds	dd198ce714	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "Life has been busy and I have not gotten half as much done this round as I would have liked. I delayed it so that a minor conflict resolution with the mips tree could spend a little time in linux-next before I sent this pull request. This includes two long delayed user namespace changes from Kirill Tkhai. It also includes a very useful change from Serge Hallyn that allows the security capability attribute to be used inside of user namespaces. The practical effect of this is people can now untar tarballs and install rpms in user namespaces. It had been suggested to generalize this and encode some of the namespace information information in the xattr name. Upon close inspection that makes the things that should be hard easy and the things that should be easy more expensive. Then there is my bugfix/cleanup for signal injection that removes the magic encoding of the siginfo union member from the kernel internal si_code. The mips folks reported the case where I had used FPE_FIXME me is impossible so I have remove FPE_FIXME from mips, while at the same time including a return statement in that case to keep gcc from complaining about unitialized variables. I almost finished the work to get make copy_siginfo_to_user a trivial copy to user. The code is available at: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3 But I did not have time/energy to get the code posted and reviewed before the merge window opened. I was able to see that the security excuse for just copying fields that we know are initialized doesn't work in practice there are buggy initializations that don't initialize the proper fields in siginfo. So we still sometimes copy unitialized data to userspace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: Introduce v3 namespaced file capabilities mips/signal: In force_fcr31_sig return in the impossible case signal: Remove kernel interal si_code magic fcntl: Don't use ambiguous SIG_POLL si_codes prctl: Allow local CAP_SYS_ADMIN changing exe_file security: Use user_namespace::level to avoid redundant iterations in cap_capable() userns,pidns: Verify the userns for new pid namespaces signal/testing: Don't look for __SI_FAULT in userspace signal/mips: Document a conflict with SI_USER with SIGFPE signal/sparc: Document a conflict with SI_USER with SIGFPE signal/ia64: Document a conflict with SI_USER with SIGFPE signal/alpha: Document a conflict with SI_USER for SIGTRAP	2017-09-11 18:34:47 -07:00
Dou Liyang	e2329b4252	x86/cpu: Remove unused and undefined __generic_processor_info() declaration The following revert: `2b85b3d229` ("x86/acpi: Restore the order of CPU IDs") ... got rid of __generic_processor_info(), but forgot to remove its declaration in mpspec.h. Remove the declaration and update the comments as well. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1505101403-29100-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-11 08:16:37 +02:00
Alexey Dobriyan	9b130ad5bb	treewide: make "nr_cpu_ids" unsigned First, number of CPUs can't be negative number. Second, different signnnedness leads to suboptimal code in the following cases: 1) kmalloc(nr_cpu_ids * sizeof(X)); "int" has to be sign extended to size_t. 2) while (loff_t *pos < nr_cpu_ids) MOVSXD is 1 byte longed than the same MOV. Other cases exist as well. Basically compiler is told that nr_cpu_ids can't be negative which can't be deduced if it is "int". Code savings on allyesconfig kernel: -3KB add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370) function old new delta coretemp_cpu_online 450 512 +62 rcu_init_one 1234 1272 +38 pci_device_probe 374 399 +25 ... pgdat_reclaimable_pages 628 556 -72 select_fallback_rq 446 369 -77 task_numa_find_cpu 1923 1807 -116 Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-09-08 18:26:48 -07:00
Linus Torvalds	57e88b43b8	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "The main changes include various Hyper-V optimizations such as faster hypercalls and faster/better TLB flushes - and there's also some Intel-MID cleanups" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracing/hyper-v: Trace hyperv_mmu_flush_tlb_others() x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls x86/platform/intel-mid: Make several arrays static, to make code smaller MAINTAINERS: Add missed file for Hyper-V x86/hyper-v: Use hypercall for remote TLB flush hyper-v: Globalize vp_index x86/hyper-v: Implement rep hypercalls hyper-v: Use fast hypercall for HVCALL_SIGNAL_EVENT x86/hyper-v: Introduce fast hypercall implementation x86/hyper-v: Make hv_do_hypercall() inline x86/hyper-v: Include hyperv/ only when CONFIG_HYPERV is set x86/platform/intel-mid: Make 'bt_sfi_data' const x86/platform/intel-mid: Make IRQ allocation a bit more flexible x86/platform/intel-mid: Group timers callbacks together	2017-09-07 09:25:15 -07:00
Andy Lutomirski	1c9fe4409c	x86/mm: Document how CR4.PCIDE restore works While debugging a problem, I thought that using cr4_set_bits_and_update_boot() to restore CR4.PCIDE would be helpful. It turns out to be counterproductive. Add a comment documenting how this works. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-09-06 20:12:57 -07:00
Andy Lutomirski	72c0098d92	x86/mm: Reinitialize TLB state on hotplug and resume When Linux brings a CPU down and back up, it switches to init_mm and then loads swapper_pg_dir into CR3. With PCID enabled, this has the side effect of masking off the ASID bits in CR3. This can result in some confusion in the TLB handling code. If we bring a CPU down and back up with any ASID other than 0, we end up with the wrong ASID active on the CPU after resume. This could cause our internal state to become corrupt, although major corruption is unlikely because init_mm doesn't have any user pages. More obviously, if CONFIG_DEBUG_VM=y, we'll trip over an assertion in the next context switch. The result of that is a failure to resume from suspend with probability 1 - 1/6^(cpus-1). Fix it by reinitializing cpu_tlbstate on resume and CPU bringup. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Jiri Kosina <jikos@kernel.org> Fixes: `10af6235e0` ("x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-09-06 20:12:57 -07:00
Linus Torvalds	53ac64aac9	ACPI updates for v4.14-rc1 - Update the ACPICA code in the kernel to upstream revision 20170728 including: * Alias operator handling update (Bob Moore). * Deferred resolution of reference package elements (Bob Moore). * Support for the _DMA method in walk resources (Bob Moore). * Tables handling update and support for deferred table verification (Lv Zheng). * Update of SMMU models for IORT (Robin Murphy). * Compiler and disassembler updates (Alex James, Erik Schmauss, Ganapatrao Kulkarni, James Morse). * Tools updates (Erik Schmauss, Lv Zheng). * Assorted minor fixes and cleanups (Bob Moore, Kees Cook, Lv Zheng, Shao Ming). - Rework the initialization of non-wakeup GPEs with method handlers in order to address a boot crash on some systems with Thunderbolt devices connected at boot time where we miss an early hotplug event due to a delay in GPE enabling (Rafael Wysocki). - Rework the handling of PCI bridges when setting up ACPI-based device wakeup in order to avoid disabling wakeup for bridges prematurely (Rafael Wysocki). - Consolidate Apple DMI checks throughout the tree, add support for Apple device properties to the device properties framework and use these properties for the handling of I2C and SPI devices on Apple systems (Lukas Wunner). - Add support for _DMA to the ACPI-based device properties lookup code and make it possible to use the information from there to configure DMA regions on ARM64 systems (Lorenzo Pieralisi). - Fix several issues in the APEI code, add support for exporting the BERT error region over sysfs and update APEI MAINTAINERS entry with reviewers information (Borislav Petkov, Dongjiu Geng, Loc Ho, Punit Agrawal, Tony Luck, Yazen Ghannam). - Fix a potential initialization ordering issue in the ACPI EC driver and clean it up somewhat (Lv Zheng). - Update the ACPI SPCR driver to extend the existing XGENE 8250 workaround in it to a new platform (m400) and to work around an Xgene UART clock issue (Graeme Gregory). - Add a new utility function to the ACPI core to support using ACPI OEM ID / OEM Table ID / Revision for system identification in blacklisting or similar and switch over the existing code already using this information to this new interface (Toshi Kani). - Fix an xpower PMIC issue related to GPADC reads that always return 0 without extra pin manipulations (Hans de Goede). - Add statements to print debug messages in a couple of places in the ACPI core for easier diagnostics (Rafael Wysocki). - Clean up the ACPI processor driver slightly (Colin Ian King, Hanjun Guo). - Clean up the ACPI x86 boot code somewhat (Andy Shevchenko). - Add a quirk for Dell OptiPlex 9020M to the ACPI backlight driver (Alex Hung). - Assorted fixes, cleanups and updates related to ACPI (Amitoj Kaur Chawla, Bhumika Goyal, Frank Rowand, Jean Delvare, Punit Agrawal, Ronald Tschalär, Sumeet Pawnikar). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJZrcE+AAoJEILEb/54YlRxVGAP/RKzkJlYlOIXtMjf4XWg5ZfJ RKZA68E9DW179KoBoTCVPD6/eD5UoEJ7fsWXFU2Hgp2xL3N1mZMAJHgAE4GoAwCx uImoYvQgdPna7DawzRIFkvkfceYxNyh+KaV9s7xne4hAwsB7JzP9yf5Ywll53+oF Le27/r6lDOaWhG7uYcxSabnQsWZQkBF5mj2GPzEpKDIHcLA1Vii0URzm7mAHdZsz vGjYhxrshKYEVdkLSRn536m1rEfp2fqsRJ5wqNAazZJr6Cs1WIfNVuv/RfduRJpG /zHIRAmgKV+3jp39cBpjdnexLczb1rGiCV1yZOvwCNM7jy4evL8vbL7VgcUCopaj fHbF34chNG/hKJd3Zn3RRCTNzCs6bv+txslOMARxji5eyr2Q4KuVnvg5LM4hxOUP 23FvcYkBYWu4QCNLOTnC7y2OqK6WzOvDpfi7hf13Z42iNzeAUbwt1sVF0/OCwL51 Og6blSy2x8FidKp8oaBBboBzHEiKWnXBj/Hw8KEHVcsqZv1ZC6igNRAL3tjxamU8 98/Z2NSZHYPrrrn13tT9ywISYXReXzUF85787+0ofugvDe8/QyBH6UhzzZc/xKVA t329JEjEFZZSLgxMIIa9bXoQANxkeZEGsxN6FfwvQhyIVdagLF3UvCjZl/q2NScC 9n++s32qfUBRHetGODWc =6Ke9 -----END PGP SIGNATURE----- Merge tag 'acpi-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These include a usual ACPICA code update (this time to upstream revision 20170728), a fix for a boot crash on some systems with Thunderbolt devices connected at boot time, a rework of the handling of PCI bridges when setting up device wakeup, new support for Apple device properties, support for DMA configurations reported via ACPI on ARM64, APEI-related updates, ACPI EC driver updates and assorted minor modifications in several places. Specifics: - Update the ACPICA code in the kernel to upstream revision 20170728 including: * Alias operator handling update (Bob Moore). * Deferred resolution of reference package elements (Bob Moore). * Support for the _DMA method in walk resources (Bob Moore). * Tables handling update and support for deferred table verification (Lv Zheng). * Update of SMMU models for IORT (Robin Murphy). * Compiler and disassembler updates (Alex James, Erik Schmauss, Ganapatrao Kulkarni, James Morse). * Tools updates (Erik Schmauss, Lv Zheng). * Assorted minor fixes and cleanups (Bob Moore, Kees Cook, Lv Zheng, Shao Ming). - Rework the initialization of non-wakeup GPEs with method handlers in order to address a boot crash on some systems with Thunderbolt devices connected at boot time where we miss an early hotplug event due to a delay in GPE enabling (Rafael Wysocki). - Rework the handling of PCI bridges when setting up ACPI-based device wakeup in order to avoid disabling wakeup for bridges prematurely (Rafael Wysocki). - Consolidate Apple DMI checks throughout the tree, add support for Apple device properties to the device properties framework and use these properties for the handling of I2C and SPI devices on Apple systems (Lukas Wunner). - Add support for _DMA to the ACPI-based device properties lookup code and make it possible to use the information from there to configure DMA regions on ARM64 systems (Lorenzo Pieralisi). - Fix several issues in the APEI code, add support for exporting the BERT error region over sysfs and update APEI MAINTAINERS entry with reviewers information (Borislav Petkov, Dongjiu Geng, Loc Ho, Punit Agrawal, Tony Luck, Yazen Ghannam). - Fix a potential initialization ordering issue in the ACPI EC driver and clean it up somewhat (Lv Zheng). - Update the ACPI SPCR driver to extend the existing XGENE 8250 workaround in it to a new platform (m400) and to work around an Xgene UART clock issue (Graeme Gregory). - Add a new utility function to the ACPI core to support using ACPI OEM ID / OEM Table ID / Revision for system identification in blacklisting or similar and switch over the existing code already using this information to this new interface (Toshi Kani). - Fix an xpower PMIC issue related to GPADC reads that always return 0 without extra pin manipulations (Hans de Goede). - Add statements to print debug messages in a couple of places in the ACPI core for easier diagnostics (Rafael Wysocki). - Clean up the ACPI processor driver slightly (Colin Ian King, Hanjun Guo). - Clean up the ACPI x86 boot code somewhat (Andy Shevchenko). - Add a quirk for Dell OptiPlex 9020M to the ACPI backlight driver (Alex Hung). - Assorted fixes, cleanups and updates related to ACPI (Amitoj Kaur Chawla, Bhumika Goyal, Frank Rowand, Jean Delvare, Punit Agrawal, Ronald Tschalär, Sumeet Pawnikar)" * tag 'acpi-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits) ACPI / APEI: Suppress message if HEST not present intel_pstate: convert to use acpi_match_platform_list() ACPI / blacklist: add acpi_match_platform_list() ACPI, APEI, EINJ: Subtract any matching Register Region from Trigger resources ACPI: make device_attribute const ACPI / sysfs: Extend ACPI sysfs to provide access to boot error region ACPI: APEI: fix the wrong iteration of generic error status block ACPI / processor: make function acpi_processor_check_duplicates() static ACPI / EC: Clean up EC GPE mask flag ACPI: EC: Fix possible issues related to EC initialization order ACPI / PM: Add debug statements to acpi_pm_notify_handler() ACPI: Add debug statements to acpi_global_event_handler() ACPI / scan: Enable GPEs before scanning the namespace ACPICA: Make it possible to enable runtime GPEs earlier ACPICA: Dispatch active GPEs at init time ACPI: SPCR: work around clock issue on xgene UART ACPI: SPCR: extend XGENE 8250 workaround to m400 ACPI / LPSS: Don't abort ACPI scan on missing mem resource mailbox: pcc: Drop uninformative output during boot ACPI/IORT: Add IORT named component memory address limits ...	2017-09-05 12:45:03 -07:00
Linus Torvalds	bafb0762cb	Char/Misc drivers for 4.14-rc1 Here is the big char/misc driver update for 4.14-rc1. Lots of different stuff in here, it's been an active development cycle for some reason. Highlights are: - updated binder driver, this brings binder up to date with what shipped in the Android O release, plus some more changes that happened since then that are in the Android development trees. - coresight updates and fixes - mux driver file renames to be a bit "nicer" - intel_th driver updates - normal set of hyper-v updates and changes - small fpga subsystem and driver updates - lots of const code changes all over the driver trees - extcon driver updates - fmc driver subsystem upadates - w1 subsystem minor reworks and new features and drivers added - spmi driver updates Plus a smattering of other minor driver updates and fixes. All of these have been in linux-next with no reported issues for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWa1+Ew8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+yl26wCgquufNylfhxr65NbJrovduJYzRnUAniCivXg8 bePIh/JI5WxWoHK+wEbY =hYWx -----END PGP SIGNATURE----- Merge tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big char/misc driver update for 4.14-rc1. Lots of different stuff in here, it's been an active development cycle for some reason. Highlights are: - updated binder driver, this brings binder up to date with what shipped in the Android O release, plus some more changes that happened since then that are in the Android development trees. - coresight updates and fixes - mux driver file renames to be a bit "nicer" - intel_th driver updates - normal set of hyper-v updates and changes - small fpga subsystem and driver updates - lots of const code changes all over the driver trees - extcon driver updates - fmc driver subsystem upadates - w1 subsystem minor reworks and new features and drivers added - spmi driver updates Plus a smattering of other minor driver updates and fixes. All of these have been in linux-next with no reported issues for a while" * tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (244 commits) ANDROID: binder: don't queue async transactions to thread. ANDROID: binder: don't enqueue death notifications to thread todo. ANDROID: binder: Don't BUG_ON(!spin_is_locked()). ANDROID: binder: Add BINDER_GET_NODE_DEBUG_INFO ioctl ANDROID: binder: push new transactions to waiting threads. ANDROID: binder: remove proc waitqueue android: binder: Add page usage in binder stats android: binder: fixup crash introduced by moving buffer hdr drivers: w1: add hwmon temp support for w1_therm drivers: w1: refactor w1_slave_show to make the temp reading functionality separate drivers: w1: add hwmon support structures eeprom: idt_89hpesx: Support both ACPI and OF probing mcb: Fix an error handling path in 'chameleon_parse_cells()' MCB: add support for SC31 to mcb-lpc mux: make device_type const char: virtio: constify attribute_group structures. Documentation/ABI: document the nvmem sysfs files lkdtm: fix spelling mistake: "incremeted" -> "incremented" perf: cs-etm: Fix ETMv4 CONFIGR entry in perf.data file nvmem: include linux/err.h from header ...	2017-09-05 11:08:17 -07:00
Linus Torvalds	24e700e291	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Thomas Gleixner: "This update provides: - Cleanup of the IDT management including the removal of the extra tracing IDT. A first step to cleanup the vector management code. - The removal of the paravirt op adjust_exception_frame. This is a XEN specific issue, but merged through this branch to avoid nasty merge collisions - Prevent dmesg spam about the TSC DEADLINE bug, when the CPU has disabled the TSC DEADLINE timer in CPUID. - Adjust a debug message in the ioapic code to print out the information correctly" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) x86/idt: Fix the X86_TRAP_BP gate x86/xen: Get rid of paravirt op adjust_exception_frame x86/eisa: Add missing include x86/idt: Remove superfluous ALIGNment x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on CPUs without the feature x86/idt: Remove the tracing IDT leftovers x86/idt: Hide set_intr_gate() x86/idt: Simplify alloc_intr_gate() x86/idt: Deinline setup functions x86/idt: Remove unused functions/inlines x86/idt: Move interrupt gate initialization to IDT code x86/idt: Move APIC gate initialization to tables x86/idt: Move regular trap init to tables x86/idt: Move IST stack based traps to table init x86/idt: Move debug stack init to table based x86/idt: Switch early trap init to IDT tables x86/idt: Prepare for table based init x86/idt: Move early IDT setup out of 32-bit asm x86/idt: Move early IDT handler setup to IDT code x86/idt: Consolidate IDT invalidation ...	2017-09-04 17:43:56 -07:00
Linus Torvalds	f57091767a	Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache quality monitoring update from Thomas Gleixner: "This update provides a complete rewrite of the Cache Quality Monitoring (CQM) facility. The existing CQM support was duct taped into perf with a lot of issues and the attempts to fix those turned out to be incomplete and horrible. After lengthy discussions it was decided to integrate the CQM support into the Resource Director Technology (RDT) facility, which is the obvious choise as in hardware CQM is part of RDT. This allowed to add Memory Bandwidth Monitoring support on top. As a result the mechanisms for allocating cache/memory bandwidth and the corresponding monitoring mechanisms are integrated into a single management facility with a consistent user interface" * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) x86/intel_rdt: Turn off most RDT features on Skylake x86/intel_rdt: Add command line options for resource director technology x86/intel_rdt: Move special case code for Haswell to a quirk function x86/intel_rdt: Remove redundant ternary operator on return x86/intel_rdt/cqm: Improve limbo list processing x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug x86/intel_rdt: Modify the intel_pqr_state for better performance x86/intel_rdt/cqm: Clear the default RMID during hotcpu x86/intel_rdt: Show bitmask of shareable resource with other executing units x86/intel_rdt/mbm: Handle counter overflow x86/intel_rdt/mbm: Add mbm counter initialization x86/intel_rdt/mbm: Basic counting of MBM events (total and local) x86/intel_rdt/cqm: Add CPU hotplug support x86/intel_rdt/cqm: Add sched_in support x86/intel_rdt: Introduce rdt_enable_key for scheduling x86/intel_rdt/cqm: Add mount,umount support x86/intel_rdt/cqm: Add rmdir support x86/intel_rdt: Separate the ctrl bits from rmdir x86/intel_rdt/cqm: Add mon_data x86/intel_rdt: Prepare for RDT monitor data support ...	2017-09-04 13:56:37 -07:00
Linus Torvalds	b1b6f83ac9	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "PCID support, 5-level paging support, Secure Memory Encryption support The main changes in this cycle are support for three new, complex hardware features of x86 CPUs: - Add 5-level paging support, which is a new hardware feature on upcoming Intel CPUs allowing up to 128 PB of virtual address space and 4 PB of physical RAM space - a 512-fold increase over the old limits. (Supercomputers of the future forecasting hurricanes on an ever warming planet can certainly make good use of more RAM.) Many of the necessary changes went upstream in previous cycles, v4.14 is the first kernel that can enable 5-level paging. This feature is activated via CONFIG_X86_5LEVEL=y - disabled by default. (By Kirill A. Shutemov) - Add 'encrypted memory' support, which is a new hardware feature on upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system RAM to be encrypted and decrypted (mostly) transparently by the CPU, with a little help from the kernel to transition to/from encrypted RAM. Such RAM should be more secure against various attacks like RAM access via the memory bus and should make the radio signature of memory bus traffic harder to intercept (and decrypt) as well. This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled by default. (By Tom Lendacky) - Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a hardware feature that attaches an address space tag to TLB entries and thus allows to skip TLB flushing in many cases, even if we switch mm's. (By Andy Lutomirski) All three of these features were in the works for a long time, and it's coincidence of the three independent development paths that they are all enabled in v4.14 at once" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits) x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y) x86/mm: Use pr_cont() in dump_pagetable() x86/mm: Fix SME encryption stack ptr handling kvm/x86: Avoid clearing the C-bit in rsvd_bits() x86/CPU: Align CR3 defines x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages acpi, x86/mm: Remove encryption mask from ACPI page protection type x86/mm, kexec: Fix memory corruption with SME on successive kexecs x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y x86/mm: Allow userspace have mappings above 47-bit x86/mm: Prepare to expose larger address space to userspace x86/mpx: Do not allow MPX if we have mappings above 47-bit x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit() x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD x86/mm/dump_pagetables: Fix printout of p4d level x86/mm/dump_pagetables: Generalize address normalization x86/boot: Fix memremap() related build failure ...	2017-09-04 12:21:28 -07:00
Linus Torvalds	e0a195b522	Merge branch 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 spinlock update from Ingo Molnar: "Convert an NMI lock to raw" [ Clarification: it's not that the lock itself is NMI-safe, it's about NMI registration called from RT contexts - Linus ] * 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/nmi: Use raw lock	2017-09-04 11:13:56 -07:00
Linus Torvalds	0098410dd6	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading updates from Ingo Molnar: "Update documentation, improve robustness and fix a memory leak" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/intel: Improve microcode patches saving flow x86/microcode: Document the three loading methods x86/microcode/AMD: Free unneeded patch before exit from update_cache()	2017-09-04 11:11:57 -07:00
Linus Torvalds	a1400cdb77	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid updates from Ingo Molnar: "AMD F17h related updates" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/amd: Hide unused legacy_fixup_core_id() function x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask x86/cpu/amd: Limit cpu_core_id fixup to families older than F17h	2017-09-04 11:05:23 -07:00
Linus Torvalds	b0c79f49c3	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: - Introduce the ORC unwinder, which can be enabled via CONFIG_ORC_UNWINDER=y. The ORC unwinder is a lightweight, Linux kernel specific debuginfo implementation, which aims to be DWARF done right for unwinding. Objtool is used to generate the ORC unwinder tables during build, so the data format is flexible and kernel internal: there's no dependency on debuginfo created by an external toolchain. The ORC unwinder is almost two orders of magnitude faster than the (out of tree) DWARF unwinder - which is important for perf call graph profiling. It is also significantly simpler and is coded defensively: there has not been a single ORC related kernel crash so far, even with early versions. (knock on wood!) But the main advantage is that enabling the ORC unwinder allows CONFIG_FRAME_POINTERS to be turned off - which speeds up the kernel measurably: With frame pointers disabled, GCC does not have to add frame pointer instrumentation code to every function in the kernel. The kernel's .text size decreases by about 3.2%, resulting in better cache utilization and fewer instructions executed, resulting in a broad kernel-wide speedup. Average speedup of system calls should be roughly in the 1-3% range - measurements by Mel Gorman [1] have shown a speedup of 5-10% for some function execution intense workloads. The main cost of the unwinder is that the unwinder data has to be stored in RAM: the memory cost is 2-4MB of RAM, depending on kernel config - which is a modest cost on modern x86 systems. Given how young the ORC unwinder code is it's not enabled by default - but given the performance advantages the plan is to eventually make it the default unwinder on x86. See Documentation/x86/orc-unwinder.txt for more details. - Remove lguest support: its intended role was that of a temporary proof of concept for virtualization, plus its removal will enable the reduction (removal) of the paravirt API as well, so Rusty agreed to its removal. (Juergen Gross) - Clean up and fix FSGS related functionality (Andy Lutomirski) - Clean up IO access APIs (Andy Shevchenko) - Enhance the symbol namespace (Jiri Slaby) * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits) objtool: Handle GCC stack pointer adjustment bug x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone() x86/fpu/math-emu: Add ENDPROC to functions x86/boot/64: Extract efi_pe_entry() from startup_64() x86/boot/32: Extract efi_pe_entry() from startup_32() x86/lguest: Remove lguest support x86/paravirt/xen: Remove xen_patch() objtool: Fix objtool fallthrough detection with function padding x86/xen/64: Fix the reported SS and CS in SYSCALL objtool: Track DRAP separately from callee-saved registers objtool: Fix validate_branch() return codes x86: Clarify/fix no-op barriers for text_poke_bp() x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs selftests/x86/fsgsbase: Test selectors 1, 2, and 3 x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common x86/asm: Fix UNWIND_HINT_REGS macro for older binutils x86/asm/32: Fix regs_get_register() on segment registers x86/xen/64: Rearrange the SYSCALL entries x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads ...	2017-09-04 09:52:57 -07:00
Linus Torvalds	621bee34f6	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fix from Ingo Molnar: "A single change fixing SMCA bank initialization on systems that don't have CPU0 enabled" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce/AMD: Allow any CPU to initialize the smca_banks array	2017-09-04 09:08:56 -07:00
Linus Torvalds	9657752cb5	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - Add branch type profiling/tracing support. (Jin Yao) - Add the PERF_SAMPLE_PHYS_ADDR ABI to allow the tracing/profiling of physical memory addresses, where the PMU supports it. (Kan Liang) - Export some PMU capability details in the new /sys/bus/event_source/devices/cpu/caps/ sysfs directory. (Andi Kleen) - Aux data fixes and updates (Will Deacon) - kprobes fixes and updates (Masami Hiramatsu) - AMD uncore PMU driver fixes and updates (Janakarajan Natarajan) On the tooling side, here's a (limited!) list of highlights - there were many other changes that I could not list, see the shortlog and git history for details: UI improvements: - Implement a visual marker for fused x86 instructions in the annotate TUI browser, available now in 'perf report', more work needed to have it available as well in 'perf top' (Jin Yao) Further explanation from one of Jin's patches: │ ┌──cmpl $0x0,argp_program_version_hook 81.93 │ ├──je 20 │ │ lock cmpxchg %esi,0x38a9a4(%rip) │ │↓ jne 29 │ │↓ jmp 43 11.47 │20:└─→cmpxch %esi,0x38a999(%rip) That means the cmpl+je is a fused instruction pair and they should be considered together. - Record the branch type and then show statistics and info about in callchain entries (Jin Yao) Example from one of Jin's patches: # perf record -g -j any,save_type # perf report --branch-history --stdio --no-children 38.50% div.c:45 [.] main div \| ---main div.c:42 (RET CROSS_2M cycles:2) compute_flag div.c:28 (cycles:2) compute_flag div.c:27 (RET CROSS_2M cycles:1) rand rand.c:28 (cycles:1) rand rand.c:28 (RET CROSS_2M cycles:1) __random random.c:298 (cycles:1) __random random.c:297 (COND_BWD CROSS_2M cycles:1) __random random.c:295 (cycles:1) __random random.c:295 (COND_BWD CROSS_2M cycles:1) __random random.c:295 (cycles:1) __random random.c:295 (RET CROSS_2M cycles:9) namespaces support: - Add initial support for namespaces, using setns to access files in namespaces, grabbing their build-ids, etc. (Krister Johansen) perf trace enhancements: - Beautify pkey_{alloc,free,mprotect} arguments in 'perf trace' (Arnaldo Carvalho de Melo) - Add initial 'clone' syscall args beautifier in 'perf trace' (Arnaldo Carvalho de Melo) - Ignore 'fd' and 'offset' args for MAP_ANONYMOUS in 'perf trace' (Arnaldo Carvalho de Melo) - Beautifiers for the 'cmd' arg of several ioctl types, including: sound, DRM, KVM, vhost virtio and perf_events. (Arnaldo Carvalho de Melo) - Add PERF_SAMPLE_CALLCHAIN and PERF_RECORD_MMAP[2] to 'perf data' CTF conversion, allowing CTF trace visualization tools to show callchains and to resolve symbols (Geneviève Bastien) - Beautify the fcntl syscall, which is an interesting one in the sense that infrastructure had to be put in place to change the formatters of some arguments according to the value in a previous one, i.e. cmd dictates how arg and the syscall return will be formatted. (Arnaldo Carvalho de Melo perf stat enhancements: - Use group read for event groups in 'perf stat', reducing overhead when groups are defined in the event specification, i.e. when using {} to enclose a list of events, asking them to be read at the same time, e.g.: "perf stat -e '{cycles,instructions}'" (Jiri Olsa) pipe mode improvements: - Process tracing data in 'perf annotate' pipe mode (David Carrillo-Cisneros) - Add header record types to pipe-mode, now this command: $ perf record -o - -e cycles sleep 1 \| perf report --stdio --header Will show the same as in non-pipe mode, i.e. involving a perf.data file (David Carrillo-Cisneros) Vendor specific hardware event support updates/enhancements: - Update POWER9 vendor events tables (Sukadev Bhattiprolu) - Add POWER9 PMU events Sukadev (Bhattiprolu) - Support additional POWER8+ PVR in PMU mapfile (Shriya) - Add Skylake server uncore JSON vendor events (Andi Kleen) - Support exporting Intel PT data to sqlite3 with python perf scripts, this is in addition to the postgresql support that was already there (Adrian Hunter)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (253 commits) perf symbols: Fix plt entry calculation for ARM and AARCH64 perf probe: Fix kprobe blacklist checking condition perf/x86: Fix caps/ for !Intel perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR perf/core, pt, bts: Get rid of itrace_started perf trace beauty: Beautify pkey_{alloc,free,mprotect} arguments tools headers: Sync cpu features kernel ABI headers with tooling headers perf tools: Pass full path of FEATURES_DUMP perf tools: Robustify detection of clang binary tools lib: Allow external definition of CC, AR and LD perf tools: Allow external definition of flex and bison binary names tools build tests: Don't hardcode gcc name perf report: Group stat values on global event id perf values: Zero value buffers perf values: Fix allocation check perf values: Fix thread index bug perf report: Add dump_read function perf record: Set read_format for inherit_stat perf c2c: Fix remote HITM detection for Skylake perf tools: Fix static build with newer toolchains ...	2017-09-04 08:39:02 -07:00
Linus Torvalds	906dde0f35	main drm pull request for 4.14 merge window -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZpRPIAAoJEAx081l5xIa+kCIP/2m2q0jBmCATvXXwrMBH0zNk 4lm9yIfl9pmluJP97aklvkeKF77chhost76+hv+0sQ9ZsJD8koHWv5WyTHEs7Cfn NpmtGPqYlIZsWNSwW0OFF/XzllgLCVEWa+W/7ryYzPZrSEZr6Ge4HE0qS3LfuLJv K89amZWHkP5ysPZ1uxRBzHtZfNAhdyjYVTUntCR7gj3DYv3yNdeZu+/epfcWK2w/ Q+ggoy644vX/yzy5L5zCGL/J1BjStDuec7sgAKTlNx4TwBUmp2wsfhEdovQBGFiu t5PHMajvrBRqSJWDIAZSUfjQzIMSz517J9LWeChU7KtAClNJQJEabbu4CoX4aEmG UbSzEe0IxnxQ4842jcqQXZ+mevlNIEIBVSNR7dXi17jL3Ts+APQgrYjRJYVk2ipg uQ9TwkeVVu2WRGyU8iRQrXAZI7+O3p4UnbNPjeG2qACD2Ur7Z3n7b0mhNFPOLzO4 gbIv4D6CcUB/vltl+vhZTW3P50oMCVSq8ScCpY8CGo29mZ5vypj5PTS+W8FsyY3Z ypyMqWg/DyxKlOoO+aK8EmXuZmgtDR4kb8asltH/S1A0NZkzjrFkKgs10Cp6EjJy Zz1BWa1KKEpdN6yp+jrbJKjf9MJ7K2RPGv3bxWnCCdNv4j49rk4t3IHqvcihddsd XXFQB5zE7Pz0ROi/VkXR =5fxW -----END PGP SIGNATURE----- Merge tag 'drm-for-v4.14' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "This is the main drm pull request for 4.14 merge window. I'm sending this early, as my continuing journey into fatherhood is occurring really soon now, I'm going to be mostly useless for the next couple of weeks, though I may be able to read email, I doubt I'll be doing much patch applications or git sending. If anything urgent pops up I've asked Daniel/Jani/Alex/Sean to try and direct stuff towards you. Outside drm changes: Some rcar-du updates that touch the V4L tree, all acks should be in place. It adds one export to the radix tree code for new i915 use case. There are some minor AGP cleanups (don't see that too often). Changes to the vbox driver in staging to avoid breaking compilation. Summary: core: - Atomic helper fixes - Atomic UAPI fixes - Add YCBCR 4:2:0 support - Drop set_busid hook - Refactor fb_helper locking - Remove a bunch of internal APIs - Add a bunch of better default handlers - Format modifier/blob plane property added - More internal header refactoring - Make more internal API names consistent - Enhanced syncobj APIs (wait/signal/reset/create signalled) bridge: - Add Synopsys Designware MIPI DSI host bridge driver tiny: - Add Pervasive Displays RePaper displays - Add support for LEGO MINDSTORMS EV3 LCD i915: - Lots of GEN10/CNL support patches - drm syncobj support - Skylake+ watermark refactoring - GVT vGPU 48-bit ppgtt support - GVT performance improvements - NOA change ioctl - CCS (color compression) scanout support - GPU reset improvements amdgpu: - Initial hugepage support - BO migration logic rework - Vega10 improvements - Powerplay fixes - Stop reprogramming the MC - Fixes for ACP audio on stoney - SR-IOV fixes/improvements - Command submission overhead improvements amdkfd: - Non-dGPU upstreaming patches - Scratch VA ioctl - Image tiling modes - Update PM4 headers for new firmware - Drop all BUG_ONs. nouveau: - GP108 modesetting support. - Disable MSI on big endian. vmwgfx: - Add fence fd support. msm: - Runtime PM improvements exynos: - NV12MT support - Refactor KMS drivers imx-drm: - Lock scanout channel to improve memory bw - Cleanups etnaviv: - GEM object population fixes tegra: - Prep work for Tegra186 support - PRIME mmap support sunxi: - HDMI support improvements - HDMI CEC support omapdrm: - HDMI hotplug IRQ support - Big driver cleanup - OMAP5 DSI support rcar-du: - vblank fixes - VSP1 updates arcgpu: - Minor fixes stm: - Add STM32 DSI controller driver dw_hdmi: - Add support for Rockchip RK3399 - HDMI CEC support atmel-hlcdc: - Add 8-bit color support vc4: - Atomic fixes - New ioctl to attach a label to a buffer object - HDMI CEC support - Allow userspace to dictate rendering order on submit ioctl" * tag 'drm-for-v4.14' of git://people.freedesktop.org/~airlied/linux: (1074 commits) drm/syncobj: Add a signal ioctl (v3) drm/syncobj: Add a reset ioctl (v3) drm/syncobj: Add a syncobj_array_find helper drm/syncobj: Allow wait for submit and signal behavior (v5) drm/syncobj: Add a CREATE_SIGNALED flag drm/syncobj: Add a callback mechanism for replace_fence (v3) drm/syncobj: add sync obj wait interface. (v8) i915: Use drm_syncobj_fence_get drm/syncobj: Add a race-free drm_syncobj_fence_get helper (v2) drm/syncobj: Rename fence_get to find_fence drm: kirin: Add mode_valid logic to avoid mode clocks we can't generate drm/vmwgfx: Bump the version for fence FD support drm/vmwgfx: Add export fence to file descriptor support drm/vmwgfx: Add support for imported Fence File Descriptor drm/vmwgfx: Prepare to support fence fd drm/vmwgfx: Fix incorrect command header offset at restart drm/vmwgfx: Support the NOP_ERROR command drm/vmwgfx: Restart command buffers after errors drm/vmwgfx: Move irq bottom half processing to threads drm/vmwgfx: Don't use drm_irq_[un]install ...	2017-09-03 17:02:26 -07:00
Rafael J. Wysocki	01d2f105a4	Merge branches 'acpi-x86', 'acpi-soc', 'acpi-pmic' and 'acpi-apple' * acpi-x86: ACPI / boot: Add number of legacy IRQs to debug output ACPI / boot: Correct address space of __acpi_map_table() ACPI / boot: Don't define unused variables * acpi-soc: ACPI / LPSS: Don't abort ACPI scan on missing mem resource * acpi-pmic: ACPI / PMIC: xpower: Do pinswitch magic when reading GPADC * acpi-apple: spi: Use Apple device properties in absence of ACPI resources ACPI / scan: Recognize Apple SPI and I2C slaves ACPI / property: Support Apple _DSM properties ACPI / property: Don't evaluate objects for devices w/o handle treewide: Consolidate Apple DMI checks	2017-09-03 23:54:03 +02:00
Ingo Molnar	c6ef89421e	x86/idt: Fix the X86_TRAP_BP gate Andrei Vagin reported a CRIU regression and bisected it back to: `90f6225fba` ("x86/idt: Move IST stack based traps to table init") This table init conversion loses the system-gate property of X86_TRAP_BP and erroneously moves it from DPL3 to DPL0. Fix it. Reported-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: dvlasenk@redhat.com Cc: linux-tip-commits@vger.kernel.org Cc: peterz@infradead.org Cc: brgerst@gmail.com Cc: rostedt@goodmis.org Cc: bp@alien8.de Cc: luto@kernel.org Cc: jpoimboe@redhat.com Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: torvalds@linux-foundation.org Cc: tip-bot for Jacob Shin <tipbot@zytor.com> Link: http://lkml.kernel.org/r/20170901082630.xvyi5bwk6etmppqc@gmail.com	2017-09-01 11:04:56 +02:00
Juergen Gross	5878d5d6fd	x86/xen: Get rid of paravirt op adjust_exception_frame When running as Xen pv-guest the exception frame on the stack contains %r11 and %rcx additional to the other data pushed by the processor. Instead of having a paravirt op being called for each exception type prepend the Xen specific code to each exception entry. When running as Xen pv-guest just use the exception entry with prepended instructions, otherwise use the entry without the Xen specific code. [ tglx: Merged through tip to avoid ugly merge conflict ] Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: boris.ostrovsky@oracle.com Cc: luto@amacapital.net Link: http://lkml.kernel.org/r/20170831174249.26853-1-jg@pfupf.net	2017-08-31 21:35:10 +02:00
Thomas Gleixner	ef1d4deab9	x86/eisa: Add missing include The seperation of the EISA init missed to include linux/io.h which breaks the build with some special configurations. Reported-by: Ingo Molnar <mingo@kernel.org> Fixes: `f7eaf6e00f` ("x86/boot: Move EISA setup to a separate file") Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-08-31 21:34:48 +02:00
Jiri Slaby	04b5de3a8f	x86/idt: Remove superfluous ALIGNment Commit `87e81786b1` ("x86/idt: Move early IDT setup out of 32-bit asm") switched early_ignore_irq to use ENTRY. ENTRY aligns the code, so there is no need for one more ALIGN right before the function. And add one \n after the function to separate it from the data. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: http://lkml.kernel.org/r/20170831121653.28917-1-jslaby@suse.cz	2017-08-31 15:47:02 +02:00
Ingo Molnar	3e83dfd5d8	Merge branch 'x86/mm' into x86/platform, to pick up TLB flush dependency Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-31 14:20:06 +02:00
Hans de Goede	594a30fb12	x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on CPUs without the feature When booting 4.13 on a VirtualBox VM on a Skylake host the following error shows up in the logs: [ 0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0xb2 (or later) This is caused by apic_check_deadline_errata() only checking CPU model and not the X86_FEATURE_TSC_DEADLINE_TIMER flag (which VirtualBox does NOT export to the guest), combined with VirtualBox not exporting the micro-code version to the guest. This commit adds a check for X86_FEATURE_TSC_DEADLINE_TIMER to apic_check_deadline_errata(), silencing this error on VirtualBox VMs. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frank Mehnert <frank.mehnert@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Thayer <michael.thayer@oracle.com> Cc: Michal Necasek <michal.necasek@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: `bd9240a18e` ("x86/apic: Add TSC_DEADLINE quirk due to errata") Link: http://lkml.kernel.org/r/20170830105811.27539-1-hdegoede@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-31 11:17:27 +02:00
Thomas Gleixner	facaa3e3c8	x86/idt: Hide set_intr_gate() set_intr_gate() is an internal function of the IDT code. The only user left is the KVM code which replaces the pagefault handler eventually. Provide an explicit update_intr_gate() function and make set_intr_gate() static. While at it replace the magic number 14 in the KVM code with the proper trap define. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.663008004@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:29 +02:00
Thomas Gleixner	4447ac1195	x86/idt: Simplify alloc_intr_gate() The only users of alloc_intr_gate() are hypervisors, which both check the used_vectors bitmap whether they have allocated the gate already. Move that check into alloc_intr_gate() and simplify the users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.580830286@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:28 +02:00
Thomas Gleixner	db18da78f9	x86/idt: Deinline setup functions None of this is performance sensitive in any way - so debloat the kernel. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.502052875@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:28 +02:00
Thomas Gleixner	dc20b2d526	x86/idt: Move interrupt gate initialization to IDT code Move the gate intialization from interrupt init to the IDT code so all IDT related operations are at a single place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.340209198@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:28 +02:00
Thomas Gleixner	636a7598f6	x86/idt: Move APIC gate initialization to tables Replace the APIC/SMP vector gate initialization with the table based mechanism. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.260177013@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:28 +02:00
Thomas Gleixner	b70543a0b2	x86/idt: Move regular trap init to tables Initialize the regular traps with a table. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.182128165@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:27 +02:00
Thomas Gleixner	90f6225fba	x86/idt: Move IST stack based traps to table init Initialize the IST based traps via a table. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.091328949@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:27 +02:00
Thomas Gleixner	0a30908b91	x86/idt: Move debug stack init to table based Add the debug_idt init table and make use of it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.006502252@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:27 +02:00
Thomas Gleixner	433f8924fa	x86/idt: Switch early trap init to IDT tables Add the initialization table for the early trap setup and replace the early trap init code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.929139008@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:27 +02:00
Thomas Gleixner	3318e97442	x86/idt: Prepare for table based init The IDT setup code is handled in several places. All of them use variants of set_intr_gate() inlines. This can be done with a table based initialization, which allows to reduce the inline zoo and puts all IDT related code and information into a single place. Add the infrastructure. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.849877032@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:27 +02:00
Thomas Gleixner	87e81786b1	x86/idt: Move early IDT setup out of 32-bit asm The early IDT setup can be done in C code like it's done on 64-bit kernels. Reuse the 64-bit version. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.757980775@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:26 +02:00
Thomas Gleixner	588787fde7	x86/idt: Move early IDT handler setup to IDT code The early IDT handler setup is done in C entry code on 64-bit kernels and in ASM entry code on 32-bit kernels. Move the 64-bit variant to the IDT code so it can be shared with 32-bit in the next step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.679561404@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:26 +02:00
Thomas Gleixner	e802a51ede	x86/idt: Consolidate IDT invalidation kexec and reboot have both code to invalidate IDT. Create a common function and use it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.600953282@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:26 +02:00
Thomas Gleixner	16bc18d895	x86/idt: Move 32-bit idt_descr to C code 32-bit kernels have the idt_descr defined in the low level assembly entry code, but there is no good reason for that. Move it into the C file and use the 64-bit version of it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.445862201@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:26 +02:00
Thomas Gleixner	d8ed9d4826	x86/idt: Create file for IDT related code IDT related code lives scattered around in various places. Create a new source file in arch/x86/kernel/idt.c to hold it. Move the idt_tables and descriptors to it for a start. Follow up patches will gradually move more code over. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.367081121@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:25 +02:00
Thomas Gleixner	9a98e77800	x86/asm: Replace access to desc_struct:a/b fields The union inside of desc_struct which allows access to the raw u32 parts of the descriptors. This raw access part is about to go away. Replace the few code parts which access those fields. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.120214366@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:25 +02:00
Thomas Gleixner	1dd439fe97	x86/percpu: Use static initializer for GDT entry The IDT cleanup is about to remove pack_descriptor(). The GDT setup for the per-cpu storage can be achieved with the static initializer as well. Replace it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.954214927@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 12:07:24 +02:00
Thomas Gleixner	a45525b5b4	x86/irq_work: Make it depend on APIC The irq work interrupt vector is only installed when CONFIG_X86_LOCAL_APIC is enabled, but the interrupt handler is compiled in unconditionally. Compile the cruft out when the APIC is disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.691909010@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:30 +02:00
Thomas Gleixner	0428e01a2f	x86/ipi: Make platform IPI depend on APIC The platform IPI vector is only installed when the local APIC is enabled. All users of it depend on the local APIC anyway. Make the related code conditional on CONFIG_X86_LOCAL_APIC=y. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.615286163@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:29 +02:00
Thomas Gleixner	809547472e	x86/tracing: Disentangle pagefault and resched IPI tracing key The pagefault and the resched IPI handler are the only ones where it is worth to optimize the code further in case tracepoints are disabled. But it makes no sense to have a single static key for both. Seperate the static keys so the facilities are handled seperately. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.536699116@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:29 +02:00
Thomas Gleixner	4b9a8dca0e	x86/idt: Remove the tracing IDT completely No more users of the tracing IDT. All exception tracepoints have been moved into the regular handlers. Get rid of the mess which shouldn't have been created in the first place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.378851687@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:28 +02:00
Thomas Gleixner	3cd788c1ee	x86/smp: Use static key for reschedule interrupt tracing It's worth to avoid the extra irq_enter()/irq_exit() pair in the case that the reschedule interrupt tracepoints are disabled. Use the static key which indicates that exception tracing is enabled. For now this key is global. It will be optimized in a later step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170828064957.299808677@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:27 +02:00
Thomas Gleixner	85b77cdd8f	x86/smp: Remove pointless duplicated interrupt code Two NOP5s are really a good tradeoff vs. the unholy IDT switching mess, which duplicates code all over the place. The rescheduling interrupt gets optimized in a later step. Make the ordering of function call and statistics increment the same as in other places. Calculate stats first, then do the function call. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.222101344@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:27 +02:00
Thomas Gleixner	0f42ae283c	x86/mce: Remove duplicated tracing interrupt code Machine checks are not really high frequency events. The extra two NOP5s for the disabled tracepoints are noise vs. the heavy lifting which needs to be done in the MCE handler. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.144301907@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:26 +02:00
Thomas Gleixner	daabb8eb9a	x86/irqwork: Get rid of duplicated tracing interrupt code Two NOP5s are a reasonable tradeoff to avoid duplicated code and the requirement to switch the IDT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.064746737@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:26 +02:00
Thomas Gleixner	61069de7a3	x86/apic: Remove the duplicated tracing versions of interrupts The error and the spurious interrupt are really rare events and not at all performance sensitive: two NOP5s can be tolerated when tracing is disabled. Remove the complication. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170828064956.986009402@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:25 +02:00
Thomas Gleixner	8a17116b1f	x86/irq: Get rid of duplicated trace_x86_platform_ipi() code Two NOP5s are really a good tradeoff vs. the unholy IDT switching mess, which duplicates code all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170828064956.907209383@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:25 +02:00
Thomas Gleixner	3bec6def39	x86/apic: Use this_cpu_ptr() in local_timer_interrupt() Accessing the per cpu data via per_cpu(, smp_processor_id()) is pointless. Use this_cpu_ptr() instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064956.829552757@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:24 +02:00
Thomas Gleixner	302a98f896	x86/apic: Remove the duplicated tracing version of local_timer_interrupt() The two NOP5s are noise in the rest of the work which is done by the timer interrupt and modern CPUs are pretty good in optimizing NOPs anyway. Get rid of the interrupt handler duplication and move the tracepoints into the regular handler. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170828064956.751247330@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:24 +02:00
Thomas Gleixner	11a7ffb017	x86/traps: Simplify pagefault tracing logic Make use of the new irqvector tracing static key and remove the duplicated trace_do_pagefault() implementation. If irq vector tracing is disabled, then the overhead of this is a single NOP5, which is a reasonable tradeoff to avoid duplicated code and the unholy macro mess. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064956.672965407@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:23 +02:00
Thomas Gleixner	2feb1b316d	x86/tracing: Introduce a static key for exception tracing Switching the IDT just for avoiding tracepoints creates a completely impenetrable macro/inline/ifdef mess. There is no point in avoiding tracepoints for most of the traps/exceptions. For the more expensive tracepoints, like pagefaults, this can be handled with an explicit static key. Preparatory patch to remove the tracing IDT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064956.593094539@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:23 +02:00
Thomas Gleixner	f7eaf6e00f	x86/boot: Move EISA setup to a separate file EISA has absolutely nothing to do with traps, so move it out of traps.c into its own eisa.c file. Furthermore, the EISA bus detection does not need to run during very early boot, it's good enough to run it before the EISA bus and drivers are initialized. I.e. instead of calling it from the very early trap_init() code, make it a subsys_initcall(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064956.515322409@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:22 +02:00
Thomas Gleixner	05161b9cbe	x86/irq: Get rid of the 'first_system_vector' indirection bogosity This variable is beyond pointless. Nothing allocates a vector via alloc_gate() below FIRST_SYSTEM_VECTOR. So nothing can change first_system_vector. If there is a need for a gate below FIRST_SYSTEM_VECTOR then it can be added to the vector defines and FIRST_SYSTEM_VECTOR can be adjusted accordingly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064956.357109735@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:21 +02:00
Thomas Gleixner	fa4ab5774d	x86/irq: Unexport used_vectors[] No modular users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064956.278375986@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:20 +02:00
Thomas Gleixner	69de72ec6d	x86/irq: Remove vector_used_by_percpu_irq() Last user (lguest) is gone. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064956.201432430@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 11:42:20 +02:00
Borislav Petkov	aa78c1ccfa	x86/microcode/intel: Improve microcode patches saving flow Avoid potentially dereferencing a NULL pointer when saving a microcode patch for early loading on the application processors. While at it, drop the IS_ERR() checking in favor of simpler, NULL-ptr checks which are sufficient and rename __alloc_microcode_buf() to memdup_patch() to more precisely denote what it does. No functionality change. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20170825100456.n236w3jebteokfd6@pd.tnic	2017-08-29 10:59:28 +02:00
Greg Kroah-Hartman	9749c37275	Merge 4.13-rc7 into char-misc-next We want the binder fix in here as well for testing and merge issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-28 10:19:01 +02:00
Ingo Molnar	413d63d71b	Merge branch 'linus' into x86/mm to pick up fixes and to fix conflicts Conflicts: arch/x86/kernel/head64.c arch/x86/mm/mmap.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-26 09:19:13 +02:00
Tony Luck	d56593eb5e	x86/intel_rdt: Turn off most RDT features on Skylake Errata list is included in this document: https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/6th-gen-x-series-spec-update.pdf with more details in: https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html But the tl;dr summary (using tags from first of the documents) is: SKZ4 MBM does not accurately track write bandwidth SKZ17 CMT counters may not count accurately SKZ18 CAT may not restrict cacheline allocation under certain conditions SKZ19 MBM counters may undercount Disable all these features on Skylake models. Users who understand the errata may re-enable using boot command line options. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Fenghua" <fenghua.yu@intel.com> Cc: Ravi V" <ravi.v.shankar@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Andi Kleen" <ak@linux.intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: http://lkml.kernel.org/r/3aea0a3bae219062c812668bd9b7b8f1a25003ba.1503512900.git.tony.luck@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-08-25 22:00:45 +02:00
Tony Luck	1d9807fc64	x86/intel_rdt: Add command line options for resource director technology Command line options allow us to ignore features that we don't want. Also we can re-enable options that have been disabled on a platform (so long as the underlying h/w actually supports the option). [ tglx: Marked the option array __initdata and the helper function __init ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua" <fenghua.yu@intel.com> Cc: Ravi V" <ravi.v.shankar@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Andi Kleen" <ak@linux.intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: http://lkml.kernel.org/r/0c37b0d4dbc30977a3c1cee08b66420f83662694.1503512900.git.tony.luck@intel.com	2017-08-25 22:00:45 +02:00
Tony Luck	0576113a38	x86/intel_rdt: Move special case code for Haswell to a quirk function No functional change, but lay the ground work for other per-model quirks. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua" <fenghua.yu@intel.com> Cc: Ravi V" <ravi.v.shankar@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Andi Kleen" <ak@linux.intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: http://lkml.kernel.org/r/f195a83751b5f8b1d8a78bd3c1914300c8fa3142.1503512900.git.tony.luck@intel.com	2017-08-25 22:00:44 +02:00
Thomas Gleixner	c0bb80cfa3	Merge branch 'x86/asm' into x86/apic Pick up dependent changes to avoid merge conflicts	2017-08-25 08:56:22 +02:00
Ingo Molnar	93da8b221d	Merge branch 'linus' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-24 10:12:33 +02:00
Juergen Gross	ecda85e702	x86/lguest: Remove lguest support Lguest seems to be rather unused these days. It has seen only patches ensuring it still builds the last two years and its official state is "Odd Fixes". Remove it in order to be able to clean up the paravirt code. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: boris.ostrovsky@oracle.com Cc: lguest@lists.ozlabs.org Cc: rusty@rustcorp.com.au Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20170816173157.8633-3-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-24 09:57:28 +02:00
raymond pang	adfaf18334	x86/ioapic: Print the IRTE's index field correctly when enabling INTR When enabling interrupt remap, IOAPIC's RTE contains the interrupt_index field of IRTE. This field is composed of the ->index and the ->index2 members of 'struct IR_IO_APIC_route_entry' - but what we print out currently only uses ->index. Fix it. Signed-off-by: Raymond Pang <raymondpangxd@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: joro@8bytes.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/CAHG4imNDzpDyOVi7MByVrLQ%3DQFuOVqpzJ5F-Xs5z6OZphubj-Q@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-23 10:17:17 +02:00
Linus Torvalds	7f680d7ec3	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Another pile of small fixes and updates for x86: - Plug a hole in the SMAP implementation which misses to clear AC on NMI entry - Fix the norandmaps/ADDR_NO_RANDOMIZE logic so the command line parameter works correctly again - Use the proper accessor in the startup64 code for next_early_pgt to prevent accessing of invalid addresses and faulting in the early boot code. - Prevent CPU hotplug lock recursion in the MTRR code - Unbreak CPU0 hotplugging - Rename overly long CPUID bits which got introduced in this cycle - Two commits which mark data 'const' and restrict the scope of data and functions to file scope by making them 'static'" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Constify attribute_group structures x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt' x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks x86: Fix norandmaps/ADDR_NO_RANDOMIZE x86/mtrr: Prevent CPU hotplug lock recursion x86: Mark various structures and functions as 'static' x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag x86/smpboot: Unbreak CPU0 hotplug x86/asm/64: Clear AC on NMI entries	2017-08-20 09:36:52 -07:00
Arvind Yadav	45bd07ad82	x86: Constify attribute_group structures attribute_groups are not supposed to change at runtime and none of the groups is modified. Mark the non-const structs as const. [ tglx: Folded into one big patch ] Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: tony.luck@intel.com Cc: bp@alien8.de Link: http://lkml.kernel.org/r/1500550238-15655-2-git-send-email-arvind.yadav.cs@gmail.com	2017-08-18 11:30:35 +02:00
Tony Luck	ce0fa3e56a	x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages Speculative processor accesses may reference any memory that has a valid page table entry. While a speculative access won't generate a machine check, it will log the error in a machine check bank. That could cause escalation of a subsequent error since the overflow bit will be then set in the machine check bank status register. Code has to be double-plus-tricky to avoid mentioning the 1:1 virtual address of the page we want to map out otherwise we may trigger the very problem we are trying to avoid. We use a non-canonical address that passes through the usual Linux table walking code to get to the same "pte". Thanks to Dave Hansen for reviewing several iterations of this. Also see: http://marc.info/?l=linux-mm&m=149860136413338&w=2 Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott, Robert (Persistent Memory) <elliott@hpe.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170816171803.28342-1-tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-17 10:30:49 +02:00
Alexander Potapenko	187e91fe5e	x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt' __startup_64() is normally using fixup_pointer() to access globals in a position-independent fashion. However 'next_early_pgt' was accessed directly, which wasn't guaranteed to work. Luckily GCC was generating a R_X86_64_PC32 PC-relative relocation for 'next_early_pgt', but Clang emitted a R_X86_64_32S, which led to accessing invalid memory and rebooting the kernel. Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Davidson <md@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `c88d71508e` ("x86/boot/64: Rewrite startup_64() in C") Link: http://lkml.kernel.org/r/20170816190808.131748-1-glider@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-17 09:53:00 +02:00
Scott Wood	c455fd9235	x86/nmi: Use raw lock register_nmi_handler() can be called from PREEMPT_RT atomic context (e.g. wakeup_cpu_via_init_nmi() or native_stop_other_cpus()), and thus ordinary spinlocks cannot be used. Signed-off-by: Scott Wood <swood@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/20170724213242.27598-1-swood@redhat.com	2017-08-16 20:40:09 +02:00
Colin Ian King	5707b46a42	x86/intel_rdt: Remove redundant ternary operator on return The use of the ternary operator is redundant as ret can never be non-zero at that point. Instead, just return nbytes. Detected by CoverityScan, CID#1452658 ("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20170808092859.13021-1-colin.king@canonical.com	2017-08-16 16:20:55 +02:00
Vikas Shivappa	24247aeeab	x86/intel_rdt/cqm: Improve limbo list processing During a mkdir, the entire limbo list is synchronously checked on each package for free RMIDs by sending IPIs. With a large number of RMIDs (SKL has 192) this creates a intolerable amount of work in IPIs. Replace the IPI based checking of the limbo list with asynchronous worker threads on each package which periodically scan the limbo list and move the RMIDs that have: llc_occupancy < threshold_occupancy on all packages to the free list. mkdir now returns -ENOSPC if the free list and the limbo list ere empty or returns -EBUSY if there are RMIDs on the limbo list and the free list is empty. Getting rid of the IPIs also simplifies the data structures and the serialization required for handling the lists. [ tglx: Rewrote changelog ... ] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Link: http://lkml.kernel.org/r/1502845243-20454-3-git-send-email-vikas.shivappa@linux.intel.com	2017-08-16 12:05:41 +02:00
Vikas Shivappa	bbc4615e0b	x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug When a CPU is dying, the overflow worker is canceled and rescheduled on a different CPU in the same domain. But if the timer is already about to expire this essentially doubles the interval which might result in a non detected overflow. Cancel the overflow worker and reschedule it immediately on a different CPU in same domain. The work could be flushed as well, but that would reschedule it on the same CPU. [ tglx: Rewrote changelog once again ] Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Link: http://lkml.kernel.org/r/1502845243-20454-2-git-send-email-vikas.shivappa@linux.intel.com	2017-08-16 12:05:41 +02:00
Thomas Gleixner	84393817db	x86/mtrr: Prevent CPU hotplug lock recursion Larry reported a CPU hotplug lock recursion in the MTRR code. ============================================ WARNING: possible recursive locking detected systemd-udevd/153 is trying to acquire lock: (cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c030fc26>] stop_machine+0x16/0x30 but task is already holding lock: (cpu_hotplug_lock.rw_sem){.+.+.+}, at: [<c0234353>] mtrr_add_page+0x83/0x470 .... cpus_read_lock+0x48/0x90 stop_machine+0x16/0x30 mtrr_add_page+0x18b/0x470 mtrr_add+0x3e/0x70 mtrr_add_page() holds the hotplug rwsem already and calls stop_machine() which acquires it again. Call stop_machine_cpuslocked() instead. Reported-and-tested-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708140920250.1865@nanos Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@suse.de>	2017-08-15 13:03:47 +02:00
Dave Airlie	0c697fafc6	Linux 4.13-rc5 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZkNpUAAoJEHm+PkMAQRiGr68H/2nr8kxpoUhZ7eA5C71waCjh gnJSevkzJAp+fCb0KfQFAp1qvpmLLle4e6tAxYgTQZg4Z3W5cJJNfxu9TzY5sGuL o9QUr43XzABepW4e4jhRtZv6dj3K6XruNeDQKXDZTDcc/S8zoiS/Pltq7VgPcAuM kX+3qsNdUyknngD6b0z9NtJkb0mHKY6J8MpraWRO34egDwsaN/tuhRj0DRQpCoyQ x/k+hMbc9MB9Dn8cfACo6Omb+r5Rfd7dTBUAju/TnIIgs//9voHba307N7XvLJZg kWc8MqMQQZXfRZHB0atpDMHyZS/XQRlNPXj76j0+Ud/byODKTFkkazmgTpALvj8= =CxeU -----END PGP SIGNATURE----- Backmerge tag 'v4.13-rc5' into drm-next Linux 4.13-rc5 There's a really nasty nouveau collision, hopefully someone can take a look once I pushed this out.	2017-08-15 16:16:58 +10:00
Greg Kroah-Hartman	d985524680	Merge 4.13-rc5 into char-misc-next We want the firmware, and other changes, in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-14 13:29:31 -07:00
Vikas Shivappa	a9110b552d	x86/intel_rdt: Modify the intel_pqr_state for better performance Currently we have pqr_state and rdt_default_state which store the cached CLOSID/RMIDs and the user configured cpu default values respectively. We touch both of these during context switch. Put all of them in one structure so that we can spare a cache line. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: sai.praneeth.prakhya@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Link: http://lkml.kernel.org/r/1502304395-7166-3-git-send-email-vikas.shivappa@linux.intel.com	2017-08-14 11:47:47 +02:00
Vikas Shivappa	eda61c265f	x86/intel_rdt/cqm: Clear the default RMID during hotcpu The user configured per cpu default RMID is not cleared during cpu hotplug. This may lead to incorrect RMID values after a cpu goes offline and again comes back online. Clear the per cpu default RMID during cpu offline and online handling. Reported-by: Prakyha Sai Praneeth <sai.praneeth.prakhya@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: ak@linux.intel.com Cc: davidcc@google.com Link: http://lkml.kernel.org/r/1502304395-7166-2-git-send-email-vikas.shivappa@linux.intel.com	2017-08-14 11:47:46 +02:00
Arnd Bergmann	aac64f7de9	x86/cpu/amd: Hide unused legacy_fixup_core_id() function The newly introduced function is only used when CONFIG_SMP is set: arch/x86/kernel/cpu/amd.c:305:13: warning: 'legacy_fixup_core_id' defined but not used This moves the existing #ifdef around the caller so it covers legacy_fixup_core_id() as well. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@suse.de> Cc: Emanuel Czirai <icanrealizeum@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Fixes: `b89b41d0b8` ("x86/cpu/amd: Limit cpu_core_id fixup to families older than F17h") Link: http://lkml.kernel.org/r/20170811111937.2006128-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-11 14:58:34 +02:00
Doug Smythies	8e2f3bce05	cpufreq: x86: Disable interrupts during MSRs reading According to Intel 64 and IA-32 Architectures SDM, Volume 3, Chapter 14.2, "Software needs to exercise care to avoid delays between the two RDMSRs (for example interrupts)". So, disable interrupts during reading MSRs IA32_APERF and IA32_MPERF. See also: commit `4ab60c3f32` (cpufreq: intel_pstate: Disable interrupts during MSRs reading). Signed-off-by: Doug Smythies <dsmythies@telus.net> Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-08-11 01:27:41 +02:00
Vitaly Kuznetsov	2ffd9e33ce	x86/hyper-v: Use hypercall for remote TLB flush Hyper-V host can suggest us to use hypercall for doing remote TLB flush, this is supposed to work faster than IPIs. Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls we need to put the input somewhere in memory and we don't really want to have memory allocation on each call so we pre-allocate per cpu memory areas on boot. pv_ops patching is happening very early so we need to separate hyperv_setup_mmu_ops() and hyper_alloc_mmu(). It is possible and easy to implement local TLB flushing too and there is even a hint for that. However, I don't see a room for optimization on the host side as both hypercall and native tlb flush will result in vmexit. The hint is also not set on modern Hyper-V versions. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jork Loeser <Jork.Loeser@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Simon Xiao <sixiao@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: devel@linuxdriverproject.org Link: http://lkml.kernel.org/r/20170802160921.21791-8-vkuznets@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-10 20:16:44 +02:00
Suravee Suthikulpanit	2b83809a5e	x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask For systems with X86_FEATURE_TOPOEXT, current logic uses the APIC ID to calculate shared_cpu_map. However, APIC IDs are not guaranteed to be contiguous for cores across different L3s (e.g. family17h system w/ downcore configuration). This breaks the logic, and results in an incorrect L3 shared_cpu_map. Instead, always use the previously calculated cpu_llc_shared_mask of each CPU to derive the L3 shared_cpu_map. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170731085159.9455-3-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-10 17:37:43 +02:00
Suravee Suthikulpanit	b89b41d0b8	x86/cpu/amd: Limit cpu_core_id fixup to families older than F17h Current cpu_core_id fixup causes downcored F17h configurations to be incorrect: NODE: 0 processor 0 core id : 0 processor 1 core id : 1 processor 2 core id : 2 processor 3 core id : 4 processor 4 core id : 5 processor 5 core id : 0 NODE: 1 processor 6 core id : 2 processor 7 core id : 3 processor 8 core id : 4 processor 9 core id : 0 processor 10 core id : 1 processor 11 core id : 2 Code that relies on the cpu_core_id, like match_smt(), for example, which builds the thread siblings masks used by the scheduler, is mislead. So, limit the fixup to pre-F17h machines. The new value for cpu_core_id for F17h and later will represent the CPUID_Fn8000001E_EBX[CoreId], which is guaranteed to be unique for each core within a socket. This way we have: NODE: 0 processor 0 core id : 0 processor 1 core id : 1 processor 2 core id : 2 processor 3 core id : 4 processor 4 core id : 5 processor 5 core id : 6 NODE: 1 processor 6 core id : 8 processor 7 core id : 9 processor 8 core id : 10 processor 9 core id : 12 processor 10 core id : 13 processor 11 core id : 14 Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> [ Heavily massaged. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Link: http://lkml.kernel.org/r/20170731085159.9455-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-10 17:37:43 +02:00
Peter Zijlstra	01651324ed	x86: Clarify/fix no-op barriers for text_poke_bp() So I was looking at text_poke_bp() today and I couldn't make sense of the barriers there. How's for something like so? Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Jiri Kosina <jkosina@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: masami.hiramatsu.pt@hitachi.com Link: http://lkml.kernel.org/r/20170731102154.f57cvkjtnbmtctk6@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-10 17:35:19 +02:00
Andy Lutomirski	e137a4d8f4	x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs Switching FS and GS is a mess, and the current code is still subtly wrong: it assumes that "Loading a nonzero value into FS sets the index and base", which is false on AMD CPUs if the value being loaded is 1, 2, or 3. (The current code came from commit `3e2b68d752` ("x86/asm, sched/x86: Rewrite the FS and GS context switch code"), which made it better but didn't fully fix it.) Rewrite it to be much simpler and more obviously correct. This should fix it fully on AMD CPUs and shouldn't adversely affect performance. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang Seok <chang.seok.bae@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-10 17:15:13 +02:00
Andy Lutomirski	767d035d83	x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common execve used to leak FSBASE and GSBASE on AMD CPUs. Fix it. The security impact of this bug is small but not quite zero -- it could weaken ASLR when a privileged task execs a less privileged program, but only if program changed bitness across the exec, or the child binary was highly unusual or actively malicious. A child program that was compromised after the exec would not have access to the leaked base. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang Seok <chang.seok.bae@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-10 17:15:13 +02:00
Vitaly Kuznetsov	10e66760fa	x86/smpboot: Unbreak CPU0 hotplug A hang on CPU0 onlining after a preceding offlining is observed. Trace shows that CPU0 is stuck in check_tsc_sync_target() waiting for source CPU to run check_tsc_sync_source() but this never happens. Source CPU, in its turn, is stuck on synchronize_sched() which is called from native_cpu_up() -> do_boot_cpu() -> unregister_nmi_handler(). So it's a classic ABBA deadlock, due to the use of synchronize_sched() in unregister_nmi_handler(). Fix the bug by moving unregister_nmi_handler() from do_boot_cpu() to native_cpu_up() after cpu onlining is done. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170803105818.9934-1-vkuznets@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-10 17:02:47 +02:00
Masami Hiramatsu	d9f5f32a7d	kprobes/x86: Do not jump-optimize kprobes on irq entry code Since the kernel segment registers are not prepared at the entry of irq-entry code, if a kprobe on such code is jump-optimized, accessing per-CPU variables may cause a kernel panic. However, if the kprobe is not optimized, it triggers an int3 exception and sets segment registers correctly. With this patch we check the probe-address and if it is in the irq-entry code, it prohibits optimizing such kprobes. This means we can continue probing such interrupt handlers by kprobes but it is not optimized anymore. Reported-by: Francis Deslauriers <francis.deslauriers@efficios.com> Tested-by: Francis Deslauriers <francis.deslauriers@efficios.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Chris Zankel <chris@zankel.net> Cc: David S . Miller <davem@davemloft.net> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: linux-arch@vger.kernel.org Cc: linux-cris-kernel@axis.com Cc: mathieu.desnoyers@efficios.com Link: http://lkml.kernel.org/r/150172795654.27216.9824039077047777477.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-10 16:28:53 +02:00
Masami Hiramatsu	229a718605	irq: Make the irqentry text section unconditional Generate irqentry and softirqentry text sections without any Kconfig dependencies. This will add extra sections, but there should be no performace impact. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Chris Zankel <chris@zankel.net> Cc: David S . Miller <davem@davemloft.net> Cc: Francis Deslauriers <francis.deslauriers@efficios.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: linux-arch@vger.kernel.org Cc: linux-cris-kernel@axis.com Cc: mathieu.desnoyers@efficios.com Link: http://lkml.kernel.org/r/150172789110.27216.3955739126693102122.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-10 16:28:53 +02:00
Ingo Molnar	1d0f49e140	Merge branch 'x86/urgent' into x86/asm, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-10 13:14:15 +02:00
Linus Torvalds	6999507416	KVM fixes for v4.13-rc4 ARM: - Yet another race with VM destruction plugged - A set of small vgic fixes x86: - Preserve pending INIT - RCU fixes in paravirtual async pf, VM teardown, and VMXOFF emulation - nVMX interrupt injection and dirty tracking fixes - initialize to make UBSAN happy -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJZhNZ2AAoJEED/6hsPKofoKDEH/iIw1pcgdEW2NP/kFtKXSMCK josdFwGPQMjBGzx6No4tfMCNDOjW2FKYXapN6CASAqMJo5H2krj8VHMVwm0h3lUl 4RdbbkFTdfl/Znp8M39efFheWrjX+L37AltKV7xAgA7n8cO39KV4RReimzSc7aVq 5dDt4k0dbF9/zXHxkGiKEhwaSSbZEEznQQ/09annSoOVe6om5esUrUtnUF5P99uz IhAsmJbZxE5VmowjT5MjaR1mXSLLNL55HWKvkf3B3ZGnyxQU+3Vz7IGf2Ma2j+jV IrdXA11NHDY1anDYgDhFlr3rTCPu9CBmTv4O8zsDRlX9TGpr8bBX2dvjRKl7uOo= =KM80 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Radim Krčmář: "ARM: - Yet another race with VM destruction plugged - A set of small vgic fixes x86: - Preserve pending INIT - RCU fixes in paravirtual async pf, VM teardown, and VMXOFF emulation - nVMX interrupt injection and dirty tracking fixes - initialize to make UBSAN happy" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: arm/arm64: vgic: Use READ_ONCE fo cmpxchg KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit" KVM: nVMX: mark vmcs12 pages dirty on L2 exit kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown KVM: nVMX: do not pin the VMCS12 KVM: avoid using rcu_dereference_protected KVM: X86: init irq->level in kvm_pv_kick_cpu_op KVM: X86: Fix loss of pending INIT due to race KVM: async_pf: make rcu irq exit if not triggered from idle task KVM: nVMX: fixes to nested virt interrupt injection KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12 KVM: arm/arm64: Handle hva aging while destroying the vm KVM: arm/arm64: PMU: Fix overflow interrupt injection KVM: arm/arm64: Fix bug in advertising KVM_CAP_MSI_DEVID capability	2017-08-04 15:18:27 -07:00
Linus Torvalds	0d5b994407	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "The recent irq core changes unearthed API abuse in the HPET code, which manifested itself in a suspend/resume regression. The fix replaces the cruft with the proper function calls and cures the regression" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hpet: Cure interface abuse in the resume path	2017-08-04 15:16:09 -07:00
Lukas Wunner	630b3aff8a	treewide: Consolidate Apple DMI checks We're about to amend ACPI bus scan with DMI checks whether we're running on a Mac to support Apple device properties in AML. The DMI checks are performed for every single device, adding overhead for everything x86 that isn't Apple, which is the majority. Rafael and Andy therefore request to perform the DMI match only once and cache the result. Outside of ACPI various other Apple DMI checks exist and it seems reasonable to use the cached value there as well. Rafael, Andy and Darren suggest performing the DMI check in arch code and making it available with a header in include/linux/platform_data/x86/. To this end, add early_platform_quirks() to arch/x86/kernel/quirks.c to perform the DMI check and invoke it from setup_arch(). Switch over all existing Apple DMI checks, thereby fixing two deficiencies: * They are now #defined to false on non-x86 arches and can thus be optimized away if they're located in cross-arch code. * Some of them only match "Apple Inc." but not "Apple Computer, Inc.", which is used by BIOSes released between January 2006 (when the first x86 Macs started shipping) and January 2007 (when the company name changed upon introduction of the iPhone). Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Suggested-by: Darren Hart <dvhart@infradead.org> Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-08-03 23:26:22 +02:00
Rafael J. Wysocki	8a05c3115d	Merge branches 'pm-cpufreq-x86', 'pm-cpufreq-docs' and 'intel_pstate' * pm-cpufreq-x86: cpufreq: x86: Make scaling_cur_freq behave more as expected * pm-cpufreq-docs: cpufreq: docs: Add missing cpuinfo_cur_freq description * intel_pstate: cpufreq: intel_pstate: Drop ->get from intel_pstate structure	2017-08-03 20:29:24 +02:00
Fenghua Yu	0dd2d7494c	x86/intel_rdt: Show bitmask of shareable resource with other executing units CPUID.(EAX=0x10, ECX=res#):EBX[31:0] reports a bit mask for a resource. Each set bit within the length of the CBM indicates the corresponding unit of the resource allocation may be used by other entities in the platform (e.g. an integrated graphics engine or hardware units outside the processor core and have direct access to the resource). Each cleared bit within the length of the CBM indicates the corresponding allocation unit can be configured to implement a priority-based allocation scheme without interference with other hardware agents in the system. Bits outside the length of the CBM are reserved. More details on the bit mask are described in x86 Software Developer's Manual. The bitmask is shown in "info" directory for each resource. It's up to user to decide how to use the bitmask within a CBM in a partition to share or isolate a resource with other executing units. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: vikas.shivappa@linux.intel.com Link: http://lkml.kernel.org/r/20170725223904.12996-1-tony.luck@intel.com	2017-08-01 22:41:30 +02:00
Vikas Shivappa	e33026831b	x86/intel_rdt/mbm: Handle counter overflow Set up a delayed work queue for each domain that will read all the MBM counters of active RMIDs once per second to make sure that they don't wrap around between reads from users. [Tony: Added the initializations for the work structure and completed the patch] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-29-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:29 +02:00
Vikas Shivappa	a4de1dfdd7	x86/intel_rdt/mbm: Add mbm counter initialization MBM counters are monotonically increasing counts representing the total memory bytes at a particular time. In order to calculate total_bytes for an rdtgroup, we store the value of the counter when we create an rdtgroup or when a new domain comes online. When the total_bytes(all memory controller bytes) or local_bytes(local memory controller bytes) file in "mon_data" is read it shows the total bytes for that rdtgroup since its creation. User can snapshot this at different time intervals to obtain bytes/second. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-28-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:29 +02:00
Tony Luck	9f52425ba3	x86/intel_rdt/mbm: Basic counting of MBM events (total and local) Check CPUID bits for whether each of the MBM events is supported. Allocate space for each RMID for each counter in each domain to save previous MSR counter value and running total of data. Create files in each of the monitor directories. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-27-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:28 +02:00
Vikas Shivappa	895c663ece	x86/intel_rdt/cqm: Add CPU hotplug support Resource groups have a per domain directory under "mon_data". Add or remove these directories as and when domains come online and go offline. Also update the per cpu rmids and cache upon onlining and offlining cpus. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-26-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:28 +02:00
Vikas Shivappa	748b6b881c	x86/intel_rdt/cqm: Add sched_in support OS associates an RMID/CLOSid to a task by writing the per CPU IA32_PQR_ASSOC MSR when a task is scheduled in. The sched_in code will stay as no-op unless we are running on Intel SKU which supports either resource control or monitoring and we also enable them by mounting the resctrl fs. The per cpu CLOSid/RMID values are cached and the write is performed only when a task with a different CLOSid/RMID is scheduled in. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-25-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:28 +02:00
Vikas Shivappa	4af4a88e0c	x86/intel_rdt/cqm: Add mount,umount support Add monitoring support during mount and unmount. Since root directory is a "ctrl_mon" directory which can control and monitor resources create the "mon_groups" directory which can hold monitor groups and a "mon_data" directory which would hold all monitoring data like the rest of resource groups. The mount succeeds if either of monitoring or control/allocation is enabled. If only monitoring is enabled user can still create monitor groups under the "/sys/fs/resctrl/mon_groups/" and any mkdir under root would fail. If only control/allocation is enabled all of the monitoring related directories/files would not exist and resctrl would work in legacy mode. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-23-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:27 +02:00
Vikas Shivappa	f3cbeacaa0	x86/intel_rdt/cqm: Add rmdir support Resource groups (ctrl_mon and monitor groups) are represented by directories in resctrl fs. Add support to remove the directories. When a ctrl_mon directory is removed all the cpus and tasks are assigned back to the root rdtgroup. When a monitor group is removed the cpus and tasks are returned to the parent control group. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-22-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:27 +02:00
Vikas Shivappa	f9049547f7	x86/intel_rdt: Separate the ctrl bits from rmdir Re-factor the code to separate the ctrl group removal from the rmdir to prepare to add RDT monitoring group removal. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-21-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:26 +02:00
Vikas Shivappa	d89b737901	x86/intel_rdt/cqm: Add mon_data Add a mon_data directory for the root rdtgroup and all other rdtgroups. The directory holds all of the monitored data for all domains and events of all resources being monitored. The mon_data itself has a list of directories in the format mon_<domain_name>_<domain_id>. Each of these subdirectories contain one file per event in the mode "0444". Reading the file displays a snapshot of the monitored data for the event the file represents. For ex, on a 2 socket Broadwell with llc_occupancy being monitored the mon_data contents look as below: $ ls /sys/fs/resctrl/p1/mon_data/ mon_L3_00 mon_L3_01 Each domain directory has one file per event: $ ls /sys/fs/resctrl/p1/mon_data/mon_L3_00/ llc_occupancy To read current llc_occupancy of ctrl_mon group p1 $ cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy 33789096 [This patch idea is based on Tony's sample patches to organise data in a per domain directory and have one file per event (and use the fp->priv to store mon data bits)] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-20-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:26 +02:00
Vikas Shivappa	90c403e831	x86/intel_rdt: Prepare for RDT monitor data support Rename the intel_rdt_schemata file to intel_rdt_ctrlmondata as we now want to add support for RDT monitoring data for the events that are supported in later patches. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-19-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:25 +02:00
Vikas Shivappa	a9fcf8627d	x86/intel_rdt/cqm: Add cpus file support The cpus file is extended to support resource monitoring. This is used to over-ride the RMID of the default group when running on specific CPUs. It works similar to the resource control. The "cpus" and "cpus_list" file is present in default group, ctrl_mon groups and monitor groups. Each "cpus" file or cpu_list file reads a cpumask or list showing which CPUs belong to the resource group. By default all online cpus belong to the default root group. A CPU can be present in one "ctrl_mon" and one "monitor" group simultaneously. They can be added to a resource group by writing the CPU to the file. When a CPU is added to a ctrl_mon group it is automatically removed from the previous ctrl_mon group. A CPU can be added to a monitor group only if it is present in the parent ctrl_mon group and when a CPU is added to a monitor group, it is automatically removed from the previous monitor group. When CPUs go offline, they are automatically removed from the ctrl_mon and monitor groups. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-18-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:25 +02:00
Vikas Shivappa	b09d981b3f	x86/intel_rdt: Prepare to add RDT monitor cpus file support Separate the ctrl cpus file handling from the generic cpus file handling and convert the per cpu closid from u32 to a struct which will be used later to add rmid to the same struct. Also cleanup some name space. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-17-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:25 +02:00
Vikas Shivappa	d6aaba615a	x86/intel_rdt/cqm: Add tasks file support The root directory, ctrl_mon and monitor groups are populated with a read/write file named "tasks". When read, it shows all the task IDs assigned to the resource group. Tasks can be added to groups by writing the PID to the file. A task can be present in one "ctrl_mon" group "and" one "monitor" group. IOW a PID_x can be seen in a ctrl_mon group and a monitor group at the same time. When a task is added to a ctrl_mon group, it is automatically removed from the previous ctrl_mon group where it belonged. Similarly if a task is moved to a monitor group it is removed from the previous monitor group . Also since the monitor groups can only have subset of tasks of parent ctrl_mon group, a task can be moved to a monitor group only if its already present in the parent ctrl_mon group. Task membership is indicated by a new field in the task_struct "u32 rmid" which holds the RMID for the task. RMID=0 is reserved for the default root group where the tasks belong to at mount. [tony: zero the rmid if rdtgroup was deleted when task was being moved] Signed-off-by: Tony Luck <tony.luck@linux.intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-16-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:24 +02:00
Vikas Shivappa	0734ded1ab	x86/intel_rdt: Change closid type from int to u32 OS associates a CLOSid(Class of service id) to a task by writing the high 32 bits of per CPU IA32_PQR_ASSOC MSR when a task is scheduled in. CPUID.(EAX=10H, ECX=1):EDX[15:0] enumerates the max CLOSID supported and it is zero indexed. Hence change the type to u32 from int. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-15-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:24 +02:00
Vikas Shivappa	c7d9aac613	x86/intel_rdt/cqm: Add mkdir support for RDT monitoring Resource control groups can be created using mkdir in resctrl fs(rdtgroup). In order to extend the resctrl interface to support monitoring the control groups, extend the current mkdir to support resource monitoring also. This allows the rdtgroup created under the root directory to be able to both control and monitor resources (ctrl_mon group). The ctrl_mon groups are associated with one CLOSID like the legacy rdtgroups and one RMID(Resource monitoring ID) as well. Hardware uses RMID to track the resource usage. Once either of the CLOSID or RMID are exhausted, the mkdir fails with -ENOSPC. If there are RMIDs in limbo list but not free an -EBUSY is returned. User can also monitor a subset of the ctrl_mon rdtgroup's tasks/cpus using the monitor groups. The monitor groups are created using mkdir under the "mon_groups" directory in every ctrl_mon group. [Merged Tony's code: Removed a lot of common mkdir code, a fix to handling of the list of the child rdtgroups and some cleanups in list traversal. Also the changes to have similar alloc and free for CLOS/RMID and return -EBUSY when RMIDs are in limbo and not free] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-14-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:23 +02:00
Vikas Shivappa	65b4f40305	x86/intel_rdt: Prepare for RDT monitoring mkdir support Separate the ctrl mkdir code from the rest in order to prepare for adding support for RDT monitoring mkdir support as well. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-13-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:23 +02:00
Vikas Shivappa	d4ab332010	x86/intel_rdt/cqm: Add info files for RDT monitoring Add info directory files specific to RDT monitoring. num_rmids: The number of RMIDs which are valid for the resource. mon_features: Lists the monitoring events if monitoring is enabled for the resource. max_threshold_occupancy: This is specific to llc_occupancy monitoring and is used to determine if an RMID can be reused. Provides an upper bound on the threshold and is shown to the user in bytes though the internal value will be rounded to the scaling factor supported by the h/w. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-12-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:22 +02:00
Tony luck	5dc1d5c6ba	x86/intel_rdt: Simplify info and base file lists The info directory files and base files need to be different for each resource like cache and Memory bandwidth. With in each resource, the files would be further different for monitoring and ctrl. This leads to a lot of different static array declarations given that we are adding resctrl monitoring. Simplify this to one common list of files and then declare a set of flags to choose the files based on the resource, whether it is info or base and if it is control type file. This is as a preparation to include monitoring based info and base files. No functional change. [Vikas: Extended the flags to have few bits per category like resource, info/base etc] Signed-off-by: Tony luck <tony.luck@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-11-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:22 +02:00
Vikas Shivappa	edf6fa1c4a	x86/intel_rdt/cqm: Add RMID (Resource monitoring ID) management Hardware uses RMID(Resource monitoring ID) to keep track of each of the RDT events associated with tasks. The number of RMIDs is dependent on the SKU and is enumerated via CPUID. We add support to manage the RMIDs which include managing the RMID allocation and reading LLC occupancy for an RMID. RMID allocation is managed by keeping a free list which is initialized to all available RMIDs except for RMID 0 which is always reserved for root group. RMIDs goto a limbo list once they are freed since the RMIDs are still tagged to cache lines of the tasks which were using them - thereby still having some occupancy. They continue to be in limbo list until the occupancy < threshold_occupancy. The threshold_occupancy is a user configurable value. OS uses IA32_QM_CTR MSR to read the occupancy associated with an RMID after programming the IA32_EVENTSEL MSR with the RMID. [Tony: Improved limbo search] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-10-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:21 +02:00
Vikas Shivappa	6a445edce6	x86/intel_rdt/cqm: Add RDT monitoring initialization Add common data structures for RDT resource monitoring and perform RDT monitoring related data structure initializations which include setting up the RMID(Resource monitoring ID) lists and event list which the resource supports. [ tony: some cleanup to make adding MBM easier later, remove "cqm" from some names, make some data structure local to intel_rdt_monitor.c static. Add copyright header] [ tglx: Made it readable ] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-9-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:21 +02:00
Vikas Shivappa	dd131853f3	x86/intel_rdt: Make rdt_resources_all more readable Change the format of the global rdt_resources_all. This holds all the RDT resource structure initialization values. Make this more readable by using the format: rdt_resources_all[] = { [<resource_index>] = {... } ... } Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-8-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:20 +02:00
Vikas Shivappa	1b5c0b7583	x86/intel_rdt: Cleanup namespace to support RDT monitoring Few of the data-structures have generic names although they are RDT allocation specific. Rename them to be allocation specific to accommodate RDT monitoring. E.g. s/enabled/alloc_enabled/ No functional change. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-7-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:20 +02:00
Reinette Chatre	cb2200e967	x86/intel_rdt: Mark rdt_root and closid_alloc as static Sparse reports that both of these can be static. Make it so. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Link: http://lkml.kernel.org/r/1501017287-28083-6-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:20 +02:00
Vikas Shivappa	0583020456	x86/intel_rdt: Change file names to accommodate RDT monitor code Because the "perf cqm" and resctrl code were separately added and indivdually configurable, there seem to be separate context switch code and also things on global .h which are not really needed. Move only the scheduling specific code and definitions to <asm/intel_rdt_sched.h> and the put all the other declarations to a local intel_rdt.h. h/t to Reinette Chatre for pointing out that we should separate the public interfaces used by other parts of the kernel from private objects shared between the various files comprising RDT. No functional change. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-5-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:19 +02:00
Vikas Shivappa	f01d7d51f5	x86/intel_rdt: Introduce a common compile option for RDT We currently have a CONFIG_RDT_A which is for RDT(Resource directory technology) allocation based resctrl filesystem interface. As a preparation to add support for RDT monitoring as well into the same resctrl filesystem, change the config option to be CONFIG_RDT which would include both RDT allocation and monitoring code. No functional change. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-4-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:19 +02:00
Vikas Shivappa	c39a0e2c88	x86/perf/cqm: Wipe out perf based cqm 'perf cqm' never worked due to the incompatibility between perf infrastructure and cqm hardware support. The hardware uses RMIDs to track the llc occupancy of tasks and these RMIDs are per package. This makes monitoring a hierarchy like cgroup along with monitoring of tasks separately difficult and several patches sent to lkml to fix them were NACKed. Further more, the following issues in the current perf cqm make it almost unusable: 1. No support to monitor the same group of tasks for which we do allocation using resctrl. 2. It gives random and inaccurate data (mostly 0s) once we run out of RMIDs due to issues in Recycling. 3. Recycling results in inaccuracy of data because we cannot guarantee that the RMID was stolen from a task when it was not pulling data into cache or even when it pulled the least data. Also for monitoring llc_occupancy, if we stop using an RMID_x and then start using an RMID_y after we reclaim an RMID from an other event, we miss accounting all the occupancy that was tagged to RMID_x at a later perf_count. 2. Recycling code makes the monitoring code complex including scheduling because the event can lose RMID any time. Since MBM counters count bandwidth for a period of time by taking snap shot of total bytes at two different times, recycling complicates the way we count MBM in a hierarchy. Also we need a spin lock while we do the processing to account for MBM counter overflow. We also currently use a spin lock in scheduling to prevent the RMID from being taken away. 4. Lack of support when we run different kind of event like task, system-wide and cgroup events together. Data mostly prints 0s. This is also because we can have only one RMID tied to a cpu as defined by the cqm hardware but a perf can at the same time tie multiple events during one sched_in. 5. No support of monitoring a group of tasks. There is partial support for cgroup but it does not work once there is a hierarchy of cgroups or if we want to monitor a task in a cgroup and the cgroup itself. 6. No support for monitoring tasks for the lifetime without perf overhead. 7. It reported the aggregate cache occupancy or memory bandwidth over all sockets. But most cloud and VMM based use cases want to know the individual per-socket usage. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-2-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:18 +02:00
Wanpeng Li	337c017ccd	KVM: async_pf: make rcu irq exit if not triggered from idle task WARNING: CPU: 5 PID: 1242 at kernel/rcu/tree_plugin.h:323 rcu_note_context_switch+0x207/0x6b0 CPU: 5 PID: 1242 Comm: unity-settings- Not tainted 4.13.0-rc2+ #1 RIP: 0010:rcu_note_context_switch+0x207/0x6b0 Call Trace: __schedule+0xda/0xba0 ? kvm_async_pf_task_wait+0x1b2/0x270 schedule+0x40/0x90 kvm_async_pf_task_wait+0x1cc/0x270 ? prepare_to_swait+0x22/0x70 do_async_page_fault+0x77/0xb0 ? do_async_page_fault+0x77/0xb0 async_page_fault+0x28/0x30 RIP: 0010:__d_lookup_rcu+0x90/0x1e0 I encounter this when trying to stress the async page fault in L1 guest w/ L2 guests running. Commit `9b132fbe54` (Add rcu user eqs exception hooks for async page fault) adds rcu_irq_enter/exit() to kvm_async_pf_task_wait() to exit cpu idle eqs when needed, to protect the code that needs use rcu. However, we need to call the pair even if the function calls schedule(), as seen from the above backtrace. This patch fixes it by informing the RCU subsystem exit/enter the irq towards/away from idle for both n.halted and !n.halted. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-08-01 22:24:18 +02:00
Thomas Gleixner	bb68cfe2f5	x86/hpet: Cure interface abuse in the resume path The HPET resume path abuses irq_domain_[de]activate_irq() to restore the MSI message in the HPET chip for the boot CPU on resume and it relies on an implementation detail of the interrupt core code, which magically makes the HPET unmask call invoked via a irq_disable/enable pair. This worked as long as the irq code did unconditionally invoke the unmask() callback. With the recent changes which keep track of the masked state to avoid expensive hardware access, this does not longer work. As a consequence the HPET timer interrupts are not unmasked which breaks resume as the boot CPU waits forever that a timer interrupt arrives. Make the restore of the MSI message explicit and invoke the unmask() function directly. While at it get rid of the pointless affinity setting as nothing can change the affinity of the interrupt and the vector across suspend/resume. The restore of the MSI message reestablishes the previous affinity setting which is the correct one. Fixes: `bf22ff45be` ("genirq: Avoid unnecessary low level irq function calls") Reported-and-tested-by: Tomi Sarvela <tomi.p.sarvela@intel.com> Reported-by: Martin Peres <martin.peres@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: jeffy.chen@rock-chips.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707312158590.2287@nanos	2017-08-01 13:02:37 +02:00
Linus Torvalds	f137e0b0c5	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A small set of x86 fixes: - prevent the kernel from using the EFI reboot method when EFI is disabled. - two patches addressing clang issues" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Disable the address-of-packed-member compiler warning x86/efi: Fix reboot_mode when EFI runtime services are disabled x86/boot: #undef memcpy() et al in string.c	2017-07-30 12:19:35 -07:00
Linus Torvalds	dbc52a8030	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A couple of fixes for performance counters and kprobes: - a series of small patches which make the uncore performance counters on Skylake server systems work correctly - add a missing instruction slot release to the failure path of kprobes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes/x86: Release insn_slot in failure path perf/x86/intel/uncore: Fix missing marker for skx_uncore_cha_extra_regs perf/x86/intel/uncore: Fix SKX CHA event extra regs perf/x86/intel/uncore: Remove invalid Skylake server CHA filter field perf/x86/intel/uncore: Fix Skylake server CHA LLC_LOOKUP event umask perf/x86/intel/uncore: Fix Skylake server PCU PMU event format perf/x86/intel/uncore: Fix Skylake UPI PMU event masks	2017-07-30 11:52:15 -07:00
Rafael J. Wysocki	4815d3c56d	cpufreq: x86: Make scaling_cur_freq behave more as expected After commit `f8475cef90` "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF" the scaling_cur_freq policy attribute in sysfs only behaves as expected on x86 with APERF/MPERF registers available when it is read from at least twice in a row. The value returned by the first read may not be meaningful, because the computations in there use cached values from the previous iteration of aperfmperf_snapshot_khz() which may be stale. To prevent that from happening, modify arch_freq_get_on_cpu() to call aperfmperf_snapshot_khz() twice, with a short delay between these calls, if the previous invocation of aperfmperf_snapshot_khz() was too far back in the past (specifically, more that 1s ago). Also, as pointed out by Doug Smythies, aperf_delta is limited now and the multiplication of it by cpu_khz won't overflow, so simplify the s->khz computations too. Fixes: `f8475cef90` "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF" Reported-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-07-30 14:26:51 +02:00
Tom Lendacky	4e237903f9	x86/mm, kexec: Fix memory corruption with SME on successive kexecs After issuing successive kexecs it was found that the SHA hash failed verification when booting the kexec'd kernel. When SME is enabled, the change from using pages that were marked encrypted to now being marked as not encrypted (through new identify mapped page tables) results in memory corruption if there are any cache entries for the previously encrypted pages. This is because separate cache entries can exist for the same physical location but tagged both with and without the encryption bit. To prevent this, issue a wbinvd if SME is active before copying the pages from the source location to the destination location to clear any possible cache entry conflicts. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: <kexec@lists.infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e7fb8610af3a93e8f8ae6f214cd9249adc0df2b4.1501186516.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-30 12:09:12 +02:00
Andy Lutomirski	99504819fc	x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads Now that pt_regs properly defines segment fields as 16-bit on 32-bit CPUs, there's no need to mask off the high word. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-30 12:04:41 +02:00
Andy Lutomirski	630c1863bc	x86/traps: Don't clear segment high bits in early_idt_handler_common() Now that pt_regs defines the segment fields as 16-bit, there's no need to sanitize the values. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-30 12:04:40 +02:00
Andy Lutomirski	a632375764	x86/ldt/64: Refresh DS and ES when modify_ldt changes an entry On x86_32, modify_ldt() implicitly refreshes the cached DS and ES segments because they are refreshed on return to usermode. On x86_64, they're not refreshed on return to usermode. To improve determinism and match x86_32's behavior, refresh them when we update the LDT. This avoids a situation in which the DS points to a descriptor that is changed but the old cached segment persists until the next reschedule. If this happens, then the user-visible state will change nondeterministically some time after modify_ldt() returns, which is unfortunate. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Chang Seok <chang.seok.bae@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-27 09:12:57 +02:00
Dave Airlie	0eb2c0ae57	Linux 4.13-rc2 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZdS4PAAoJEHm+PkMAQRiGbEYH/2mukTPOUAfNoWaVjO2YHxuL 5yI3n1838tKIJm967IUmGdckN/RYGPjJxvZ+muXN2/rv23+9j3LVq9vQcsYqRQop vrWP+hvGGJvOGJ2NYBDB+4AUrPPdeX9stolwyAcYvyCZ8AilPIovm4s2poA+fuQX D78c8JSfpse32oc93dy4bUz3mRFKTeufstrWEuzqXI691mthF2G9EpA0R3hlbqv+ GiUnNcZVOnOuCt/47GnpWVKsyv91l3CkGq3bV1GSUi8a/1PnyFxHQxQI/qgbkLXs NuswRupSeLDQKRgiDLgWF/BpdHEp4dpFFWXm00KWlgxeGSQnKat9bpW/d5OgnhA= =mv3H -----END PGP SIGNATURE----- Backmerge tag 'v4.13-rc2' into drm-next Linux 4.13-rc2 This is required for drm-misc fixing.	2017-07-27 08:15:43 +10:00
Wincy Van	210f84b0ca	x86: irq: Define a global vector for nested posted interrupts We are using the same vector for nested/non-nested posted interrupts delivery, this may cause interrupts latency in L1 since we can't kick the L2 vcpu out of vmx-nonroot mode. This patch introduces a new vector which is only for nested posted interrupts to solve the problems above. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-26 18:57:45 +02:00
Josh Poimboeuf	ee9f8fce99	x86/unwind: Add the ORC unwinder Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y. It plugs into the existing x86 unwinder framework. It relies on objtool to generate the needed .orc_unwind and .orc_unwind_ip sections. For more details on why ORC is used instead of DWARF, see Documentation/x86/orc-unwinder.txt - but the short version is that it's a simplified, fundamentally more robust debugninfo data structure, which also allows up to two orders of magnitude faster lookups than the DWARF unwinder - which matters to profiling workloads like perf. Thanks to Andy Lutomirski for the performance improvement ideas: splitting the ORC unwind table into two parallel arrays and creating a fast lookup table to search a subset of the unwind table. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com [ Extended the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-26 13:18:20 +02:00
Yazen Ghannam	9662d43f52	x86/mce/AMD: Allow any CPU to initialize the smca_banks array Current SMCA implementations have the same banks on each CPU with the non-core banks only visible to a "master thread" on each die. Practically, this means the smca_banks array, which describes the banks, only needs to be populated once by a single master thread. CPU 0 seemed like a good candidate to do the populating. However, it's possible that CPU 0 is not enabled in which case the smca_banks array won't be populated. Rather than try to figure out another master thread to do the populating, we should just allow any CPU to populate the array. Drop the CPU 0 check and return early if the bank was already initialized. Also, drop the WARNing about an already initialized bank, since this will be a common, expected occurrence. The smca_banks array is only populated at boot time and CPUs are brought online sequentially. So there's no need for locking around the array. If the first CPU up is a master thread, then it will populate the array with all banks, core and non-core. Every CPU afterwards will return early. If the first CPU up is not a master thread, then it will populate the array with all core banks. The first CPU afterwards that is a master thread will skip populating the core banks and continue populating the non-core banks. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jack Miller <jack@codezen.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170724101228.17326-4-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-25 15:50:53 +02:00
Stefan Assmann	4ecf7191fd	x86/efi: Fix reboot_mode when EFI runtime services are disabled When EFI runtime services are disabled, for example by the "noefi" kernel cmdline parameter, the reboot_type could still be set to BOOT_EFI causing reboot to fail. Fix this by checking if EFI runtime services are enabled. Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170724122248.24006-1-sassmann@kpanic.de [ Fixed 'not disabled' double negation. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-25 11:30:45 +02:00
Shu Wang	a99f03428f	x86/microcode/AMD: Free unneeded patch before exit from update_cache() verify_and_add_patch() allocates memory for a microcode patch and hands it down to be added to the cache of patches. However, if the cache already has the latest patch, the newly allocated one needs to be freed before returning. Do that. This issue has been found by kmemleak: unreferenced object 0xffff88010e780b40 (size 32): comm "bash", pid 860, jiffies 4294690939 (age 29.297s) backtrace: kmemleak_alloc kmem_cache_alloc_trace load_microcode_amd.isra.0 request_microcode_amd reload_store dev_attr_store sysfs_kf_write kernfs_fop_write __vfs_write vfs_write SyS_write do_syscall_64 return_from_SYSCALL_64 0xffffffffffffffff (gdb) list 0xffffffff81050d60 0xffffffff81050d60 is in load_microcode_amd (arch/x86/kernel/cpu/microcode/amd.c:616). which is this: patch = kzalloc(sizeof(patch), GFP_KERNEL); --> if (!patch) { pr_err("Patch allocation failure.\n"); return -EINVAL; } Signed-off-by: Shu Wang <shuwang@redhat.com> [ Rewrite commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: chuhu@redhat.com Cc: liwang@redhat.com Link: http://lkml.kernel.org/r/20170724101228.17326-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-25 11:26:19 +02:00
Andy Shevchenko	42d516ce34	ACPI / boot: Add number of legacy IRQs to debug output Sometimes it's useful to have that when mp_config_acpi_legacy_irqs() is called. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-07-24 22:47:57 +02:00
Andy Shevchenko	6c9a58e84e	ACPI / boot: Correct address space of __acpi_map_table() Sparse complains about wrong address space used in __acpi_map_table() and in __acpi_unmap_table(). arch/x86/kernel/acpi/boot.c:127:29: warning: incorrect type in return expression (different address spaces) arch/x86/kernel/acpi/boot.c:127:29: expected char * arch/x86/kernel/acpi/boot.c:127:29: got void [noderef] <asn:2>* arch/x86/kernel/acpi/boot.c:135:23: warning: incorrect type in argument 1 (different address spaces) arch/x86/kernel/acpi/boot.c:135:23: expected void [noderef] <asn:2>addr arch/x86/kernel/acpi/boot.c:135:23: got char map Correct address space to be in align of type of returned and passed parameter. Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-07-24 22:47:56 +02:00
Andy Shevchenko	861a6ee814	ACPI / boot: Don't define unused variables Some code in acpi_parse_x2apic() conditionally compiled, though parts of it are being used in any case. This annoys gcc. arch/x86/kernel/acpi/boot.c: In function ‘acpi_parse_x2apic’: arch/x86/kernel/acpi/boot.c:203:5: warning: variable ‘enabled’ set but not used [-Wunused-but-set-variable] u8 enabled; ^~~~~~~ arch/x86/kernel/acpi/boot.c:202:6: warning: variable ‘apic_id’ set but not used [-Wunused-but-set-variable] int apic_id; ^~~~~~~ Re-arrange the code to avoid compiling unused variables. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-07-24 22:47:56 +02:00
Eric W. Biederman	cc731525f2	signal: Remove kernel interal si_code magic struct siginfo is a union and the kernel since 2.4 has been hiding a union tag in the high 16bits of si_code using the values: __SI_KILL __SI_TIMER __SI_POLL __SI_FAULT __SI_CHLD __SI_RT __SI_MESGQ __SI_SYS While this looks plausible on the surface, in practice this situation has not worked well. - Injected positive signals are not copied to user space properly unless they have these magic high bits set. - Injected positive signals are not reported properly by signalfd unless they have these magic high bits set. - These kernel internal values leaked to userspace via ptrace_peek_siginfo - It was possible to inject these kernel internal values and cause the the kernel to misbehave. - Kernel developers got confused and expected these kernel internal values in userspace in kernel self tests. - Kernel developers got confused and set si_code to __SI_FAULT which is SI_USER in userspace which causes userspace to think an ordinary user sent the signal and that it was not kernel generated. - The values make it impossible to reorganize the code to transform siginfo_copy_to_user into a plain copy_to_user. As si_code must be massaged before being passed to userspace. So remove these kernel internal si codes and make the kernel code simpler and more maintainable. To replace these kernel internal magic si_codes introduce the helper function siginfo_layout, that takes a signal number and an si_code and computes which union member of siginfo is being used. Have siginfo_layout return an enumeration so that gcc will have enough information to warn if a switch statement does not handle all of union members. A couple of architectures have a messed up ABI that defines signal specific duplications of SI_USER which causes more special cases in siginfo_layout than I would like. The good news is only problem architectures pay the cost. Update all of the code that used the previous magic __SI_ values to use the new SIL_ values and to call siginfo_layout to get those values. Escept where not all of the cases are handled remove the defaults in the switch statements so that if a new case is missed in the future the lack will show up at compile time. Modify the code that copies siginfo si_code to userspace to just copy the value and not cast si_code to a short first. The high bits are no longer used to hold a magic union member. Fixup the siginfo header files to stop including the __SI_ values in their constants and for the headers that were missing it to properly update the number of si_codes for each signal type. The fixes to copy_siginfo_from_user32 implementations has the interesting property that several of them perviously should never have worked as the __SI_ values they depended up where kernel internal. With that dependency gone those implementations should work much better. The idea of not passing the __SI_ values out to userspace and then not reinserting them has been tested with criu and criu worked without changes. Ref: 2.4.0-test1 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2017-07-24 14:30:28 -05:00
Masami Hiramatsu	38115f2f8c	kprobes/x86: Release insn_slot in failure path The following commit: `003002e04e` ("kprobes: Fix arch_prepare_kprobe to handle copy insn failures") returns an error if the copying of the instruction, but does not release the allocated insn_slot. Clean up correctly. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S . Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `003002e04e` ("kprobes: Fix arch_prepare_kprobe to handle copy insn failures") Link: http://lkml.kernel.org/r/150064834183.6172.11694375818447664416.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-24 11:14:59 +02:00
Greg Kroah-Hartman	24a81a2c25	Merge 4.13-rc2 into char-misc-next We want the char/misc driver fixes in here as well to handle future changes. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-23 19:58:30 -07:00
Rob Herring	db15e7f273	x86/devicetree: Convert to using %pOF instead of ->full_name Now that we have a custom printf format specifier, convert users of full_name to use %pOF instead. This is preparation to remove storing of the full path string for each device node. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: devicetree@vger.kernel.org Link: http://lkml.kernel.org/r/20170718214339.7774-7-robh@kernel.org [ Clarify the error message while at it, as 'node' is ambiguous. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-21 10:14:15 +02:00
Kirill A. Shutemov	b569bab78d	x86/mm: Prepare to expose larger address space to userspace On x86, 5-level paging enables 56-bit userspace virtual address space. Not all user space is ready to handle wide addresses. It's known that at least some JIT compilers use higher bits in pointers to encode their information. It collides with valid pointers with 5-level paging and leads to crashes. To mitigate this, we are not going to allocate virtual address space above 47-bit by default. But userspace can ask for allocation from full address space by specifying hint address (with or without MAP_FIXED) above 47-bits. If hint address set above 47-bit, but MAP_FIXED is not specified, we try to look for unmapped area by specified address. If it's already occupied, we look for unmapped area in full address space, rather than from 47-bit window. A high hint address would only affect the allocation in question, but not any future mmap()s. Specifying high hint address on older kernel or on machine without 5-level paging support is safe. The hint will be ignored and kernel will fall back to allocation from 47-bit address space. This approach helps to easily make application's memory allocator aware about large address space without manually tracking allocated virtual address space. The patch puts all machinery in place, but not yet allows userspace to have mappings above 47-bit -- TASK_SIZE_MAX has to be raised to get the effect. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170716225954.74185-7-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-21 10:05:18 +02:00
Kirill A. Shutemov	44b04912fa	x86/mpx: Do not allow MPX if we have mappings above 47-bit MPX (without MAWA extension) cannot handle addresses above 47 bits, so we need to make sure that MPX cannot be enabled if we already have a VMA above the boundary and forbid creating such VMAs once MPX is enabled. The patch implements mpx_unmapped_area_check() which is called from all variants of get_unmapped_area() to check if the requested address fits mpx. On enabling MPX, we check if we already have any vma above 47-bit boundary and forbit the enabling if we do. As long as DEFAULT_MAP_WINDOW is equal to TASK_SIZE_MAX, the change is nop. It will change when we allow userspace to have mappings above 47-bits. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170716225954.74185-6-kirill.shutemov@linux.intel.com [ Readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-21 10:05:18 +02:00
Kirill A. Shutemov	e8f01a8dad	x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit() Rename these helpers to be consistent with spelling of TASK_SIZE and related constants. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170716225954.74185-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-21 10:05:18 +02:00
Seunghun Han	e708e35ba6	x86/ioapic: Pass the correct data to unmask_ioapic_irq() One of the rarely executed code pathes in check_timer() calls unmask_ioapic_irq() passing irq_get_chip_data(0) as argument. That's wrong as unmask_ioapic_irq() expects a pointer to the irq data of interrupt 0. irq_get_chip_data(0) returns NULL, so the following dereference in unmask_ioapic_irq() causes a kernel panic. The issue went unnoticed in the first place because irq_get_chip_data() returns a void pointer so the compiler cannot do a type check on the argument. The code path was added for machines with broken configuration, but it seems that those machines are either not running current kernels or simply do not longer exist. Hand in irq_get_irq_data(0) as argument which provides the correct data. [ tglx: Rewrote changelog ] Fixes: `4467715a44` ("x86/irq: Move irq_cfg.irq_2_pin into io_apic.c") Signed-off-by: Seunghun Han <kkamagui@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1500369644-45767-1-git-send-email-kkamagui@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-20 10:28:10 +02:00
Seunghun Han	dad5ab0db8	x86/acpi: Prevent out of bound access caused by broken ACPI tables The bus_irq argument of mp_override_legacy_irq() is used as the index into the isa_irq_to_gsi[] array. The bus_irq argument originates from ACPI_MADT_TYPE_IO_APIC and ACPI_MADT_TYPE_INTERRUPT items in the ACPI tables, but is nowhere sanity checked. That allows broken or malicious ACPI tables to overwrite memory, which might cause malfunction, panic or arbitrary code execution. Add a sanity check and emit a warning when that triggers. [ tglx: Added warning and rewrote changelog ] Signed-off-by: Seunghun Han <kkamagui@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: security@kernel.org Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-20 10:27:59 +02:00
Dave Airlie	2d62c799f8	Merge tag 'drm-intel-next-2017-07-17' of git://anongit.freedesktop.org/git/drm-intel into drm-next 2nd round of 4.14 features: - prep for deferred fbdev setup - refactor fixed 16.16 computations and skl+ wm code (Mahesh Kumar) - more cnl paches (Rodrigo, Imre et al) - tighten context cleanup and handling (Chris Wilson) - fix interlaced handling on skl+ (Mahesh Kumar) - small bits as usual * tag 'drm-intel-next-2017-07-17' of git://anongit.freedesktop.org/git/drm-intel: (84 commits) drm/i915: Update DRIVER_DATE to 20170717 drm/i915: Protect against deferred fbdev setup drm/i915/fbdev: Always forward hotplug events drm/i915/skl+: unify cpp value in WM calculation drm/i915/skl+: WM calculation don't require height drm/i915: Addition wrapper for fixed16.16 operation drm/i915: cleanup fixed-point wrappers naming drm/i915: Always perform internal fixed16 division in 64 bits drm/i915: take-out common clamping code of fixed16 wrappers drm/i915/cnl: Add missing type case. drm/i915/cnl: Add max allowed Cannonlake DC. drm/i915: Make DP-MST connector info work drm/i915/cnl: Get DDI clock based on PLLs. drm/i915/cnl: Inherit RPS stuff from previous platforms. drm/i915/cnl: Gen10 render context size. drm/i915/cnl: Don't trust VBT's alternate pin for port D for now. drm/i915: Fix the kernel panic when using aliasing ppgtt drm/i915/cnl: Cannonlake color init. drm/i915/cnl: Add force wake for gen10+. x86/gpu: CNL uses the same GMS values as SKL ...	2017-07-20 11:31:43 +10:00
Tom Lendacky	aca20d5462	x86/mm: Add support to make use of Secure Memory Encryption Add support to check if SME has been enabled and if memory encryption should be activated (checking of command line option based on the configuration of the default state). If memory encryption is to be activated, then the encryption mask is set and the kernel is encrypted "in place." Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/5f0da2fd4cce63f556117549e2c89c170072209f.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 20:23:26 +02:00
Tom Lendacky	bba4ed011a	x86/mm, kexec: Allow kexec to be used with SME Provide support so that kexec can be used to boot a kernel when SME is enabled. Support is needed to allocate pages for kexec without encryption. This is needed in order to be able to reboot in the kernel in the same manner as originally booted. Additionally, when shutting down all of the CPUs we need to be sure to flush the caches and then halt. This is needed when booting from a state where SME was not active into a state where SME is active (or vice-versa). Without these steps, it is possible for cache lines to exist for the same physical location but tagged both with and without the encryption bit. This can cause random memory corruption when caches are flushed depending on which cacheline is written last. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: <kexec@lists.infradead.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/b95ff075db3e7cd545313f2fb609a49619a09625.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 11:38:04 +02:00
Tom Lendacky	f655e6e6b9	x86/cpu/AMD: Make the microcode level available earlier in the boot Move the setting of the cpuinfo_x86.microcode field from amd_init() to early_amd_init() so that it is available earlier in the boot process. This avoids having to read MSR_AMD64_PATCH_LEVEL directly during early boot. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/7b7525fa12593dac5f4b01fcc25c95f97e93862f.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 11:38:03 +02:00
Tom Lendacky	c7753208a9	x86, swiotlb: Add memory encryption support Since DMA addresses will effectively look like 48-bit addresses when the memory encryption mask is set, SWIOTLB is needed if the DMA mask of the device performing the DMA does not support 48-bits. SWIOTLB will be initialized to create decrypted bounce buffers for use by these devices. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/aa2d29b78ae7d508db8881e46a3215231b9327a7.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 11:38:03 +02:00
Tom Lendacky	5997efb967	x86/boot: Use memremap() to map the MPF and MPC data The SMP MP-table is built by UEFI and placed in memory in a decrypted state. These tables are accessed using a mix of early_memremap(), early_memunmap(), phys_to_virt() and virt_to_phys(). Change all accesses to use early_memremap()/early_memunmap(). This allows for proper setting of the encryption mask so that the data can be successfully accessed when SME is active. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/d9464b0d7c861021ed8f494e4a40d6cd10f1eddd.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 11:38:02 +02:00
Tom Lendacky	d68baa3fa6	x86/boot/e820: Add support to determine the E820 type of an address Add a function that will return the E820 type associated with an address range. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/b797aaa588803bf33263d5dd8c32377668fa931a.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 11:38:01 +02:00
Tom Lendacky	b9d05200bc	x86/mm: Insure that boot memory areas are mapped properly The boot data and command line data are present in memory in a decrypted state and are copied early in the boot process. The early page fault support will map these areas as encrypted, so before attempting to copy them, add decrypted mappings so the data is accessed properly when copied. For the initrd, encrypt this data in place. Since the future mapping of the initrd area will be mapped as encrypted the data will be accessed properly. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/bb0d430b41efefd45ee515aaf0979dcfda8b6a44.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 11:38:01 +02:00
Tom Lendacky	21729f81ce	x86/mm: Provide general kernel support for memory encryption Changes to the existing page table macros will allow the SME support to be enabled in a simple fashion with minimal changes to files that use these macros. Since the memory encryption mask will now be part of the regular pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and _KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization without the encryption mask before SME becomes active. Two new pgprot() macros are defined to allow setting or clearing the page encryption mask. The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO. SME does not support encryption for MMIO areas so this define removes the encryption mask from the page attribute. Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow creating a physical address with the encryption mask. These are used when working with the cr3 register so that the PGD can be encrypted. The current __va() macro is updated so that the virtual address is generated based off of the physical address without the encryption mask thus allowing the same virtual address to be generated regardless of whether encryption is enabled for that physical location or not. Also, an early initialization function is added for SME. If SME is active, this function: - Updates the early_pmd_flags so that early page faults create mappings with the encryption mask. - Updates the __supported_pte_mask to include the encryption mask. - Updates the protection_map entries to include the encryption mask so that user-space allocations will automatically have the encryption mask applied. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/b36e952c4c39767ae7f0a41cf5345adf27438480.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 11:38:00 +02:00
Tom Lendacky	5868f3651f	x86/mm: Add support to enable SME in early boot processing Add support to the early boot code to use Secure Memory Encryption (SME). Since the kernel has been loaded into memory in a decrypted state, encrypt the kernel in place and update the early pagetables with the memory encryption mask so that new pagetable entries will use memory encryption. The routines to set the encryption mask and perform the encryption are stub routines for now with functionality to be added in a later patch. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/e52ad781f085224bf835b3caff9aa3aee6febccb.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 11:37:59 +02:00
Tom Lendacky	9af9b94068	x86/cpu/AMD: Handle SME reduction in physical address size When System Memory Encryption (SME) is enabled, the physical address space is reduced. Adjust the x86_phys_bits value to reflect this reduction. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/593c037a3cad85ba92f3d061ffa7462e9ce3531d.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 11:37:59 +02:00
Tom Lendacky	872cbefd2d	x86/cpu/AMD: Add the Secure Memory Encryption CPU feature Update the CPU features to include identifying and reporting on the Secure Memory Encryption (SME) feature. SME is identified by CPUID 0x8000001f, but requires BIOS support to enable it (set bit 23 of MSR_K8_SYSCFG). Only show the SME feature as available if reported by CPUID, enabled by BIOS and not configured as CONFIG_X86_32=y. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/85c17ff450721abccddc95e611ae8df3f4d9718b.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 11:37:59 +02:00
Tom Lendacky	f7750a7956	x86, mpparse, x86/acpi, x86/PCI, x86/dmi, SFI: Use memremap() for RAM mappings The ioremap() function is intended for mapping MMIO. For RAM, the memremap() function should be used. Convert calls from ioremap() to memremap() when re-mapping RAM. This will be used later by SME to control how the encryption mask is applied to memory mappings, with certain memory locations being mapped decrypted vs encrypted. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/b13fccb9abbd547a7eef7b1fdfc223431b211c88.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 11:37:58 +02:00
Ingo Molnar	1ed7d32763	Merge branch 'x86/boot' into x86/mm, to pick up interacting changes The SME patches we are about to apply add some E820 logic, so merge in pending E820 code changes first, to have a single code base. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 11:36:53 +02:00
Josh Poimboeuf	5a3cf86978	x86/dumpstack: Fix interrupt and exception stack boundary checks On x86_64, the double fault exception stack is located immediately after the interrupt stack in memory. This causes confusion in the unwinder when it tries to unwind through an empty interrupt stack, where the stack pointer points to the address bordering the two stacks. The unwinder incorrectly thinks it's running on the double fault stack. Fix this kind of stack border confusion by never considering the beginning address of an exception or interrupt stack to be part of the stack. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Fixes: `5fe599e02e` ("x86/dumpstack: Add support for unwinding empty IRQ stacks") Link: http://lkml.kernel.org/r/bcc142160a5104de5c354c21c394c93a0173943f.1499786555.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 10:56:23 +02:00
Josh Poimboeuf	b0529beceb	x86/dumpstack: Fix occasionally missing registers If two consecutive stack frames have pt_regs, the oops dump code fails to print the second frame's registers. Fix that. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Fixes: `3b3fa11bc7` ("x86/dumpstack: Print any pt_regs found on the stack") Link: http://lkml.kernel.org/r/269c5c00c7d45c699f3dcea42a3a594c6cf7a9a3.1499786555.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 10:56:23 +02:00
Andy Lutomirski	1d3e53e862	x86/entry/64: Refactor IRQ stacks and make them NMI-safe This will allow IRQ stacks to nest inside NMIs or similar entries that can happen during IRQ stack setup or teardown. The new macros won't work correctly if they're invoked with IRQs on. Add a check under CONFIG_DEBUG_ENTRY to detect that. Signed-off-by: Andy Lutomirski <luto@kernel.org> [ Use %r10 instead of %r11 in xen_do_hypervisor_callback to make objtool and ORC unwinder's lives a little easier. ] Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/b0b2ff5fb97d2da2e1d7e1f380190c92545c8bb5.1499786555.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-18 10:56:22 +02:00
Vitaly Kuznetsov	dd018597a0	x86/hyper-v: stash the max number of virtual/logical processor Max virtual processor will be needed for 'extended' hypercalls supporting more than 64 vCPUs. While on it, unify on 'Hyper-V' in mshyperv.c as we currently have a mix, report acquired misc features as well. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-17 17:20:28 +02:00
Mikulas Patocka	5f8a16156a	x86/cpu: Use indirect call to measure performance in init_amd_k6() This old piece of code is supposed to measure the performance of indirect calls to determine if the processor is buggy or not, however the compiler optimizer turns it into a direct call. Use the OPTIMIZER_HIDE_VAR() macro to thwart the optimization, so that a real indirect call is generated. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1707110737530.8746@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-16 11:05:04 +02:00
Linus Torvalds	e37a07e0c2	Second batch of KVM updates for v4.13 Common: - add uevents for VM creation/destruction - annotate and properly access RCU-protected objects s390: - rename IOCTL added in the first v4.13 merge x86: - emulate VMLOAD VMSAVE feature in SVM - support paravirtual asynchronous page fault while nested - add Hyper-V userspace interfaces for better migration - improve master clock corner cases - extend internal error reporting after EPT misconfig - correct single-stepping of emulated instructions in SVM - handle MCE during VM entry - fix nVMX VM entry checks and nVMX VMCS shadowing -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJZaOm6AAoJEED/6hsPKofoqO8H/3breVIyVv9mwg7A5+o+6LTq GzV/YXHSC8NtfxZn8ViS/TCziYiBSFv7XiPSodkXbOgYSz8Yya5x9D+dbEH+xgG7 l+LsZEqdSFbHCkvKrMiwSsoXtsT5WygA56+KZiBmu8cvlwqSyXWHFn3ZJ1wKzGq/ zivlkfCoh2m6bGdNmrG9pHUSgxvDh94pXesaVBKy4hgeovY1qjzby3Lo+HuIUzai exuEU1EKRlUIfLK1B2Anp5IIv5Q1lFnMSvD6YSiWYywZb95dN/adsX1bv+MKeOdt TIAgotsWjaAuT9JolAJjfVPHG0+uMBMsWg4Zh9Ra/gPPaSh3KEC2h1++zEYKjvw= =1zII -----END PGP SIGNATURE----- Merge tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull more KVM updates from Radim Krčmář: "Second batch of KVM updates for v4.13 Common: - add uevents for VM creation/destruction - annotate and properly access RCU-protected objects s390: - rename IOCTL added in the first v4.13 merge x86: - emulate VMLOAD VMSAVE feature in SVM - support paravirtual asynchronous page fault while nested - add Hyper-V userspace interfaces for better migration - improve master clock corner cases - extend internal error reporting after EPT misconfig - correct single-stepping of emulated instructions in SVM - handle MCE during VM entry - fix nVMX VM entry checks and nVMX VMCS shadowing" * tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits) kvm: x86: hyperv: make VP_INDEX managed by userspace KVM: async_pf: Let guest support delivery of async_pf from guest mode KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf KVM: async_pf: Add L1 guest async_pf #PF vmexit handler KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2 KVM: x86: make backwards_tsc_observed a per-VM variable KVM: trigger uevents when creating or destroying a VM KVM: SVM: Enable Virtual VMLOAD VMSAVE feature KVM: SVM: Add Virtual VMLOAD VMSAVE feature definition KVM: SVM: Rename lbr_ctl field in the vmcb control area KVM: SVM: Prepare for new bit definition in lbr_ctl KVM: SVM: handle singlestep exception when skipping emulated instructions KVM: x86: take slots_lock in kvm_free_pit KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definition kvm: vmx: Properly handle machine check during VM-entry KVM: x86: update master clock before computing kvmclock_offset kvm: nVMX: Shadow "high" parts of shadowed 64-bit VMCS fields kvm: nVMX: Fix nested_vmx_check_msr_bitmap_controls kvm: nVMX: Validate the I/O bitmaps on nested VM-entry ...	2017-07-15 10:18:16 -07:00
Wanpeng Li	52a5c155cf	KVM: async_pf: Let guest support delivery of async_pf from guest mode Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1, async page faults are delivered to L1 as #PF vmexits; if bit 2 is 0, kvm_can_do_async_pf returns 0 if in guest mode. This is similar to what svm.c wanted to do all along, but it is only enabled for Linux as L1 hypervisor. Foreign hypervisors must never receive async page faults as vmexits, because they'd probably be very confused about that. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-07-14 14:26:16 +02:00
Nicholas Piggin	05a4a95279	kernel/watchdog: split up config options Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR. LOCKUP_DETECTOR implies the general boot, sysctl, and programming interfaces for the lockup detectors. An architecture that wants to use a hard lockup detector must define HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH. Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides the minimum arch_touch_nmi_watchdog, and it otherwise does its own thing and does not implement the LOCKUP_DETECTOR interfaces. sparc is unusual in that it has started to implement some of the interfaces, but not fully yet. It should probably be converted to a full HAVE_HARDLOCKUP_DETECTOR_ARCH. [npiggin@gmail.com: fix] Link: http://lkml.kernel.org/r/20170617223522.66c0ad88@roar.ozlabs.ibm.com Link: http://lkml.kernel.org/r/20170616065715.18390-4-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Babu Moger <babu.moger@oracle.com> Tested-by: Babu Moger <babu.moger@oracle.com> [sparc] Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-07-12 16:26:02 -07:00
Xunlei Pang	203e9e4121	kexec: move vmcoreinfo out of the kernel's .bss section As Eric said, "what we need to do is move the variable vmcoreinfo_note out of the kernel's .bss section. And modify the code to regenerate and keep this information in something like the control page. Definitely something like this needs a page all to itself, and ideally far away from any other kernel data structures. I clearly was not watching closely the data someone decided to keep this silly thing in the kernel's .bss section." This patch allocates extra pages for these vmcoreinfo_XXX variables, one advantage is that it enhances some safety of vmcoreinfo, because vmcoreinfo now is kept far away from other kernel data structures. Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.com Signed-off-by: Xunlei Pang <xlpang@redhat.com> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Reviewed-by: Juergen Gross <jgross@suse.com> Suggested-by: Eric Biederman <ebiederm@xmission.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Young <dyoung@redhat.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-07-12 16:25:59 -07:00
Daniel Vetter	953152253e	main drm pull for v4.13 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZYseIAAoJEAx081l5xIa+85kP/0zKzKKVzZXSXG2TAGb5jNfk Ex+TELG8tWk9KBxA7lEE5c0WEsnP79cNoXZLQu8wlUzO8+kwQK5Bz0zgNUkpSuo1 RthwdsxBQX1++UxB+HoSG+dOa7hkKVqlgQR3z9qyhsBXzetkJV0DoYcpMV0A1EWd 6Jzt+AvCShVkcW+21LqHPlc5EIVewrDMoA3oU6aYCLhyAOUTVvvQB2ML8YApH7TM JrSrzCFHTrQEBbGUrZQhzR0sZzZzk9byntb/I/mdVbHeCyIHiL8sC4PfWSOyyazm GkPnA8G3aFAY9haBRz9jG/VBr1yVb0mCBjkWQ1lGfIAOCDDSc+d7PDXdG+i4AewK jZheXlrDIdGgmJLy4W3rdEqJvdf7UQHZOs8594OL19l4+FxCTrol1JSHSMeavCvr 8bUNil9Jb/ONU/wmp+q55U0k4TCTyerUA7gKnuaJAwBvd4n78/PKmQnbrWinDyJc GQXp6zESk9bKt5DXSnVZuVf4POTzpuAsQkkfX1V2y145EHTQYfS3jLENWqEjyZUy QtKCHZvRkJfGaFU4Pr+vBo9Iu1GlA5OiOv08QadldTT4OxUI0T6yaLDobHCQfKPE sc3wCuCM+/dAnqoKDcGC4hAmF8zDdO0kw65P2m7uC6T9Jm1G35CioKbzo+fzUhuL fg5TBpbp2Wwe2oPA5iBm =2S5N -----END PGP SIGNATURE----- Merge tag 'drm-for-v4.13' into drm-intel-next-queued Resync with the main drm-next pull request for 4.13. What we really need is to fully resync with pending drm-misc, but that's not yet possible due to the still ongoing merge window. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2017-07-10 21:56:39 +02:00
Linus Torvalds	2b97620341	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "The x86 updates contain: - A fix for a longstanding PAT bug, where PAT was reported on CPUs that do not support it, which leads to wrong caching attributes and missing MTRR updates - Prevent overwriting of the e820 firmware table, which causes kexec kernels to lose the fake mptable which is stored there. - Cleanup of the UV/BAU code, removing unused code and making local functions static" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/e820: Introduce the bootloader provided e820_table_firmware[] table x86/boot/e820: Rename the e820_table_firmware to e820_table_kexec x86/boot/e820: Avoid overwriting e820_table_firmware x86/mm/pat: Don't report PAT on CPUs that don't support it x86/platform/uv/BAU: Minor cleanup, make some local functions static	2017-07-09 11:21:31 -07:00
Linus Torvalds	f72e24a124	This is the first pull request for the new dma-mapping subsystem In this new subsystem we'll try to properly maintain all the generic code related to dma-mapping, and will further consolidate arch code into common helpers. This pull request contains: - removal of the DMA_ERROR_CODE macro, replacing it with calls to ->mapping_error so that the dma_map_ops instances are more self contained and can be shared across architectures (me) - removal of the ->set_dma_mask method, which duplicates the ->dma_capable one in terms of functionality, but requires more duplicate code. - various updates for the coherent dma pool and related arm code (Vladimir) - various smaller cleanups (me) -----BEGIN PGP SIGNATURE----- iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlldmw0LHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYOiKA/+Ln1mFLSf3nfTzIHa24Bbk8ZTGr0B8TD4Vmyyt8iG oO3AeaTLn3d6ugbH/uih/tPz8PuyXsdiTC1rI/ejDMiwMTSjW6phSiIHGcStSR9X VFNhmMFacp7QpUpvxceV0XZYKDViAoQgHeGdp3l+K5h/v4AYePV/v/5RjQPaEyOh YLbCzETO+24mRWdJxdAqtTW4ovYhzj6XsiJ+pAjlV0+SWU6m5L5E+VAPNi1vqv1H 1O2KeCFvVYEpcnfL3qnkw2timcjmfCfeFAd9mCUAc8mSRBfs3QgDTKw3XdHdtRml LU2WuA5cpMrOdBO4mVra2plo8E2szvpB1OZZXoKKdCpK3VGwVpVHcTvClK2Ks/3B GDLieroEQNu2ZIUIdWXf/g2x6le3BcC9MmpkAhnGPqCZ7skaIBO5Cjpxm0zTJAPl PPY3CMBBEktAvys6DcudOYGixNjKUuAm5lnfpcfTEklFdG0AjhdK/jZOplAFA6w4 LCiy0rGHM8ZbVAaFxbYoFCqgcjnv6EjSiqkJxVI4fu/Q7v9YXfdPnEmE0PJwCVo5 +i7aCLgrYshTdHr/F3e5EuofHN3TDHwXNJKGh/x97t+6tt326QMvDKX059Kxst7R rFukGbrYvG8Y7yXwrSDbusl443ta0Ht7T1oL4YUoJTZp0nScAyEluDTmrH1JVCsT R4o= =0Fso -----END PGP SIGNATURE----- Merge tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping infrastructure from Christoph Hellwig: "This is the first pull request for the new dma-mapping subsystem In this new subsystem we'll try to properly maintain all the generic code related to dma-mapping, and will further consolidate arch code into common helpers. This pull request contains: - removal of the DMA_ERROR_CODE macro, replacing it with calls to ->mapping_error so that the dma_map_ops instances are more self contained and can be shared across architectures (me) - removal of the ->set_dma_mask method, which duplicates the ->dma_capable one in terms of functionality, but requires more duplicate code. - various updates for the coherent dma pool and related arm code (Vladimir) - various smaller cleanups (me)" * tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits) ARM: dma-mapping: Remove traces of NOMMU code ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus ARM: NOMMU: Introduce dma operations for noMMU drivers: dma-mapping: allow dma_common_mmap() for NOMMU drivers: dma-coherent: Introduce default DMA pool drivers: dma-coherent: Account dma_pfn_offset when used with device tree dma: Take into account dma_pfn_offset dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs dma-mapping: remove dmam_free_noncoherent crypto: qat - avoid an uninitialized variable warning au1100fb: remove a bogus dma_free_nonconsistent call MAINTAINERS: add entry for dma mapping helpers powerpc: merge __dma_set_mask into dma_set_mask dma-mapping: remove the set_dma_mask method powerpc/cell: use the dma_supported method for ops switching powerpc/cell: clean up fixed mapping dma_ops initialization tile: remove dma_supported and mapping_error methods xen-swiotlb: remove xen_swiotlb_set_dma_mask arm: implement ->dma_supported instead of ->set_dma_mask mips/loongson64: implement ->dma_supported instead of ->set_dma_mask ...	2017-07-06 19:20:54 -07:00
Paulo Zanoni	2e1e9d4893	x86/gpu: CNL uses the same GMS values as SKL So don't forget to reserve its stolen memory bits. v2: Add ack and remove "TODO" from commit message. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: http://patchwork.freedesktop.org/patch/msgid/1499302845-17856-1-git-send-email-rodrigo.vivi@intel.com	2017-07-06 13:22:14 -07:00
Andy Lutomirski	660da7c922	x86/mm: Enable CR4.PCIDE on supported systems We can use PCID if the CPU has PCID and PGE and we're not on Xen. By itself, this has no effect. A followup patch will start using PCID. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Nadav Amit <nadav.amit@gmail.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/6327ecd907b32f79d5aa0d466f04503bbec5df88.1498751203.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-05 10:52:58 +02:00
Andy Lutomirski	0790c9aad8	x86/mm: Add the 'nopcid' boot option to turn off PCID The parameter is only present on x86_64 systems to save a few bytes, as PCID is always disabled on x86_32. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Nadav Amit <nadav.amit@gmail.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/8bbb2e65bcd249a5f18bfb8128b4689f08ac2b60.1498751203.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-05 10:52:57 +02:00
Andy Lutomirski	cba4671af7	x86/mm: Disable PCID on 32-bit kernels 32-bit kernels on new hardware will see PCID in CPUID, but PCID can only be used in 64-bit mode. Rather than making all PCID code conditional, just disable the feature on 32-bit builds. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Nadav Amit <nadav.amit@gmail.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/2e391769192a4d31b808410c383c6bf0734bc6ea.1498751203.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-05 10:52:57 +02:00
Chen Yu	12df216c61	x86/boot/e820: Introduce the bootloader provided e820_table_firmware[] table Add the real e820_tabel_firmware[] that will not be modified by the kernel or the EFI boot stub under any circumstance. In addition to that modify the code so that e820_table_firmwarep[] is exposed via sysfs to represent the real firmware memory layout, rather than exposing the e820_table_kexec[] table. This fixes a hibernation bug/warning, which uses e820_table_kexec[] to check RAM layout consistency across hibernation/resume: The suspend kernel: [ 0.000000] e820: update [mem 0x76671018-0x76679457] usable ==> usable The resume kernel: [ 0.000000] e820: update [mem 0x7666f018-0x76677457] usable ==> usable ... [ 15.752088] PM: Using 3 thread(s) for decompression. [ 15.752088] PM: Loading and decompressing image data (471870 pages)... [ 15.764971] Hibernate inconsistent memory map detected! [ 15.770833] PM: Image mismatch: architecture specific data Actually it is safe to restore these pages because E820_TYPE_RAM and E820_TYPE_RESERVED_KERN are treated the same during hibernation, so the original e820 table provided by the bootloader is used for hibernation MD5 fingerprint checking. The side effect is that, this newly introduced variable might increase the kernel size at compile time. Suggested-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Xunlei Pang <xlpang@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-05 10:09:02 +02:00
Chen Yu	a09bae0f8a	x86/boot/e820: Rename the e820_table_firmware to e820_table_kexec Currently the e820_table_firmware[] table is mainly used by the kexec, and it is not what it's supposed to be - despite its name it might be modified by the kernel. So change its name to e820_table_kexec[]. In the next patch we will introduce the real e820_table_firmware[] table. No functional change. Signed-off-by: Chen Yu <yu.c.chen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Xunlei Pang <xlpang@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-05 10:09:02 +02:00
Chen Yu	b7a67e02cd	x86/boot/e820: Avoid overwriting e820_table_firmware The following commit in 2013: `77ea8c9489` ("x86: Reserve setup_data ranges late after parsing memmap cmdline") has fixed the issue of losing setup_data information by deferring the e820_reserve_setup_data() call until the early params have been parsed. But this also introduced a new problem that, during early params parsing, the kexec kernel might fake a mptable and saves it into the e820_table_firmware[] table (without saving the mptable to the e820_table[]), however the subsequent invoking of e820_reserve_setup_data() will overwrite the e820_table_firmware[] according to the e820_table[], thus the fake mptable information is lost. Fix this issue by updating the e820_table_firmware[] according to the setup_data information, but without overwriting it. Signed-off-by: Chen Yu <yu.c.chen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Xunlei Pang <xlpang@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-07-05 10:09:02 +02:00
Mikulas Patocka	99c13b8c88	x86/mm/pat: Don't report PAT on CPUs that don't support it The pat_enabled() logic is broken on CPUs which do not support PAT and where the initialization code fails to call pat_init(). Due to that the enabled flag stays true and pat_enabled() returns true wrongfully. As a consequence the mappings, e.g. for Xorg, are set up with the wrong caching mode and the required MTRR setups are omitted. To cure this the following changes are required: 1) Make pat_enabled() return true only if PAT initialization was invoked and successful. 2) Invoke init_cache_modes() unconditionally in setup_arch() and remove the extra callsites in pat_disable() and the pat disabled code path in pat_init(). Also rename __pat_enabled to pat_disabled to reflect the real purpose of this variable. Fixes: `9cd25aac1f` ("x86/mm/pat: Emulate PAT when it is disabled") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bernhard Held <berny156@gmx.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: "Luis R. Rodriguez" <mcgrof@suse.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1707041749300.3456@file01.intranet.prod.int.rdu2.redhat.com	2017-07-05 09:01:24 +02:00
Linus Torvalds	408c9861c6	Power management updates for v4.13-rc1 - Rework suspend-to-idle to allow it to take wakeup events signaled by the EC into account on ACPI-based platforms in order to properly support power button wakeup from suspend-to-idle on recent Dell laptops (Rafael Wysocki). That includes the core suspend-to-idle code rework, support for the Low Power S0 _DSM interface, and support for the ACPI INT0002 Virtual GPIO device from Hans de Goede (required for USB keyboard wakeup from suspend-to-idle to work on some machines). - Stop trying to export the current CPU frequency via /proc/cpuinfo on x86 as that is inaccurate and confusing (Len Brown). - Rework the way in which the current CPU frequency is exported by the kernel (over the cpufreq sysfs interface) on x86 systems with the APERF and MPERF registers by always using values read from these registers, when available, to compute the current frequency regardless of which cpufreq driver is in use (Len Brown). - Rework the PCI/ACPI device wakeup infrastructure to remove the questionable and artificial distinction between "devices that can wake up the system from sleep states" and "devices that can generate wakeup signals in the working state" from it, which allows the code to be simplified quite a bit (Rafael Wysocki). - Fix the wakeup IRQ framework by making it use SRCU instead of RCU which doesn't allow sleeping in the read-side critical sections, but which in turn is expected to be allowed by the IRQ bus locking infrastructure (Thomas Gleixner). - Modify some computations in the intel_pstate driver to avoid rounding errors resulting from them (Srinivas Pandruvada). - Reduce the overhead of the intel_pstate driver in the HWP (hardware-managed P-states) mode and when the "performance" P-state selection algorithm is in use by making it avoid registering scheduler callbacks in those cases (Len Brown). - Rework the energy_performance_preference sysfs knob in intel_pstate by changing the values that correspond to different symbolic hint names used by it (Len Brown). - Make it possible to use more than one cpuidle driver at the same time on ARM (Daniel Lezcano). - Make it possible to prevent the cpuidle menu governor from using the 0 state by disabling it via sysfs (Nicholas Piggin). - Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1 on AMD systems (Yazen Ghannam). - Make the CPPC cpufreq driver take the lowest nonlinear performance information into account (Prashanth Prakash). - Add support for hi3660 to the cpufreq-dt driver, fix the imx6q driver and clean up the sfi, exynos5440 and intel_pstate drivers (Colin Ian King, Krzysztof Kozlowski, Octavian Purdila, Rafael Wysocki, Tao Wang). - Fix a few minor issues in the generic power domains (genpd) framework and clean it up somewhat (Krzysztof Kozlowski, Mikko Perttunen, Viresh Kumar). - Fix a couple of minor issues in the operating performance points (OPP) framework and clean it up somewhat (Viresh Kumar). - Fix a CONFIG dependency in the hibernation core and clean it up slightly (Balbir Singh, Arvind Yadav, BaoJun Luo). - Add rk3228 support to the rockchip-io adaptive voltage scaling (AVS) driver (David Wu). - Fix an incorrect bit shift operation in the RAPL power capping driver (Adam Lessnau). - Add support for the EPP field in the HWP (hardware managed P-states) control register, HWP.EPP, to the x86_energy_perf_policy tool and update msr-index.h with HWP.EPP values (Len Brown). - Fix some minor issues in the turbostat tool (Len Brown). - Add support for AMD family 0x17 CPUs to the cpupower tool and fix a minor issue in it (Sherry Hurwitz). - Assorted cleanups, mostly related to the constification of some data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof Kozlowski). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJZWrICAAoJEILEb/54YlRxZYMQAIRhfbyDxKq+ByvSilUS8kTA AItwJ8FFzykhiwN75Cqabg4rAGyWma7IRs1vzU7zeC1aEQIn+bTQtvk+utZNI+g2 ANFlDha20q/sXsP/CDMMTIAdW9tSOC0TOvFI9s2V2Y8dJZhoekO4ctx34FAfUS5d Ao6rwSAWCMsCXcGaTAlqTA+TEJmBG7u6Iq6hq6ngltoFwOv3mWWBVn52VVaJ7SMp 9/IPbbLGMFAedrgEBRGCR+MME1xZZpvcZIJaTt1Mgn7Cx3cJaysIUAvqY/SsvFGq 5FcUTcF2qpK3+AGawiAxZIjvOBsGRtIwqKinNIzYWs/NjiIdzmgVAmTeuPtTqp+5 HFehUdtkFcnuDnLqSNzAaZUa7tw84cJkwnbVMnesx0MkG6rZ1SeL22E2Sabpcdsh 3Yo1ThzJSxi59DhiiE92EQnNCEjmCldRy+8q5Ag035muxl6EJYvuNBMnZv/BMCUn ltSNOrmps1DlN+Col8ORIeNzQ1YjYzWMqKAYzSbyccm4ug/iSHx0/DuESmQ4GTlF YCwkmqyWiHrBwpl51jc+4a7SGlMmKRqU+MJes0CjagaaqoUAb8qeBOpzEJ0yNwjZ wtI41l6blE6kbMD3yqGdCfiB2S7GlPVoxa15eX1wRyLH3fLjwwrzJirEaiBS86tI 1PzHZEOmBlh3DYC6DBKA =Wsph -----END PGP SIGNATURE----- Merge tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "The big ticket items here are the rework of suspend-to-idle in order to add proper support for power button wakeup from it on recent Dell laptops and the rework of interfaces exporting the current CPU frequency on x86. In addition to that, support for a few new pieces of hardware is added, the PCI/ACPI device wakeup infrastructure is simplified significantly and the wakeup IRQ framework is fixed to unbreak the IRQ bus locking infrastructure. Also, there are some functional improvements for intel_pstate, tools updates and small fixes and cleanups all over. Specifics: - Rework suspend-to-idle to allow it to take wakeup events signaled by the EC into account on ACPI-based platforms in order to properly support power button wakeup from suspend-to-idle on recent Dell laptops (Rafael Wysocki). That includes the core suspend-to-idle code rework, support for the Low Power S0 _DSM interface, and support for the ACPI INT0002 Virtual GPIO device from Hans de Goede (required for USB keyboard wakeup from suspend-to-idle to work on some machines). - Stop trying to export the current CPU frequency via /proc/cpuinfo on x86 as that is inaccurate and confusing (Len Brown). - Rework the way in which the current CPU frequency is exported by the kernel (over the cpufreq sysfs interface) on x86 systems with the APERF and MPERF registers by always using values read from these registers, when available, to compute the current frequency regardless of which cpufreq driver is in use (Len Brown). - Rework the PCI/ACPI device wakeup infrastructure to remove the questionable and artificial distinction between "devices that can wake up the system from sleep states" and "devices that can generate wakeup signals in the working state" from it, which allows the code to be simplified quite a bit (Rafael Wysocki). - Fix the wakeup IRQ framework by making it use SRCU instead of RCU which doesn't allow sleeping in the read-side critical sections, but which in turn is expected to be allowed by the IRQ bus locking infrastructure (Thomas Gleixner). - Modify some computations in the intel_pstate driver to avoid rounding errors resulting from them (Srinivas Pandruvada). - Reduce the overhead of the intel_pstate driver in the HWP (hardware-managed P-states) mode and when the "performance" P-state selection algorithm is in use by making it avoid registering scheduler callbacks in those cases (Len Brown). - Rework the energy_performance_preference sysfs knob in intel_pstate by changing the values that correspond to different symbolic hint names used by it (Len Brown). - Make it possible to use more than one cpuidle driver at the same time on ARM (Daniel Lezcano). - Make it possible to prevent the cpuidle menu governor from using the 0 state by disabling it via sysfs (Nicholas Piggin). - Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1 on AMD systems (Yazen Ghannam). - Make the CPPC cpufreq driver take the lowest nonlinear performance information into account (Prashanth Prakash). - Add support for hi3660 to the cpufreq-dt driver, fix the imx6q driver and clean up the sfi, exynos5440 and intel_pstate drivers (Colin Ian King, Krzysztof Kozlowski, Octavian Purdila, Rafael Wysocki, Tao Wang). - Fix a few minor issues in the generic power domains (genpd) framework and clean it up somewhat (Krzysztof Kozlowski, Mikko Perttunen, Viresh Kumar). - Fix a couple of minor issues in the operating performance points (OPP) framework and clean it up somewhat (Viresh Kumar). - Fix a CONFIG dependency in the hibernation core and clean it up slightly (Balbir Singh, Arvind Yadav, BaoJun Luo). - Add rk3228 support to the rockchip-io adaptive voltage scaling (AVS) driver (David Wu). - Fix an incorrect bit shift operation in the RAPL power capping driver (Adam Lessnau). - Add support for the EPP field in the HWP (hardware managed P-states) control register, HWP.EPP, to the x86_energy_perf_policy tool and update msr-index.h with HWP.EPP values (Len Brown). - Fix some minor issues in the turbostat tool (Len Brown). - Add support for AMD family 0x17 CPUs to the cpupower tool and fix a minor issue in it (Sherry Hurwitz). - Assorted cleanups, mostly related to the constification of some data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof Kozlowski)" * tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (69 commits) cpufreq: Update scaling_cur_freq documentation cpufreq: intel_pstate: Clean up after performance governor changes PM: hibernate: constify attribute_group structures. cpuidle: menu: allow state 0 to be disabled intel_idle: Use more common logging style PM / Domains: Fix missing default_power_down_ok comment PM / Domains: Fix unsafe iteration over modified list of domains PM / Domains: Fix unsafe iteration over modified list of domain providers PM / Domains: Fix unsafe iteration over modified list of device links PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device PM / Domains: Call driver's noirq callbacks PM / core: Drop run_wake flag from struct dev_pm_info PCI / PM: Simplify device wakeup settings code PCI / PM: Drop pme_interrupt flag from struct pci_dev ACPI / PM: Consolidate device wakeup settings code ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags PM / QoS: constify *_attribute_group. PM / AVS: rockchip-io: add io selectors and supplies for rk3228 powercap/RAPL: prevent overridding bits outside of the mask PM / sysfs: Constify attribute groups ...	2017-07-04 13:39:41 -07:00
Linus Torvalds	4422d80ed7	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Thomas Gleixner: "The RAS updates for the 4.13 merge window: - Cleanup of the MCE injection facility (Borsilav Petkov) - Rework of the AMD/SMCA handling (Yazen Ghannam) - Enhancements for ACPI/APEI to handle new notitication types (Shiju Jose) - atomic_t to refcount_t conversion (Elena Reshetova) - A few fixes and enhancements all over the place" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: RAS/CEC: Check the correct variable in the debugfs error handling x86/mce: Always save severity in machine_check_poll() x86/MCE, xen/mcelog: Make /dev/mcelog registration messages more precise x86/mce: Update bootlog description to reflect behavior on AMD x86/mce: Don't disable MCA banks when offlining a CPU on AMD x86/mce/mce-inject: Preset the MCE injection struct x86/mce: Clean up include files x86/mce: Get rid of register_mce_write_callback() x86/mce: Merge mce_amd_inj into mce-inject x86/mce/AMD: Use saved threshold block info in interrupt handler x86/mce/AMD: Use msr_stat when clearing MCA_STATUS x86/mce/AMD: Carve out SMCA bank configuration x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers x86/mce: Convert threshold_bank.cpus from atomic_t to refcount_t RAS: Make local function parse_ras_param() static ACPI/APEI: Handle GSIV and GPIO notification types	2017-07-03 18:33:03 -07:00
Linus Torvalds	9a9594efe5	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug updates from Thomas Gleixner: "This update is primarily a cleanup of the CPU hotplug locking code. The hotplug locking mechanism is an open coded RWSEM, which allows recursive locking. The main problem with that is the recursive nature as it evades the full lockdep coverage and hides potential deadlocks. The rework replaces the open coded RWSEM with a percpu RWSEM and establishes full lockdep coverage that way. The bulk of the changes fix up recursive locking issues and address the now fully reported potential deadlocks all over the place. Some of these deadlocks have been observed in the RT tree, but on mainline the probability was low enough to hide them away." * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) cpu/hotplug: Constify attribute_group structures powerpc: Only obtain cpu_hotplug_lock if called by rtasd ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init cpu/hotplug: Remove unused check_for_tasks() function perf/core: Don't release cred_guard_mutex if not taken cpuhotplug: Link lock stacks for hotplug callbacks acpi/processor: Prevent cpu hotplug deadlock sched: Provide is_percpu_thread() helper cpu/hotplug: Convert hotplug locking to percpu rwsem s390: Prevent hotplug rwsem recursion arm: Prevent hotplug rwsem recursion arm64: Prevent cpu hotplug rwsem recursion kprobes: Cure hotplug lock ordering issues jump_label: Reorder hotplug lock and jump_label_lock perf/tracing/cpuhotplug: Fix locking order ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus() PCI: Replace the racy recursion prevention PCI: Use cpu_hotplug_disable() instead of get_online_cpus() perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode() x86/perf: Drop EXPORT of perf_check_microcode ...	2017-07-03 18:08:06 -07:00
Linus Torvalds	3ad918e65d	Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timers updates from Thomas Gleixner: "This update contains: - The solution for the TSC deadline timer borkage, which is caused by a hardware problem in the TSC_ADJUST/TSC_DEADLINE_TIMER logic. The problem is documented now and fixed with a microcode update, so we can remove the workaround and just check for the microcode version. If the microcode is not up to date, then the TSC deadline timer is disabled. If the borkage is fixed by the proper microcode version, then the deadline timer can be used. In both cases the restrictions to the range of the TSC_ADJUST value, which were added as workarounds, are removed. - A few simple fixes and updates to the timer related x86 code" * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsc: Call check_system_tsc_reliable() before unsynchronized_tsc() x86/hpet: Do not use smp_processor_id() in preemptible code x86/time: Make setup_default_timer_irq() static x86/tsc: Remove the TSC_ADJUST clamp x86/apic: Add TSC_DEADLINE quirk due to errata x86/apic: Change the lapic name in deadline mode	2017-07-03 18:01:50 -07:00
Linus Torvalds	03ffbcdd78	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "The irq department delivers: - Expand the generic infrastructure handling the irq migration on CPU hotplug and convert X86 over to it. (Thomas Gleixner) Aside of consolidating code this is a preparatory change for: - Finalizing the affinity management for multi-queue devices. The main change here is to shut down interrupts which are affine to a outgoing CPU and reenabling them when the CPU comes online again. That avoids moving interrupts pointlessly around and breaking and reestablishing affinities for no value. (Christoph Hellwig) Note: This contains also the BLOCK-MQ and NVME changes which depend on the rework of the irq core infrastructure. Jens acked them and agreed that they should go with the irq changes. - Consolidation of irq domain code (Marc Zyngier) - State tracking consolidation in the core code (Jeffy Chen) - Add debug infrastructure for hierarchical irq domains (Thomas Gleixner) - Infrastructure enhancement for managing generic interrupt chips via devmem (Bartosz Golaszewski) - Constification work all over the place (Tobias Klauser) - Two new interrupt controller drivers for MVEBU (Thomas Petazzoni) - The usual set of fixes, updates and enhancements all over the place" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits) irqchip/or1k-pic: Fix interrupt acknowledgement irqchip/irq-mvebu-gicp: Allocate enough memory for spi_bitmap irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity nvme: Allocate queues for all possible CPUs blk-mq: Create hctx for each present CPU blk-mq: Include all present CPUs in the default queue mapping genirq: Avoid unnecessary low level irq function calls genirq: Set irq masked state when initializing irq_desc genirq/timings: Add infrastructure for estimating the next interrupt arrival time genirq/timings: Add infrastructure to track the interrupt timings genirq/debugfs: Remove pointless NULL pointer check irqchip/gic-v3-its: Don't assume GICv3 hardware supports 16bit INTID irqchip/gic-v3-its: Add ACPI NUMA node mapping irqchip/gic-v3-its-platform-msi: Make of_device_ids const irqchip/gic-v3-its: Make of_device_ids const irqchip/irq-mvebu-icu: Add new driver for Marvell ICU irqchip/irq-mvebu-gicp: Add new driver for Marvell GICP dt-bindings/interrupt-controller: Add DT binding for the Marvell ICU genirq/irqdomain: Remove auto-recursive hierarchy support irqchip/MSI: Use irq_domain_update_bus_token instead of an open coded access ...	2017-07-03 16:50:31 -07:00
Linus Torvalds	7a69f9c60b	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main changes in this cycle were: - Continued work to add support for 5-level paging provided by future Intel CPUs. In particular we switch the x86 GUP code to the generic implementation. (Kirill A. Shutemov) - Continued work to add PCID CPU support to native kernels as well. In this round most of the focus is on reworking/refreshing the TLB flush infrastructure for the upcoming PCID changes. (Andy Lutomirski)" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) x86/mm: Delete a big outdated comment about TLB flushing x86/mm: Don't reenter flush_tlb_func_common() x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging x86/ftrace: Exclude functions in head64.c from function-tracing x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap x86/mm: Remove reset_lazy_tlbstate() x86/ldt: Simplify the LDT switching logic x86/boot/64: Put __startup_64() into .head.text x86/mm: Add support for 5-level paging for KASLR x86/mm: Make kernel_physical_mapping_init() support 5-level paging x86/mm: Add sync_global_pgds() for configuration with 5-level paging x86/boot/64: Add support of additional page table level during early boot x86/boot/64: Rename init_level4_pgt and early_level4_pgt x86/boot/64: Rewrite startup_64() in C x86/boot/compressed: Enable 5-level paging during decompression stage x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurations x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurations x86/boot/efi: Cleanup initialization of GDT entries x86/asm: Fix comment in return_from_SYSCALL_64() x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation ...	2017-07-03 14:45:09 -07:00
Linus Torvalds	9bc088ab66	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode updates from Ingo Molnar: "The main changes are a fix early microcode application for resume-from-RAM, plus a 32-bit initrd placement fix - by Borislav Petkov" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Make a couple of symbols static x86/microcode/intel: Save pointer to ucode patch for early AP loading x86/microcode: Look for the initrd at the correct address on 32-bit	2017-07-03 14:35:18 -07:00
Linus Torvalds	e1449007e8	Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 hyperv updates from Ingo Molnar: "Avoid boot time TSC calibration on Hyper-V hosts, to improve calibration robustness. (Vitaly Kuznetsov)" * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hyperv: Read TSC frequency from a synthetic MSR x86/hyperv: Check frequency MSRs presence according to the specification	2017-07-03 14:09:32 -07:00
Linus Torvalds	e6529f6f58	Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 debug update from Ingo Molnar: "A single fix for an off-by one bug in test_nmi_ipi() that probably doesn't matter in practice" * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/nmi: Fix timeout test in test_nmi_ipi()	2017-07-03 14:07:45 -07:00
Linus Torvalds	25e09ca524	Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "The main changes in this cycle were KASLR improvements for rare environments with special boot options, by Baoquan He. Also misc smaller changes/cleanups" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/debug: Extend the lower bound of crash kernel low reservations x86/boot: Remove unused copy_*_gs() functions x86/KASLR: Use the right memcpy() implementation Documentation/kernel-parameters.txt: Update 'memmap=' boot option description x86/KASLR: Handle the memory limit specified by the 'memmap=' and 'mem=' boot options x86/KASLR: Parse all 'memmap=' boot option entries	2017-07-03 13:40:38 -07:00
Linus Torvalds	2a275382a4	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Ingo Molnar: "Janitorial changes: removal of an unused function plus __init annotations" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Make arch_init_msi/htirq_domain __init x86/apic: Make init_legacy_irqs() __init x86/ioapic: Remove unused IO_APIC_irq_trigger() function	2017-07-03 13:36:23 -07:00
Linus Torvalds	9bd42183b9	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - Add the SYSTEM_SCHEDULING bootup state to move various scheduler debug checks earlier into the bootup. This turns silent and sporadically deadly bugs into nice, deterministic splats. Fix some of the splats that triggered. (Thomas Gleixner) - A round of restructuring and refactoring of the load-balancing and topology code (Peter Zijlstra) - Another round of consolidating ~20 of incremental scheduler code history: this time in terms of wait-queue nomenclature. (I didn't get much feedback on these renaming patches, and we can still easily change any names I might have misplaced, so if anyone hates a new name, please holler and I'll fix it.) (Ingo Molnar) - sched/numa improvements, fixes and updates (Rik van Riel) - Another round of x86/tsc scheduler clock code improvements, in hope of making it more robust (Peter Zijlstra) - Improve NOHZ behavior (Frederic Weisbecker) - Deadline scheduler improvements and fixes (Luca Abeni, Daniel Bristot de Oliveira) - Simplify and optimize the topology setup code (Lauro Ramos Venancio) - Debloat and decouple scheduler code some more (Nicolas Pitre) - Simplify code by making better use of llist primitives (Byungchul Park) - ... plus other fixes and improvements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits) sched/cputime: Refactor the cputime_adjust() code sched/debug: Expose the number of RT/DL tasks that can migrate sched/numa: Hide numa_wake_affine() from UP build sched/fair: Remove effective_load() sched/numa: Implement NUMA node level wake_affine() sched/fair: Simplify wake_affine() for the single socket case sched/numa: Override part of migrate_degrades_locality() when idle balancing sched/rt: Move RT related code from sched/core.c to sched/rt.c sched/deadline: Move DL related code from sched/core.c to sched/deadline.c sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled sched/fair: Spare idle load balancing on nohz_full CPUs nohz: Move idle balancer registration to the idle path sched/loadavg: Generalize "_idle" naming to "_nohz" sched/core: Drop the unused try_get_task_struct() helper function sched/fair: WARN() and refuse to set buddy when !se->on_rq sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h> sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h> ...	2017-07-03 13:08:04 -07:00
Linus Torvalds	e94693f797	Merge branch 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: "This is an extensive rewrite of the objdump tool to track all stack pointer modifications through the machine instructions of disassembled functions found in kernel .o files. This re-design removes the prior dependency on CONFIG_FRAME_POINTERS, with the goal to prepare the tool to generate kernel debuginfo data in the future. There's also an increase in checking/tracking robustness as a side effect as well. No (intended) changes to existing functionality" * 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Silence warnings for functions which use IRET objtool: Implement stack validation 2.0 objtool, x86: Add several functions and files to the objtool whitelist objtool: Move checking code to check.c	2017-07-03 11:12:04 -07:00
Rafael J. Wysocki	f1c7842e5f	Merge branches 'pm-cpufreq', 'intel_pstate' and 'pm-cpuidle' * pm-cpufreq: cpufreq / CPPC: Initialize policy->min to lowest nonlinear performance cpufreq: sfi: make freq_table static cpufreq: exynos5440: Fix inconsistent indenting cpufreq: imx6q: imx6ull should use the same flow as imx6ul cpufreq: dt: Add support for hi3660 * intel_pstate: cpufreq: Update scaling_cur_freq documentation cpufreq: intel_pstate: Clean up after performance governor changes intel_pstate: skip scheduler hook when in "performance" mode intel_pstate: delete scheduler hook in HWP mode x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF cpufreq: intel_pstate: Remove max/min fractions to limit performance x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" * pm-cpuidle: cpuidle: menu: allow state 0 to be disabled intel_idle: Use more common logging style x86/ACPI/cstate: Allow ACPI C1 FFH MWAIT use on AMD systems ARM: cpuidle: Support asymmetric idle definition	2017-07-03 14:21:18 +02:00
Linus Torvalds	e18aca0236	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Fixlets for x86: - Prevent kexec crash when KASLR is enabled, which was caused by an address calculation bug - Restore the freeing of PUDs on memory hot remove - Correct a negated pointer check in the intel uncore performance monitoring driver - Plug a memory leak in an error exit path in the RDT code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel_rdt: Fix memory leak on mount failure x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization perf/x86/intel/uncore: Fix wrong box pointer check x86/mm/hotplug: Fix BUG_ON() after hot-remove by not freeing PUD	2017-07-01 09:10:17 -07:00
Vikas Shivappa	79298acc4b	x86/intel_rdt: Fix memory leak on mount failure If mount fails, the kn_info directory is not freed causing memory leak. Add the missing error handling path. Fixes: `4e978d06de` ("x86/intel_rdt: Add "info" files to resctrl file system") Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: vikas.shivappa@intel.com Cc: andi.kleen@intel.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1498503368-20173-3-git-send-email-vikas.shivappa@linux.intel.com	2017-06-30 21:20:00 +02:00
Linus Torvalds	27ab862a3a	IOMMU Fixes for Linux 4.12-rc7 Two fixes: * A fix for AMD IOMMU interrupt remapping code when IRQs are forwarded directly to KVM guests * Fixed check in the recently merged code to allow tboot with Intel VT-d disabled -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABAgAGBQJZVoIAAAoJECvwRC2XARrjpYoP/0JGpnrtGsw2GnXbe4jCXwWg hWpJze0QyHi7oJrNHRK1W72rwQxp53NUXS942bosXj18U+KU3x3UnLQfU5u1RfEi Z44riGpurZ2+Ied+X01eS5ALeno9/jj7DOnSDsrZS2vI56aJTGISNO/8xvE5M5Mj xwvOQW+K1Wgqac2Sdi+ckt8tUhik1MKbkL8k04/27Yzq6FxvkEyiHXDQcXA+eGId ewW034Z2ueriZ6fzuanQW3j57albeIN6XsTaOHbwD2edQwiyZ6yoMakKgMuHnpgx F4BtPtcbgSHSQ48ZZb9bdQpvEO3lY0jmSlMI4S2Fu5DmSIKC/KOy/4yYUhlQtrHT UUbIvq/pC+SMPxZiZAJhyIFcV6YkTelArPFx+QxsWvMMiXeGnezgeFsAOzwZuptF FFm9ItgfPkGxkiECFwJwSAHsTiiFocRfYHHv/ace/6X4ZB+nZrl3mSfX7+EtT2LG Kje2XtUzoGR/8LSBTMlQQeurhBZwbnoaFEtiVMrLODhvFT3IU7B00wgQHWNpjyRj Rqe/ScHRdRF1NQtW6QDTpNU4rZGB4lt1WxpMlONVVpO4LI/fXZq8Nq4lT+FzUHV1 xNQEh9xV2DK7bjT3HDd/OKSusaLzImNFYmi6+t/gROX8z/PqISb5IOmjaiO9pYYM coSJHmYL1hc0yFsNp6eB =bw0V -----END PGP SIGNATURE----- Merge tag 'iommu-fixes-v4.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fixes from Joerg Roedel: "Two fixes: - A fix for AMD IOMMU interrupt remapping code when IRQs are forwarded directly to KVM guests - Fixed check in the recently merged code to allow tboot with Intel VT-d disabled" * tag 'iommu-fixes-v4.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/amd: Fix interrupt remapping when disable guest_mode iommu/vt-d: Correctly disable Intel IOMMU force on	2017-06-30 10:37:48 -07:00
Josh Poimboeuf	c207aee480	objtool, x86: Add several functions and files to the objtool whitelist In preparation for an objtool rewrite which will have broader checks, whitelist functions and files which cause problems because they do unusual things with the stack. These whitelists serve as a TODO list for which functions and files don't yet have undwarf unwinder coverage. Eventually most of the whitelists can be removed in favor of manual CFI hint annotations or objtool improvements. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-30 10:19:19 +02:00
Kirill A. Shutemov	bb43dbc5e0	x86/ftrace: Exclude functions in head64.c from function-tracing A recent commit moved most logic of early boot up from startup_64() written in assembly to __startup_64() written in C. Fengguang reported breakage due to the change. It was tracked down to CONFIG_FUNCTION_TRACER being enabled. Tracing this function is not possible because it's invoked from the earliest boot stage before the relocation fixups have been done. It is the function doing the relocation. Exclude it from being built with tracer stubs. Fixes: `c88d71508e` ("x86/boot/64: Rewrite startup_64() in C") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: lkp@01.org Link: http://lkml.kernel.org/r/20170627115948.17938-1-kirill.shutemov@linux.intel.com	2017-06-29 22:33:27 +02:00
Tobias Klauser	6474924e2b	arch: remove unused macro/function thread_saved_pc() The only user of thread_saved_pc() in non-arch-specific code was removed in commit `8243d55977` ("sched/core: Remove pointless printout in sched_show_task()"). Remove the implementations as well. Some architectures use thread_saved_pc() in their arch-specific code. Leave their thread_saved_pc() intact. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-06-28 16:13:57 -07:00
Christoph Hellwig	5860acc1a9	x86: remove arch specific dma_supported implementation And instead wire it up as method for all the dma_map_ops instances. Note that this also means the arch specific check will be fully instead of partially applied in the AMD iommu driver. Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-06-28 06:54:46 -07:00
Christoph Hellwig	8bd17c6670	x86/calgary: implement ->mapping_error DMA_ERROR_CODE is going to go away, so don't rely on it. Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-06-28 06:54:35 -07:00
Christoph Hellwig	14a9aad7f0	x86/pci-nommu: implement ->mapping_error DMA_ERROR_CODE is going to go away, so don't rely on it. Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-06-28 06:54:34 -07:00
Yazen Ghannam	5209654a46	x86/ACPI/cstate: Allow ACPI C1 FFH MWAIT use on AMD systems AMD systems support the Monitor/Mwait instructions and these can be used for ACPI C1 in the same way as on Intel systems. Three things are needed: 1) This patch. 2) BIOS that declares a C1 state in _CST to use FFH, with correct values. 3) CPUID_Fn00000005_EDX is non-zero on the system. The BIOS on AMD systems have historically not defined a C1 state in _CST, so the acpi_idle driver uses HALT for ACPI C1. Currently released systems have CPUID_Fn00000005_EDX as reserved/RAZ. If a BIOS is released for these systems that requests a C1 state with FFH, the FFH implementation in Linux will fail since CPUID_Fn00000005_EDX is 0. The acpi_idle driver will then fallback to using HALT for ACPI C1. Future systems are expected to have non-zero CPUID_Fn00000005_EDX and BIOS support for using FFH for ACPI C1. Allow ffh_cstate_init() to succeed on AMD systems. Tested on Fam15h and Fam17h systems. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-06-27 02:00:52 +02:00
Len Brown	f8475cef90	x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF The goal of this change is to give users a uniform and meaningful result when they read /sys/...cpufreq/scaling_cur_freq on modern x86 hardware, as compared to what they get today. Modern x86 processors include the hardware needed to accurately calculate frequency over an interval -- APERF, MPERF, and the TSC. Here we provide an x86 routine to make this calculation on supported hardware, and use it in preference to any driver driver-specific cpufreq_driver.get() routine. MHz is computed like so: MHz = base_MHz * delta_APERF / delta_MPERF MHz is the average frequency of the busy processor over a measurement interval. The interval is defined to be the time between successive invocations of aperfmperf_khz_on_cpu(), which are expected to to happen on-demand when users read sysfs attribute cpufreq/scaling_cur_freq. As with previous methods of calculating MHz, idle time is excluded. base_MHz above is from TSC calibration global "cpu_khz". This x86 native method to calculate MHz returns a meaningful result no matter if P-states are controlled by hardware or firmware and/or if the Linux cpufreq sub-system is or is-not installed. When this routine is invoked more frequently, the measurement interval becomes shorter. However, the code limits re-computation to 10ms intervals so that average frequency remains meaningful. Discerning users are encouraged to take advantage of the turbostat(8) utility, which can gracefully handle concurrent measurement intervals of arbitrary length. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-06-27 01:47:32 +02:00
Yazen Ghannam	e2de64ec52	x86/mce: Always save severity in machine_check_poll() The MCE severity gives a hint as to how to handle the error. The notifier blocks can then use the severity to decide on an action. It's not necessary for machine_check_poll() to filter errors for the notifier chain, since each block will check its own set of conditions before handling an error. Also, there isn't any urgency for machine_check_poll() to make decisions based on severity like in do_machine_check(). If we can assume that a severity is set then we can use it in more notifier blocks. For example, the CEC block could check for a "KEEP" severity rather than checking bits in the status. This isn't possible now since the severity is not set except for "DEFFRRED/UCNA" errors with a valid address. Save the severity since we have it, and let the notifier blocks decide if they want to do anything. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1498074402-98633-1-git-send-email-Yazen.Ghannam@amd.com	2017-06-26 15:58:56 +02:00
Colin Ian King	d7f7dc7b88	x86/microcode: Make a couple of symbols static The helper function __load_ucode_amd() and pointer intel_ucode_patch do not need to be in global scope, so make them static. Fixes those sparse warnings: "symbol '__load_ucode_amd' was not declared. Should it be static?" "symbol 'intel_ucode_patch' was not declared. Should it be static?" Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170622095736.11937-1-colin.king@canonical.com	2017-06-26 15:57:37 +02:00
Len Brown	51204e0639	x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" cpufreq_quick_get() allows cpufreq drivers to over-ride cpu_khz that is otherwise reported in x86 /proc/cpuinfo "cpu MHz". There are four problems with this scheme, any of them is sufficient justification to delete it. 1. Depending on which cpufreq driver is loaded, the behavior of this field is different. 2. Distros complain that they have to explain to users why and how this field changes. Distros have requested a constant. 3. The two major providers of this information, acpi_cpufreq and intel_pstate, both "get it wrong" in different ways. acpi_cpufreq lies to the user by telling them that they are running at whatever frequency was last requested by software. intel_pstate lies to the user by telling them that they are running at the average frequency computed over an undefined measurement. But an average computed over an undefined interval, is itself, undefined... 4. On modern processors, user space utilities, such as turbostat(1), are more accurate and more precise, while supporing concurrent measurement over arbitrary intervals. Users who have been consulting /proc/cpuinfo to track changing CPU frequency will be dissapointed that it no longer wiggles -- perhaps being unaware of the limitations of the information they have been consuming. Yes, they can change their scripts to look in sysfs cpufreq/scaling_cur_frequency. Here they will find the same data of dubious quality here removed from /proc/cpuinfo. The value in sysfs will be addressed in a subsequent patch to address issues 1-3, above. Issue 4 will remain -- users that really care about accurate frequency information should not be using either proc or sysfs kernel interfaces. They should be using using turbostat(8), or a similar purpose-built analysis tool. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-06-24 01:45:47 +02:00
Thomas Gleixner	3ca57222c3	x86/apic: Mark single target interrupts If the interrupt destination mode of the APIC is physical then the effective affinity is restricted to a single CPU. Mark the interrupt accordingly in the domain allocation code, so the core code can avoid pointless affinity setting attempts. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235447.508846202@linutronix.de	2017-06-22 18:21:26 +02:00
Thomas Gleixner	c7d6c9dd87	x86/apic: Implement effective irq mask update Add the effective irq mask update to the apic implementations and enable effective irq masks for x86. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235446.878370703@linutronix.de	2017-06-22 18:21:23 +02:00
Thomas Gleixner	0e24f7c9f6	x86/apic: Add irq_data argument to apic->cpu_mask_to_apicid() The decision to which CPUs an interrupt is effectively routed happens in the various apic->cpu_mask_to_apicid() implementations To support effective affinity masks this information needs to be updated in irq_data. Add a pointer to irq_data to the callbacks and feed it through the call chain. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235446.720739075@linutronix.de	2017-06-22 18:21:22 +02:00
Thomas Gleixner	91cd9cb7ee	x86/apic: Move cpumask and to core code All implementations of apic->cpu_mask_to_apicid_and() and the two incoming cpumasks to search for the target. Move that operation to the call site and rename it to cpu_mask_to_apicid() Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235446.641575516@linutronix.de	2017-06-22 18:21:22 +02:00
Thomas Gleixner	52b166af40	x86/apic: Move online masking to core code All implementations of apic->cpu_mask_to_apicid_and() mask out the offline cpus. The callsite already has a mask available, which has the offline CPUs removed. Use that and remove the extra bits. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235446.560868224@linutronix.de	2017-06-22 18:21:21 +02:00
Thomas Gleixner	bbcf9574bc	x86/uv: Use default_cpu_mask_to_apicid_and() Same functionality except the extra bits ored on the apicid. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235446.482841015@linutronix.de	2017-06-22 18:21:21 +02:00
Thomas Gleixner	ad95212ee6	x86/apic: Move flat_cpu_mask_to_apicid_and() into C source No point in having inlines assigned to function pointers at multiple places. Just bloats the text. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235446.405975721@linutronix.de	2017-06-22 18:21:21 +02:00
Thomas Gleixner	ad7a929fa4	x86/irq: Use irq_migrate_all_off_this_cpu() The generic migration code supports all the required features already. Remove the x86 specific implementation and use the generic one. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235445.851311033@linutronix.de	2017-06-22 18:21:18 +02:00
Thomas Gleixner	654abd0a7b	x86/irq: Restructure fixup_irqs() Reorder fixup_irqs() so it matches the flow in the generic migration code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235445.774272454@linutronix.de	2017-06-22 18:21:18 +02:00
Thomas Gleixner	8e7b632237	x86/irq: Cleanup pending irq move in fixup_irqs() If an CPU goes offline, the interrupts are migrated away, but a eventually pending interrupt move, which has not yet been made effective is kept pending even if the outgoing CPU is the sole target of the pending affinity mask. What's worse is, that the pending affinity mask is discarded even if it would contain a valid subset of the online CPUs. Use the newly introduced helper to: - Discard a pending move when the outgoing CPU is the only target in the pending mask. - Use the pending mask instead of the affinity mask to find a valid target for the CPU if the pending mask intersects with the online CPUs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235444.774068557@linutronix.de	2017-06-22 18:21:13 +02:00
Thomas Gleixner	f8f37ca789	x86/msi: Create named irq domains Use the fwnode to create named irq domains so diagnosis works. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235444.299024560@linutronix.de	2017-06-22 18:21:11 +02:00
Thomas Gleixner	0323b96904	x86/msi: Remove unused remap irq domain interface Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235444.221049665@linutronix.de	2017-06-22 18:21:11 +02:00
Thomas Gleixner	667724c5a3	x86/msi: Provide new iommu irqdomain interface Provide a new interface for creating the iommu remapping domains, so that the caller can supply a name and a id in order to create named irqdomains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Joerg Roedel <joro@8bytes.org> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235443.986661206@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-06-22 18:21:10 +02:00
Thomas Gleixner	5f432711ba	x86/htirq: Create named domain Use the fwnode to create a named domain so diagnosis works. Mark the init function __init while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235443.829047007@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-06-22 18:21:09 +02:00
Thomas Gleixner	1b604745c8	x86/ioapic: Create named irq domain Use the fwnode to create a named domain so diagnosis works, but only when the the ioapic is not device tree based. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235443.752782603@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-06-22 18:21:09 +02:00
Thomas Gleixner	9d35f85959	x86/vector: Create named irq domain Use the fwnode to create a named domain so diagnosis works. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235443.673635238@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-06-22 18:21:08 +02:00
Thomas Gleixner	8947dfb257	x86/apic: Add name to irq chip Add the missing name, so debugging will work proper. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235443.266561988@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-06-22 18:21:06 +02:00
Zhenzhong Duan	a1272dd553	x86/tsc: Call check_system_tsc_reliable() before unsynchronized_tsc() tsc_clocksource_reliable is initialized in check_system_tsc_reliable(), but it is checked in unsynchronized_tsc() which is called before the initialization. In practice that's not an issue because systems which mark the TSC reliable have X86_FEATURE_CONSTANT_TSC set as well, which is evaluated in unsynchronized_tsc() before tsc_clocksource_reliable. Reorder the calls so initialization happens before usage. [ tglx: Massaged changelog ] Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/b1532ef7-cd9f-45f7-9f49-48dd2a5c2495@default	2017-06-22 16:00:03 +02:00
Vitaly Kuznetsov	71c2a2d0a8	x86/hyperv: Read TSC frequency from a synthetic MSR It was found that SMI_TRESHOLD of 50000 is not enough for Hyper-V guests in nested environment and falling back to counting jiffies is not an option for Gen2 guests as they don't have PIT. As Hyper-V provides TSC frequency in a synthetic MSR we can just use this information instead of doing a error prone calibration. Reported-and-tested-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jork Loeser <jloeser@microsoft.com> Cc: devel@linuxdriverproject.org Cc: "K. Y. Srinivasan" <kys@microsoft.com> Link: http://lkml.kernel.org/r/20170622100730.18112-3-vkuznets@redhat.com	2017-06-22 15:35:12 +02:00
Vitaly Kuznetsov	2cf0284223	x86/hyperv: Check frequency MSRs presence according to the specification Hyper-V TLFS specifies two bits which should be checked before accessing frequency MSRs: - AccessFrequencyMsrs (BIT(11) in EAX) which indicates if we have access to frequency MSRs. - FrequencyMsrsAvailable (BIT(8) in EDX) which indicates is these MSRs are present. Rename and specify these bits accordingly. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Ladi Prosek <lprosek@redhat.com> Cc: Jork Loeser <jloeser@microsoft.com> Cc: devel@linuxdriverproject.org Cc: "K. Y. Srinivasan" <kys@microsoft.com> Link: http://lkml.kernel.org/r/20170622100730.18112-2-vkuznets@redhat.com	2017-06-22 15:35:11 +02:00
Jiri Bohac	fe2d48b805	x86/debug: Extend the lower bound of crash kernel low reservations The following change in 2013: `0212f91596` ("x86: Add Crash kernel low reservation") ... introduced reserve_crashkernel_low(). This function is used to reserve crash kernel memory either if crashkernel=size,low is given on the command line or if the region reserved by reserve_crashkernel is entirely above 4G. reserve_crashkernel_low() tries to find a block of 'low_size' bytes. But there seems to be no good reason to restrict the lower bound of the range to 'low_size'. Make memblock_find_in_range() search from the start of memory. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20170616161602.2r7birrf2y3ylv6v@dwarf.suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-22 11:10:23 +02:00
Andy Lutomirski	d54368127a	x86/mm: Remove reset_lazy_tlbstate() The only call site also calls idle_task_exit(), and idle_task_exit() puts us into a clean state by explicitly switching to init_mm. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/3acc7ad02a2ec060d2321a1e0f6de1cb90069517.1498022414.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-22 10:57:50 +02:00
Ingo Molnar	a4eb8b9935	Merge branch 'linus' into x86/mm, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-22 10:57:28 +02:00
Dou Liyang	538ac46c64	x86/apic: Make arch_init_msi/htirq_domain __init These two functions are only called by arch_early_irq_init(), which is an __init function, so mark them __init as well. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1498101341-10182-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-22 10:34:42 +02:00
Dou Liyang	a884d25f38	x86/apic: Make init_legacy_irqs() __init This function is only called by arch_early_irq_init(), which is an __init function, so mark the child function __init as well. In addition mark it inline for the !CONFIG_X86_IO_APIC case. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1498040061-5332-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-22 10:34:41 +02:00
Juergen Gross	b867059018	x86/MCE, xen/mcelog: Make /dev/mcelog registration messages more precise When running under Xen as dom0, /dev/mcelog is being provided by Xen instead of the normal mcelog character device of the MCE core. Convert an error message being issued by the MCE core in this case to an informative message that Xen has registered the device. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170614084059.19294-1-jgross@suse.com	2017-06-20 23:25:19 +02:00
Kirill A. Shutemov	26179670a6	x86/boot/64: Put __startup_64() into .head.text Put __startup_64() and fixup_pointer() into .head.text section to make sure it's always near startup_64() and always callable. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel test robot <fengguang.wu@intel.com> Cc: wfg@linux.intel.com Link: http://lkml.kernel.org/r/20170616113024.ajmif63cmcszry5a@black.fi.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-20 12:56:27 +02:00
Borislav Petkov	bd20733045	x86/microcode/intel: Save pointer to ucode patch for early AP loading Normally, when the initrd is gone, we can't search it for microcode blobs to apply anymore. For that we need to stash away the patch in our own storage. And save_microcode_in_initrd_intel() looks like the proper place to do that from. So in order for early loading to work, invalidate the intel_ucode_patch pointer to the patch before scanning the initrd one last time. If the scanning code finds a microcode patch, it will assign that pointer again, this time with our own storage's address. This way, early microcode application during resume-from-RAM works too, even after the initrd is long gone. Tested-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170614140626.4462-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-20 12:54:25 +02:00
Borislav Petkov	a3d98c9358	x86/microcode: Look for the initrd at the correct address on 32-bit Early during boot, the BSP finds the ramdisk's position from boot_params but by the time the APs get to boot, the BSP has continued in the mean time and has potentially managed to relocate that ramdisk. And in that case, the APs need to find the ramdisk at its new position, in physical memory as they're running before paging has been enabled. Thus, get the updated physical location of the ramdisk which is in the relocated_ramdisk variable. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170614140626.4462-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-20 12:54:24 +02:00
Dan Carpenter	c133c76157	x86/nmi: Fix timeout test in test_nmi_ipi() We're supposed to exit the loop with "timeout" set to zero. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Fixes: `99e8b9ca90` ("x86, NMI: Add NMI IPI selftest") Link: http://lkml.kernel.org/r/20170619105304.GA23995@elgon.mountain Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-20 12:52:43 +02:00
Ingo Molnar	902b319413	Merge branch 'WIP.sched/core' into sched/core Conflicts: kernel/sched/Makefile Pick up the waitqueue related renames - it didn't get much feedback, so it appears to be uncontroversial. Famous last words? ;-) Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-20 12:28:21 +02:00
Borislav Petkov	803ff8a7a6	x86/hpet: Do not use smp_processor_id() in preemptible code When hpet=force is supplied on the kernel command line and the HPET supports the Legacy Replacement Interrupt Route option (HPET_ID_LEGSUP), the legacy interrupts init code uses the boot CPU's mask initially by calling smp_processor_id() assuming that it is running on the BSP. It does run on the BSP but the code region is preemptible and the preemption check fires. Simply use the BSP's id directly to avoid the warning. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170620093154.18472-1-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-06-20 12:23:26 +02:00
Hugh Dickins	1be7107fbe	mm: larger stack guard gap, between vmas Stack guard page is a useful feature to reduce a risk of stack smashing into a different mapping. We have been using a single page gap which is sufficient to prevent having stack adjacent to a different mapping. But this seems to be insufficient in the light of the stack usage in userspace. E.g. glibc uses as large as 64kB alloca() in many commonly used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX] which is 256kB or stack strings with MAX_ARG_STRLEN. This will become especially dangerous for suid binaries and the default no limit for the stack size limit because those applications can be tricked to consume a large portion of the stack and a single glibc call could jump over the guard page. These attacks are not theoretical, unfortunatelly. Make those attacks less probable by increasing the stack guard gap to 1MB (on systems with 4k pages; but make it depend on the page size because systems with larger base pages might cap stack allocations in the PAGE_SIZE units) which should cover larger alloca() and VLA stack allocations. It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot. One could argue that the gap size should be configurable from userspace, but that can be done later when somebody finds that the new 1MB is wrong for some special case applications. For now, add a kernel command line option (stack_guard_gap) to specify the stack gap size (in page units). Implementation wise, first delete all the old code for stack guard page: because although we could get away with accounting one extra page in a stack vma, accounting a larger gap can break userspace - case in point, a program run with "ulimit -S -v 20000" failed when the 1MB gap was counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK and strict non-overcommit mode. Instead of keeping gap inside the stack vma, maintain the stack guard gap as a gap between vmas: using vm_start_gap() in place of vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few places which need to respect the gap - mainly arch_get_unmapped_area(), and and the vma tree's subtree_gap support for that. Original-patch-by: Oleg Nesterov <oleg@redhat.com> Original-patch-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Tested-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-06-19 21:50:20 +08:00
Shaohua Li	7304e8f28b	iommu/vt-d: Correctly disable Intel IOMMU force on I made a mistake in commit `bfd20f1`. We should skip the force on with the option enabled instead of vice versa. Not sure why this passed our performance test, sorry. Fixes: `bfd20f1cc8` ('x86, iommu/vt-d: Add an option to disable Intel IOMMU force on') Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2017-06-15 16:41:10 +02:00
Yazen Ghannam	6057077f6e	x86/mce: Update bootlog description to reflect behavior on AMD The bootlog option is only disabled by default on AMD Fam10h and older systems. Update bootlog description to say this. Change the family value to hex to avoid confusion. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170613162835.30750-9-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-14 07:32:10 +02:00
Yazen Ghannam	ec33838244	x86/mce: Don't disable MCA banks when offlining a CPU on AMD AMD systems have non-core, shared MCA banks within a die. These banks are controlled by a master CPU per die. If this CPU is offlined then all the shared banks are disabled in addition to the CPU's core banks. Also, Fam17h systems may have SMT enabled. The MCA_CTL register is shared between SMT thread siblings. If a CPU is offlined then all its sibling's MCA banks are also disabled. Extend the existing vendor check to AMD too. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> [ Fix up comment. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170613162835.30750-8-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-14 07:32:09 +02:00
Borislav Petkov	86d2eac5a7	x86/mce/mce-inject: Preset the MCE injection struct Populate the MCE injection struct before doing initial injection so that values which don't change have sane defaults. Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-7-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-14 07:32:09 +02:00
Borislav Petkov	5c99881b33	x86/mce: Clean up include files Not really needed. Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-6-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-14 07:32:08 +02:00
Borislav Petkov	fbe9ff9eaf	x86/mce: Get rid of register_mce_write_callback() Make the mcelog call a notifier which lands in the injector module and does the injection. This allows for mce-inject to be a normal kernel module now. Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-5-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-14 07:32:07 +02:00
Borislav Petkov	bc8e80d56c	x86/mce: Merge mce_amd_inj into mce-inject Reuse mce_amd_inj's debugfs interface so that mce-inject can benefit from it too. The old functionality is still preserved under CONFIG_X86_MCELOG_LEGACY. Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-4-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-14 07:32:07 +02:00
Yazen Ghannam	17ef4af0ec	x86/mce/AMD: Use saved threshold block info in interrupt handler In the amd_threshold_interrupt() handler, we loop through every possible block in each bank and rediscover the block's address and if it's valid, e.g. valid, counter present and not locked. However, we already have the address saved in the threshold blocks list for each CPU and bank. The list only contains blocks that have passed all the valid checks. Besides the redundancy, there's also a smp_call_function* in get_block_address() which causes a warning when servicing the interrupt: WARNING: CPU: 0 PID: 0 at kernel/smp.c:281 smp_call_function_single+0xdd/0xf0 ... Call Trace: <IRQ> rdmsr_safe_on_cpu() get_block_address.isra.2() amd_threshold_interrupt() smp_threshold_interrupt() threshold_interrupt() because we do get called in an interrupt handler with interrupts disabled, which can result in a deadlock. Drop the redundant valid checks and move the overflow check, logging and block reset into a separate function. Check the first block then iterate over the rest. This procedure is needed since the first block is used as the head of the list. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170613162835.30750-3-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-14 07:32:06 +02:00
Yazen Ghannam	a24b8c3409	x86/mce/AMD: Use msr_stat when clearing MCA_STATUS The value of MCA_STATUS is used as the MSR when clearing MCA_STATUS. This may cause the following warning: unchecked MSR access error: WRMSR to 0x11b (tried to write 0x0000000000000000) Call Trace: <IRQ> smp_threshold_interrupt() threshold_interrupt() Use msr_stat instead which has the MSR address. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: `37d43acfd7` ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers") Link: http://lkml.kernel.org/r/20170613162835.30750-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-14 07:32:06 +02:00
Ingo Molnar	10b90ee2ec	Linux 4.12-rc5 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZPdbLAAoJEHm+PkMAQRiGx4wH/1nCjfnl6fE8oJ24/1gEAOUh biFdqJkYZmlLYHVtYfLm4Ueg4adJdg0wx6qM/4RaAzmQVvLfDV34bc1qBf1+P95G kVF+osWyXrZo5cTwkwapHW/KNu4VJwAx2D1wrlxKDVG5AOrULH1pYOYGOpApEkZU 4N+q5+M0ce0GJpqtUZX+UnI33ygjdDbBxXoFKsr24B7eA0ouGbAJ7dC88WcaETL+ 2/7tT01SvDMo0jBSV0WIqlgXwZ5gp3yPGnklC3F4159Yze6VFrzHMKS/UpPF8o8E W9EbuzwxsKyXUifX2GY348L1f+47glen/1sedbuKnFhP6E9aqUQQJXvEO7ueQl4= =m2Gx -----END PGP SIGNATURE----- Merge tag 'v4.12-rc5' into ras/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-14 07:31:46 +02:00
Kirill A. Shutemov	032370b9c8	x86/boot/64: Add support of additional page table level during early boot This patch adds support for 5-level paging during early boot. It generalizes boot for 4- and 5-level paging on 64-bit systems with compile-time switch between them. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-10-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-13 08:56:55 +02:00
Kirill A. Shutemov	65ade2f872	x86/boot/64: Rename init_level4_pgt and early_level4_pgt With CONFIG_X86_5LEVEL=y, level 4 is no longer top level of page tables. Let's give these variable more generic names: init_top_pgt and early_top_pgt. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-9-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-13 08:56:55 +02:00
Kirill A. Shutemov	c88d71508e	x86/boot/64: Rewrite startup_64() in C The patch write most of startup_64 logic in C. This is preparation for 5-level paging enabling. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-8-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-13 08:56:54 +02:00
Andy Lutomirski	6c690ee103	x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3() The kernel has several code paths that read CR3. Most of them assume that CR3 contains the PGD's physical address, whereas some of them awkwardly use PHYSICAL_PAGE_MASK to mask off low bits. Add explicit mask macros for CR3 and convert all of the CR3 readers. This will keep them from breaking when PCID is enabled. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-13 08:48:09 +02:00
Ingo Molnar	3f365cf304	Merge branch 'sched/urgent' into x86/mm, to pick up dependent fix Andy will need the following scheduler fix for the PCID series: `252d2a4117`: sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off() So do a cross-merge. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-13 08:47:22 +02:00
Dou Liyang	b1b4f2fe68	x86/time: Make setup_default_timer_irq() static This function isn't used outside of time.c, so let's mark it static. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1497321029-29049-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-13 08:42:09 +02:00
Peter Zijlstra	8a524f803a	x86/debug: Handle early WARN_ONs proper Hans managed to trigger a WARN very early in the boot which killed his (Virtual) box. The reason is that the recent rework of WARN() to use UD0 forgot to add the fixup_bug() call to early_fixup_exception(). As a result the kernel does not handle the WARN_ON injected UD0 exception and panics. Add the missing fixup call, so early UD's injected by WARN() get handled. Fixes: `9a93848fe7` ("x86/debug: Implement __WARN() using UD0") Reported-and-tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Frank Mehnert <frank.mehnert@oracle.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Michael Thayer <michael.thayer@oracle.com> Link: http://lkml.kernel.org/r/20170612180108.w4vgu2ckucmllf3a@hirez.programming.kicks-ass.net	2017-06-12 21:17:48 +02:00
Linus Torvalds	9d0eb46246	Bug fixes (ARM, s390, x86) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJZPOhfAAoJEL/70l94x66DNgYH/i1pbBmsxeYxWEoScLDTVYZa gLKHpjpciCBcKdMGSuBXeP702pi85N/TG7M2XIT/nXmFHYsie9RUSWXK63IZxpPx 7wRNocstQU+DyMAP3pagJIFPnhUT9ufCZYFDin8sQoh1Dk1xbV38WaUPb/YfPYIt xGyti2SzT/CiBOR5zQiNb8m8k+M19QGzjcglHRq5Uk/oSElaw635M6u3PZD0Zvtd 9L3NhtDYp1tUpG2Rc/5fXX651BVl+J6+xMukuAJBSqcI2hNfe0oM7tTi6larPHPo eguW0ChU2qat7Gjob18oKlLAMGwPHt+p7utH2pju5UTWFdPBY2lxUgv5Yfd3Qzo= =C/gM -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "Bug fixes (ARM, s390, x86)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: async_pf: avoid async pf injection when in guest mode KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation arm: KVM: Allow unaligned accesses at HYP arm64: KVM: Allow unaligned accesses at EL2 arm64: KVM: Preserve RES1 bits in SCTLR_EL2 KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages KVM: nVMX: Fix exception injection kvm: async_pf: fix rcu_irq_enter() with irqs enabled KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction KVM: s390: fix ais handling vs cpu model KVM: arm/arm64: Fix isues with GICv2 on GICv3 migration	2017-06-11 11:07:25 -07:00
Dominik Brodowski	5b0bc9ac2c	x86/microcode/intel: Clear patch pointer before jettisoning the initrd During early boot, load_ucode_intel_ap() uses __load_ucode_intel() to obtain a pointer to the relevant microcode patch (embedded in the initrd), and stores this value in 'intel_ucode_patch' to speed up the microcode patch application for subsequent CPUs. On resuming from suspend-to-RAM, however, load_ucode_ap() calls load_ucode_intel_ap() for each non-boot-CPU. By then the initramfs is long gone so the pointer stored in 'intel_ucode_patch' no longer points to a valid microcode patch. Clear that pointer so that we effectively fall back to the CPU hotplug notifier callbacks to update the microcode. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> [ Edit and massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.10.. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170607095819.9754-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-08 10:03:05 +02:00
Borislav Petkov	bbf79d21bd	x86/ldt: Rename ldt_struct::size to ::nr_entries ... because this is exactly what it is: the number of entries in the LDT. Calling it "size" is simply confusing and it is actually begging to be called "nr_entries" or somesuch, especially if you see constructs like: alloc_size = size * LDT_ENTRY_SIZE; since LDT_ENTRY_SIZE is the size of a single entry. There should be no functionality change resulting from this patch, as the before/after output from tools/testing/selftests/x86/ldt_gdt.c shows. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170606173116.13977-1-bp@alien8.de [ Renamed 'n_entries' to 'nr_entries' ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-08 09:28:21 +02:00
Paolo Bonzini	bbaf0e2b1c	kvm: async_pf: fix rcu_irq_enter() with irqs enabled native_safe_halt enables interrupts, and you just shouldn't call rcu_irq_enter() with interrupts enabled. Reorder the call with the following local_irq_disable() to respect the invariant. Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-06-06 14:43:16 +02:00
Andy Lutomirski	3d28ebceaf	x86/mm: Rework lazy TLB to track the actual loaded mm Lazy TLB state is currently managed in a rather baroque manner. AFAICT, there are three possible states: - Non-lazy. This means that we're running a user thread or a kernel thread that has called use_mm(). current->mm == current->active_mm == cpu_tlbstate.active_mm and cpu_tlbstate.state == TLBSTATE_OK. - Lazy with user mm. We're running a kernel thread without an mm and we're borrowing an mm_struct. We have current->mm == NULL, current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0). The current cpu is set in mm_cpumask(current->active_mm). CR3 points to current->active_mm->pgd. The TLB is up to date. - Lazy with init_mm. This happens when we call leave_mm(). We have current->mm == NULL, current->active_mm == cpu_tlbstate.active_mm, but that mm is only relelvant insofar as the scheduler is tracking it for refcounting. cpu_tlbstate.state != TLBSTATE_OK. The current cpu is clear in mm_cpumask(current->active_mm). CR3 points to swapper_pg_dir, i.e. init_mm->pgd. This patch simplifies the situation. Other than perf, x86 stops caring about current->active_mm at all. We have cpu_tlbstate.loaded_mm pointing to the mm that CR3 references. The TLB is always up to date for that mm. leave_mm() just switches us to init_mm. There are no longer any special cases for mm_cpumask, and switch_mm() switches mms without worrying about laziness. After this patch, cpu_tlbstate.state serves only to tell the TLB flush code whether it may switch to init_mm instead of doing a normal flush. This makes fairly extensive changes to xen_exit_mmap(), which used to look a bit like black magic. Perf is unchanged. With or without this change, perf may behave a bit erratically if it tries to read user memory in kernel thread context. We should build on this patch to teach perf to never look at user memory when cpu_tlbstate.loaded_mm != current->mm. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-05 09:59:44 +02:00
Christian Sünkenberg	ae1d557d8f	x86/cpu/cyrix: Add alternative Device ID of Geode GX1 SoC A SoC variant of Geode GX1, notably NSC branded SC1100, seems to report an inverted Device ID in its DIR0 configuration register, specifically 0xb instead of the expected 0x4. Catch this presumably quirky version so it's properly recognized as GX1 and has its cache switched to write-back mode, which provides a significant performance boost in most workloads. SC1100's datasheet "Geode™ SC1100 Information Appliance On a Chip", states in section 1.1.7.1 "Device ID" that device identification values are specified in SC1100's device errata. These, however, seem to not have been publicly released. Wading through a number of boot logs and /proc/cpuinfo dumps found on pastebin and blogs, this patch should mostly be relevant for a number of now admittedly aging Soekris NET4801 and PC Engines WRAP devices, the latter being the platform this issue was discovered on. Performance impact was verified using "openssl speed", with write-back caching scaling throughput between -3% and +41%. Signed-off-by: Christian Sünkenberg <christian.suenkenberg@student.kit.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1496596719.26725.14.camel@student.kit.edu Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-05 08:34:20 +02:00
Peter Zijlstra	855615eee9	x86/tsc: Remove the TSC_ADJUST clamp Now that all affected platforms have a microcode update; and we check this and disable TSC_DEADLINE and print a microcode revision update error if its too old, we can remove the TSC_ADJUST clamp. This should help with systems where the second socket runs ahead of the first socket and needs a negative adjustment. In this case we'd hit the 0 clamp and give up for not achieving synchronization. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: kevin.b.stanton@intel.com Link: http://lkml.kernel.org/r/20170531155306.100950003@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-06-04 21:55:53 +02:00
Peter Zijlstra	bd9240a18e	x86/apic: Add TSC_DEADLINE quirk due to errata Due to errata it is possible for the TSC_DEADLINE timer to misbehave after using TSC_ADJUST. A microcode update is available to fix this situation. Avoid using the TSC_DEADLINE timer if it is affected by this issue and report the required microcode version. [ tglx: Renamed function to apic_check_deadline_errata() ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: kevin.b.stanton@intel.com Link: http://lkml.kernel.org/r/20170531155306.050849877@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-06-04 21:55:53 +02:00
Peter Zijlstra	c6e9f42bbe	x86/apic: Change the lapic name in deadline mode So that we can more easily see in what mode the lapic timer operates. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: kevin.b.stanton@intel.com Link: http://lkml.kernel.org/r/20170531155305.989808008@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-06-04 21:55:52 +02:00
Borislav Petkov	5d9070b1f0	x86/debug/32: Convert a smp_processor_id() call to raw to avoid DEBUG_PREEMPT warning ... to raw_smp_processor_id() to not trip the BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 check. The reasoning behind it is that __warn() already uses the raw_ variants but the show_regs() path on 32-bit doesn't. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170528092212.fiod7kygpjm23m3o@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-29 08:22:49 +02:00
Borislav Petkov	dac6ca243c	x86/microcode/AMD: Change load_microcode_amd()'s param to bool to fix preemptibility bug With CONFIG_DEBUG_PREEMPT enabled, I get: BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is debug_smp_processor_id CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc2+ #2 Call Trace: dump_stack check_preemption_disabled debug_smp_processor_id save_microcode_in_initrd_amd ? microcode_init save_microcode_in_initrd ... because, well, it says it above, we're using smp_processor_id() in preemptible code. But passing the CPU number is not really needed. It is only used to determine whether we're on the BSP, and, if so, to save the microcode patch for early loading. [ We don't absolutely need to do it on the BSP but we do that customarily there. ] Instead, convert that function parameter to a boolean which denotes whether the patch should be saved or not, thereby avoiding the use of smp_processor_id() in preemptible code. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170528200414.31305-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-29 08:22:48 +02:00
Linus Torvalds	38e6bf238d	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A series of fixes for X86: - The final fix for the end-of-stack issue in the unwinder - Handle non PAT systems gracefully - Prevent access to uninitiliazed memory - Move early delay calaibration after basic init - Fix Kconfig help text - Fix a cross compile issue - Unbreak older make versions" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/timers: Move simple_udelay_calibration past init_hypervisor_platform x86/alternatives: Prevent uninitialized stack byte read in apply_alternatives() x86/PAT: Fix Xorg regression on CPUs that don't support PAT x86/watchdog: Fix Kconfig help text file path reference to lockup watchdog documentation x86/build: Permit building with old make versions x86/unwind: Add end-of-stack check for ftrace handlers Revert "x86/entry: Fix the end of the stack for newly forked tasks" x86/boot: Use CROSS_COMPILE prefix for readelf	2017-05-27 09:17:58 -07:00
Linus Torvalds	de0b9d751b	Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Thomas Gleixner: "Two fixlets for RAS: - Export memory_error() so the NFIT module can utilize it - Handle memory errors in NFIT correctly" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: acpi, nfit: Fix the memory error check in nfit_handle_mce() x86/MCE: Export memory_error()	2017-05-27 09:06:43 -07:00
Thomas Gleixner	6ee98ffeea	x86/ftrace: Make sure that ftrace trampolines are not RWX ftrace use module_alloc() to allocate trampoline pages. The mapping of module_alloc() is RWX, which makes sense as the memory is written to right after allocation. But nothing makes these pages RO after writing to them. Add proper set_memory_rw/ro() calls to protect the trampolines after modification. Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705251056410.1862@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-05-26 22:37:02 -04:00
Masami Hiramatsu	c93f5cf571	kprobes/x86: Fix to set RWX bits correctly before releasing trampoline Fix kprobes to set(recover) RWX bits correctly on trampoline buffer before releasing it. Releasing readonly page to module_memfree() crash the kernel. Without this fix, if kprobes user register a bunch of kprobes in function body (since kprobes on function entry usually use ftrace) and unregister it, kernel hits a BUG and crash. Link: http://lkml.kernel.org/r/149570868652.3518.14120169373590420503.stgit@devbox Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: `d0381c81c2` ("kprobes/x86: Set kprobes pages read-only") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-05-26 22:37:00 -04:00
Matthias Kaehlcke	9df8109fd7	x86/ioapic: Remove unused IO_APIC_irq_trigger() function The function isn't used since commit: `5ad274d41c` ("x86/irq: Remove unused old IOAPIC irqdomain interfaces") Removing it fixes the following warning when building with clang: arch/x86/kernel/apic/io_apic.c:1219:19: error: unused function 'IO_APIC_irq_trigger' [-Werror,-Wunused-function] Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Douglas Anderson <dianders@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170522232035.187985-1-mka@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-05-26 14:37:41 +02:00
Jan Kiszka	702644ec1c	x86/timers: Move simple_udelay_calibration past init_hypervisor_platform This ensures that adjustments to x86_platform done by the hypervisor setup is already respected by this simple calibration. The current user of this, introduced by `1b5aeebf3a` ("x86/earlyprintk: Add support for earlyprintk via USB3 debug port"), comes much later into play. Fixes: `dd759d93f4` ("x86/timers: Add simple udelay calibration") Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: http://lkml.kernel.org/r/5e89fe60-aab3-2c1c-aba8-32f8ad376189@siemens.com	2017-05-26 13:04:09 +02:00
Thomas Gleixner	f2545b2d4c	jump_label: Reorder hotplug lock and jump_label_lock The conversion of the hotplug locking to a percpu rwsem unearthed lock ordering issues all over the place. The jump_label code has two issues: 1) Nested get_online_cpus() invocations 2) Ordering problems vs. the cpus rwsem and the jump_label_mutex To cure these, the following lock order has been established; cpus_rwsem -> jump_label_lock -> text_mutex Even if not all architectures need protection against CPU hotplug, taking cpus_rwsem before jump_label_lock is now mandatory in code pathes which actually modify code and therefor need text_mutex protection. Move the get_online_cpus() invocations into the core jump label code and establish the proper lock order where required. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Baron <jbaron@akamai.com> Cc: Ralf Baechle <ralf@linux-mips.org> Link: http://lkml.kernel.org/r/20170524081549.025830817@linutronix.de	2017-05-26 10:10:45 +02:00
Sebastian Andrzej Siewior	547efeadd4	x86/mtrr: Remove get_online_cpus() from mtrr_save_state() mtrr_save_state() is invoked from native_cpu_up() which is in the context of a CPU hotplug operation and therefor calling get_online_cpus() is pointless. While this works in the current get_online_cpus() implementation it prevents from converting the hotplug locking to percpu rwsems. Remove it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170524081547.651378834@linutronix.de	2017-05-26 10:10:38 +02:00
Mateusz Jurczyk	fc152d22d6	x86/alternatives: Prevent uninitialized stack byte read in apply_alternatives() In the current form of the code, if a->replacementlen is 0, the reference to *insnbuf for comparison touches potentially garbage memory. While it doesn't affect the execution flow due to the subsequent a->replacementlen comparison, it is (rightly) detected as use of uninitialized memory by a runtime instrumentation currently under my development, and could be detected as such by other tools in the future, too (e.g. KMSAN). Fix the "false-positive" by reordering the conditions to first check the replacement instruction length before referencing specific opcode bytes. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/20170524135500.27223-1-mjurczyk@google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-05-24 16:18:12 +02:00
Josh Poimboeuf	519fb5c335	x86/unwind: Add end-of-stack check for ftrace handlers Dave Jones and Steven Rostedt reported unwinder warnings like the following: WARNING: kernel stack frame pointer at ffff8800bda0ff30 in sshd:1090 has bad value 000055b32abf1fa8 In both cases, the unwinder was attempting to unwind from an ftrace handler into entry code. The callchain was something like: syscall entry code C function ftrace handler save_stack_trace() The problem is that the unwinder's end-of-stack logic gets confused by the way ftrace lays out the stack frame (with fentry enabled). I was able to recreate this warning with: echo call_usermodehelper_exec_async:stacktrace > /sys/kernel/debug/tracing/set_ftrace_filter (exit login session) I considered fixing this by changing the ftrace code to rewrite the stack to make the unwinder happy. But that seemed too intrusive after I implemented it. Instead, just add another check to the unwinder's end-of-stack logic to detect this special case. Side note: We could probably get rid of these end-of-stack checks by encoding the frame pointer for syscall entry just like we do for interrupt entry. That would be simpler, but it would also be a lot more intrusive since it would slightly affect the performance of every syscall. Reported-by: Dave Jones <davej@codemonkey.org.uk> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: live-patching@vger.kernel.org Fixes: `c32c47c68a` ("x86/unwind: Warn on bad frame pointer") Link: http://lkml.kernel.org/r/671ba22fbc0156b8f7e0cfa5ab2a795e08bc37e1.1495553739.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-24 09:05:16 +02:00
Arnd Bergmann	5c3c2ea688	x86/tsc: Fold set_cyc2ns_scale() into caller The newly introduced wrapper function only has one caller, and this one is conditional, causing a harmless warning when CONFIG_CPU_FREQ is disabled: arch/x86/kernel/tsc.c:189:13: error: 'set_cyc2ns_scale' defined but not used [-Werror=unused-function] My first idea was to move the wrapper inside of that #ifdef, but on second thought it seemed nicer to remove it completely again and rename __set_cyc2ns_scale back to set_cyc2ns_scale, but leaving the extra argument. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `615cd03373` ("x86/tsc: Fix sched_clock() sync") Link: http://lkml.kernel.org/r/20170517203949.2052220-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-23 10:11:04 +02:00
Thomas Gleixner	719b3680d1	x86/smp: Adjust system_state check To enable smp_processor_id() and might_sleep() debug checks earlier, it's required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING. Adjust the system_state check in announce_cpu() to handle the extra states. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170516184735.191715856@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-23 10:01:35 +02:00
Ingo Molnar	386b554888	Merge branch 'linus' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-23 09:50:35 +02:00
Yazen Ghannam	84bcc1d57f	x86/mce/AMD: Carve out SMCA bank configuration Scalable MCA systems have a new MCA_CONFIG register that we use to configure each bank. We currently use this when we set up thresholding. However, this is logically separate. Group all SMCA-related initialization into a single function. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1493147772-2721-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-05-21 21:55:13 +02:00
Yazen Ghannam	37d43acfd7	x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers We have support for the new SMCA MCA_DE{STAT,ADDR} registers in Linux. So we've used these registers in place of MCA_{STATUS,ADDR} on SMCA systems. However, the guidance for current SMCA implementations of is to continue using MCA_{STATUS,ADDR} and to use MCA_DE{STAT,ADDR} only if a Deferred error was not found in the former registers. If we logged a Deferred error in MCA_STATUS then we should also clear MCA_DESTAT. This also means we shouldn't clear MCA_CONFIG[LogDeferredInMcaStat]. Rework __log_error() to only log an error and add helpers for the different error types being logged from the corresponding interrupt handlers. Boris: carve out common functionality into a _log_error_bank(). Cleanup comments, check MCi_STATUS bits before reading MSRs. Streamline flow. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1493147772-2721-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-05-21 21:55:13 +02:00
Elena Reshetova	473e90b2e8	x86/mce: Convert threshold_bank.cpus from atomic_t to refcount_t The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Reviewed-by: David Windsor <dwindsor@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1492695536-5947-1-git-send-email-elena.reshetova@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-05-21 21:55:13 +02:00
Borislav Petkov	2d1f406139	x86/MCE: Export memory_error() Export the function which checks whether an MCE is a memory error to other users so that we can reuse the logic. Drop the boot_cpu_data use, while at it, as mce.cpuvendor already has the CPU vendor in there. Integrate a piece from a patch from Vishal Verma <vishal.l.verma@intel.com> to export it for modules (nfit). The main reason we're exporting it is that the nfit handler nfit_handle_mce() needs to detect a memory error properly before doing its recovery actions. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20170519093915.15413-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-05-21 21:39:58 +02:00
Wanpeng Li	a575813bfe	KVM: x86: Fix load damaged SSEx MXCSR register Reported by syzkaller: BUG: unable to handle kernel paging request at ffffffffc07f6a2e IP: report_bug+0x94/0x120 PGD 348e12067 P4D 348e12067 PUD 348e14067 PMD 3cbd84067 PTE 80000003f7e87161 Oops: 0003 [#1] SMP CPU: 2 PID: 7091 Comm: kvm_load_guest_ Tainted: G OE 4.11.0+ #8 task: ffff92fdfb525400 task.stack: ffffbda6c3d04000 RIP: 0010:report_bug+0x94/0x120 RSP: 0018:ffffbda6c3d07b20 EFLAGS: 00010202 do_trap+0x156/0x170 do_error_trap+0xa3/0x170 ? kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm] ? mark_held_locks+0x79/0xa0 ? retint_kernel+0x10/0x10 ? trace_hardirqs_off_thunk+0x1a/0x1c do_invalid_op+0x20/0x30 invalid_op+0x1e/0x30 RIP: 0010:kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm] ? kvm_load_guest_fpu.part.175+0x1c/0x170 [kvm] kvm_arch_vcpu_ioctl_run+0xed6/0x1b70 [kvm] kvm_vcpu_ioctl+0x384/0x780 [kvm] ? kvm_vcpu_ioctl+0x384/0x780 [kvm] ? sched_clock+0x13/0x20 ? __do_page_fault+0x2a0/0x550 do_vfs_ioctl+0xa4/0x700 ? up_read+0x1f/0x40 ? __do_page_fault+0x2a0/0x550 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x23/0xc2 SDM mentioned that "The MXCSR has several reserved bits, and attempting to write a 1 to any of these bits will cause a general-protection exception(#GP) to be generated". The syzkaller forks' testcase overrides xsave area w/ random values and steps on the reserved bits of MXCSR register. The damaged MXCSR register values of guest will be restored to SSEx MXCSR register before vmentry. This patch fixes it by catching userspace override MXCSR register reserved bits w/ random values and bails out immediately. Reported-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-05-15 16:08:56 +02:00
Peter Zijlstra	ac1e843f09	sched/clock: Remove unused argument to sched_clock_idle_wakeup_event() The argument to sched_clock_idle_wakeup_event() has not been used in a long time. Remove it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-15 10:15:18 +02:00
Peter Zijlstra	b421b22b00	x86/tsc, sched/clock, clocksource: Use clocksource watchdog to provide stable sync points Currently we keep sched_clock_tick() active for stable TSC in order to keep the per-CPU state semi up-to-date. The (obvious) problem is that by the time we detect TSC is borked, our per-CPU state is also borked. So hook into the clocksource watchdog and call a method after we've found it to still be stable. There's the obvious race where the TSC goes wonky between finding it stable and us running the callback, but closing that is too much work and not really worth it, since we're already detecting TSC wobbles after the fact, so we cannot, per definition, fully avoid funny clock values. And since the watchdog runs less often than the tick, this is also an optimization. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-15 10:15:18 +02:00
Peter Zijlstra	aa7b630ea0	x86/tsc: Feed refined TSC calibration into sched_clock() For the (older) CPUs that still need the refined TSC calibration, also update the sched_clock() rate. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-15 10:15:16 +02:00
Peter Zijlstra	615cd03373	x86/tsc: Fix sched_clock() sync While looking through the code I noticed that we initialize the cyc2ns fields with a different cycle value for each CPU, resulting in a slightly different 0 point for each CPU. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-15 10:15:16 +02:00
Peter Zijlstra	59eaef78bf	x86/tsc: Remodel cyc2ns to use seqcount_latch() Replace the custom multi-value scheme with the more regular seqcount_latch() scheme. Along with scrapping a lot of lines, the latch scheme is better documented and used in more places. The immediate benefit however is not being limited on the update side. The current code has a limit where the writers block which is hit by future changes. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-15 10:15:15 +02:00
Peter Zijlstra	8309f86cd4	x86/tsc: Provide 'tsc=unstable' boot parameter Since the clocksource watchdog will only detect broken TSC after the fact, all TSC based clocks will likely have observed non-continuous values before/when switching away from TSC. Therefore only thing to fully avoid random clock movement when your BIOS randomly mucks with TSC values from SMI handlers is reporting the TSC as unstable at boot. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-15 10:15:14 +02:00
Andrew Morton	cea582247a	Tigran has moved Cc: Tigran Aivazian <aivazian.tigran@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-05-12 15:57:15 -07:00
Linus Torvalds	f1e0527d2d	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - two boot crash fixes - unwinder fixes - kexec related kernel direct mappings enhancements/fixes - more Clang support quirks - minor cleanups - Documentation fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel_rdt: Fix a typo in Documentation x86/build: Don't add -maccumulate-outgoing-args w/o compiler support x86/boot/32: Fix UP boot on Quark and possibly other platforms x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init() x86/kexec/64: Use gbpages for identity mappings if available x86/mm: Add support for gbpages to kernel_ident_mapping_init() x86/boot: Declare error() as noreturn x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds() x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic() x86/microcode/AMD: Remove redundant NULL check on mc	2017-05-12 10:11:50 -07:00
Linus Torvalds	5836e422e5	xen: fixes for 4.12-rc0 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABAgAGBQJZFWTzAAoJELDendYovxMv24cIAJ3U2OZ64d7WTKD37AT2O6nF 6R3j+zJ6apoKX4zHvhWUOHZ6jpTASTnaisiIskVc52JcgAK0f8ZYTg5nhyWPceAD Icf+JuXrI6uplD97qsjt7X9FbxUsRZninfsznoBkK6P8Cw8ZWlWIWIl6e3CrVwBD geyKcbsKkVG8+bMjWvmQd94CFq5r8Ivup0sCECumx0lqw3RhxdhQvUix9eBULEoG h/XAuPbMupdjU6phgqG4rvUjWd/R+9mIIDG1oY9Kpx4Kpn/7bHtoYZ//Qzs8bmuP 5ORujOedshdyAZqLGxQuQzo+/4E9gX3qVbaS6fPf1Ab+ra0k/iWtetUITZ0v2AQ= =gWpG -----END PGP SIGNATURE----- Merge tag 'for-linus-4.12b-rc0c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "This contains two fixes for booting under Xen introduced during this merge window and two fixes for older problems, where one is just much more probable due to another merge window change" * tag 'for-linus-4.12b-rc0c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: adjust early dom0 p2m handling to xen hypervisor behavior x86/amd: don't set X86_BUG_SYSRET_SS_ATTRS when running under Xen xen/x86: Do not call xen_init_time_ops() until shared_info is initialized x86/xen: fix xsave capability setting	2017-05-12 10:09:14 -07:00
Juergen Gross	def9331a12	x86/amd: don't set X86_BUG_SYSRET_SS_ATTRS when running under Xen When running as Xen pv guest X86_BUG_SYSRET_SS_ATTRS must not be set on AMD cpus. This bug/feature bit is kind of special as it will be used very early when switching threads. Setting the bit and clearing it a little bit later leaves a critical window where things can go wrong. This time window has enlarged a little bit by using setup_clear_cpu_cap() instead of the hypervisor's set_cpu_features callback. It seems this larger window now makes it rather easy to hit the problem. The proper solution is to never set the bit in case of Xen. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Juergen Gross <jgross@suse.com>	2017-05-11 15:55:14 +02:00
Linus Torvalds	556d994a75	RTC for 4.12 Subsystem: - Add OF device ID table for i2c drivers New driver: - Motorola CPCAP PMIC RTC Drivers: - cmos: fix IRQ selection - ds1307: Add ST m41t0 support - ds1374: fix watchdog configuration - sh: Add rza series support -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEXx9Viay1+e7J/aM4AyWl4gNJNJIFAlkTf2sACgkQAyWl4gNJ NJLPvw//dGzo6oD3C96QIurfrgFx9512ZurEiJpGPIO15obTVLF0SNuswaMj7knm ezqQ23qX9VBEmu3si7LvkQVbE60giB3XnlJ/wpFi/LhtlM7SQ4o2Z8Go3rkL8tCw iPcj5l3ShbHgSF+TBK+jK5C/8ahR7RE32l2rtSi9xwzxOmKRySmSWg2iGmGJMNUU 7UHR4DRHVPS/h1ffM/rOWV+d3GVK9laNmeoIORhsWCa+iYwGRZr3XL3GXQzhehBO H5uFYewMVBHREADiqMNQ/ogHZI+ghXt1OSK7vhUFkYxosqU56P0YtU6SPH6UuFsH ryoiUmCgQQjjhptlvVv71D7Wj1txSCT6rByQU1YyVZ0yw9XpVuGTYBjFBY+D7nxb e3sR+Poe3diVLWDwFTXStrY0TtVlCTTCjs5T2kwUdYOJ188expQGHgj6wVl7PPTs gpeSIunekbop13KCPWV01TzmRLB8ne9ZiomsuiNnuAKhXP7KRf6AfuQd6kpyvpmH vhGcEIe7O0i4TwUIuB/dmdhLHmlOqCpLJpGQihNc+f0jJAxHv+akXEQ06H84FkJD kPkBYSVDp/2pEBdf7ig2mlpPEqANgoQY8GCu9SbEg976g0v8k6m+i9IlbR0m7hwE 0XF+8W45iNsaEIzoXcyHuB/lrUy1/0eNoG4KX8vyWIjITo5HQWg= =40TA -----END PGP SIGNATURE----- Merge tag 'rtc-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "RTC subsystem update: - Add OF device ID table for i2c drivers New RTC driver: - Motorola CPCAP PMIC RTC RTC driver updates: - cmos: fix IRQ selection - ds1307: Add ST m41t0 support - ds1374: fix watchdog configuration - sh: Add rza series support" * tag 'rtc-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (33 commits) rtc: gemini: add return value validation rtc: snvs: fix an incorrect check of return value rtc: ds1374: wdt: Fix stop/start ioctl always returning -EINVAL rtc: ds1374: wdt: Fix issue with timeout scaling from secs to wdt ticks rtc: sh: mark PM functions as unused rtc: hid-sensor-time: remove some dead code rtc: m41t80: Add proper compatible for rv4162 rtc: ds1307: Add m41t0 to OF device ID table rtc: ds1307: support m41t0 variant rtc: cpcap: fix improper use of IRQ_NONE for request_threaded_irq rtc: cmos: Do not assume irq 8 for rtc when there are no legacy irqs x86: i8259: export legacy_pic symbol dt-bindings: rtc: document the rtc-sh bindings rtc: sh: add support for rza series rtc: cpcap: kfreeing devm allocated memory rtc: wm8350: Remove unused to_wm8350_from_rtc_dev rtc: cpcap: new rtc driver dt-bindings: Add vendor prefix for Motorola rtc: omap: mark PM methods as __maybe_unused rtc: omap: remove incorrect __exit markups ...	2017-05-10 19:37:14 -07:00
Linus Torvalds	28b47809b2	IOMMU Updates for Linux v4.12 This includes: * Some code optimizations for the Intel VT-d driver * Code to switch off a previously enabled Intel IOMMU * Support for 'struct iommu_device' for OMAP, Rockchip and Mediatek IOMMUs * Some header optimizations for IOMMU core code headers and a few fixes that became necessary in other parts of the kernel because of that * ACPI/IORT updates and fixes * Some Exynos IOMMU optimizations * Code updates for the IOMMU dma-api code to bring it closer to use per-cpu iova caches * New command-line option to set default domain type allocated by the iommu core code * Another command line option to allow the Intel IOMMU switched off in a tboot environment * ARM/SMMU: TLB sync optimisations for SMMUv2, Support for using an IDENTITY domain in conjunction with DMA ops, Support for SMR masking, Support for 16-bit ASIDs (was previously broken) * Various other small fixes and improvements -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABAgAGBQJZEY4XAAoJECvwRC2XARrjth0QAKV56zjnFclv39aDo6eCq9CT 51+XT4bPY5VKQ2+Jx76TBNObHmGK+8KEMHfT9khpWJtFCDyy25SGckLry1nYqmZs tSTsbj4sOeCyKzOLITlRN9/OzKXkjKAxYuq+sQZZFDFYf3kCM/eag0dGAU6aVLNp tkIal3CSpGjCQ9M5JohrtQ1mwiGqCIkMIgvnBjRw+bfpLnQNG+VL6VU2G3RAkV2b 5Vbdoy+P7ZQnJSZr/bibYL2BaQs2diR4gOppT5YbsfniMq4QYSjheu1xBboGX8b7 sx8yuPi4370irSan0BDvlvdQdjBKIRiDjfGEKDhRwPhtvN6JREGakhEOC8MySQ37 mP96B72Lmd+a7DEl5udOL7tQILA0DcUCX0aOyF714khnZuFU5tVlCotb/36xeJ+T FPc3RbEVQ90m8dYU6MNJ+ahtb/ZapxGTRfisIigB6wlnZa0Evabp9EJSce6oJMkm whbBhDubeEU18n9XAaofMbu+P2LAzq8cxiRMlsDvT4mIy7jO86jjCmhpu1Tfn2GY 4wrEQZdWOMvhUsIhObXA0aC3BzC506uvnKPW3qy041RaxBuelWiBi29qzYbhxzkr DLDpWbUZNYPyFJjttpavyQb2/XRduBTJdVP1pQpkJNDsW5jLiBkpSqm9xNADapRY vLSYRX0JCIquaD+PAuxn =3aE8 -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: - code optimizations for the Intel VT-d driver - ability to switch off a previously enabled Intel IOMMU - support for 'struct iommu_device' for OMAP, Rockchip and Mediatek IOMMUs - header optimizations for IOMMU core code headers and a few fixes that became necessary in other parts of the kernel because of that - ACPI/IORT updates and fixes - Exynos IOMMU optimizations - updates for the IOMMU dma-api code to bring it closer to use per-cpu iova caches - new command-line option to set default domain type allocated by the iommu core code - another command line option to allow the Intel IOMMU switched off in a tboot environment - ARM/SMMU: TLB sync optimisations for SMMUv2, Support for using an IDENTITY domain in conjunction with DMA ops, Support for SMR masking, Support for 16-bit ASIDs (was previously broken) - various other small fixes and improvements * tag 'iommu-updates-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (63 commits) soc/qbman: Move dma-mapping.h include to qman_priv.h soc/qbman: Fix implicit header dependency now causing build fails iommu: Remove trace-events include from iommu.h iommu: Remove pci.h include from trace/events/iommu.h arm: dma-mapping: Don't override dma_ops in arch_setup_dma_ops() ACPI/IORT: Fix CONFIG_IOMMU_API dependency iommu/vt-d: Don't print the failure message when booting non-kdump kernel iommu: Move report_iommu_fault() to iommu.c iommu: Include device.h in iommu.h x86, iommu/vt-d: Add an option to disable Intel IOMMU force on iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed iommu/arm-smmu: Correct sid to mask iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid() iommu: Make iommu_bus_notifier return NOTIFY_DONE rather than error code omap3isp: Remove iommu_group related code iommu/omap: Add iommu-group support iommu/omap: Make use of 'struct iommu_device' iommu/omap: Store iommu_dev pointer in arch_data iommu/omap: Move data structures to omap-iommu.h iommu/omap: Drop legacy-style device support ...	2017-05-09 15:15:47 -07:00
Andy Lutomirski	d2b6dc61a8	x86/boot/32: Fix UP boot on Quark and possibly other platforms This partially reverts commit: `23b2a4ddeb` ("x86/boot/32: Defer resyncing initial_page_table until per-cpu is set up") That commit had one definite bug and one potential bug. The definite bug is that setup_per_cpu_areas() uses a differnet generic implementation on UP kernels, so initial_page_table never got resynced. This was fine for access to percpu data (it's in the identity map on UP), but it breaks other users of initial_page_table. The potential bug is that helpers like efi_init() would be called before the tables were synced. Avoid both problems by just syncing the page tables in setup_arch() and setup_per_cpu_areas(). Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-09 08:14:24 +02:00
Linus Torvalds	bf5f89463f	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: - the rest of MM - various misc things - procfs updates - lib/ updates - checkpatch updates - kdump/kexec updates - add kvmalloc helpers, use them - time helper updates for Y2038 issues. We're almost ready to remove current_fs_time() but that awaits a btrfs merge. - add tracepoints to DAX * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4 selftests/vm: add a test for virtual address range mapping dax: add tracepoint to dax_insert_mapping() dax: add tracepoint to dax_writeback_one() dax: add tracepoints to dax_writeback_mapping_range() dax: add tracepoints to dax_load_hole() dax: add tracepoints to dax_pfn_mkwrite() dax: add tracepoints to dax_iomap_pte_fault() mtd: nand: nandsim: convert to memalloc_noreclaim_*() treewide: convert PF_MEMALLOC manipulations to new helpers mm: introduce memalloc_noreclaim_{save,restore} mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required mm/huge_memory.c: use zap_deposited_table() more time: delete CURRENT_TIME_SEC and CURRENT_TIME gfs2: replace CURRENT_TIME with current_time apparmorfs: replace CURRENT_TIME with current_time() lustre: replace CURRENT_TIME macro fs: ubifs: replace CURRENT_TIME_SEC with current_time fs: ufs: use ktime_get_real_ts64() for birthtime ...	2017-05-08 18:17:56 -07:00
Laura Abbott	e6ccbff0e9	treewide: decouple cacheflush.h and set_memory.h Now that all call sites, completely decouple cacheflush.h and set_memory.h [sfr@canb.auug.org.au: kprobes/x86: merge fix for set_memory.h decoupling] Link: http://lkml.kernel.org/r/20170418180903.10300fd3@canb.auug.org.au Link: http://lkml.kernel.org/r/1488920133-27229-17-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-05-08 17:15:14 -07:00
Laura Abbott	d11636511e	x86: use set_memory.h header set_memory_* functions have moved to set_memory.h. Switch to this explicitly. Link: http://lkml.kernel.org/r/1488920133-27229-6-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott <labbott@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-05-08 17:15:13 -07:00
Michal Hocko	19809c2da2	mm, vmalloc: use __GFP_HIGHMEM implicitly __vmalloc* allows users to provide gfp flags for the underlying allocation. This API is quite popular $ git grep "=[[:space:]]__vmalloc\\|return[[:space:]]*__vmalloc" \| wc -l 77 The only problem is that many people are not aware that they really want to give __GFP_HIGHMEM along with other flags because there is really no reason to consume precious lowmemory on CONFIG_HIGHMEM systems for pages which are mapped to the kernel vmalloc space. About half of users don't use this flag, though. This signals that we make the API unnecessarily too complex. This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to be mapped to the vmalloc space. Current users which add __GFP_HIGHMEM are simplified and drop the flag. Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Cristopher Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-05-08 17:15:13 -07:00
Linus Torvalds	2d3e4866de	* ARM: HYP mode stub supports kexec/kdump on 32-bit; improved PMU support; virtual interrupt controller performance improvements; support for userspace virtual interrupt controller (slower, but necessary for KVM on the weird Broadcom SoCs used by the Raspberry Pi 3) * MIPS: basic support for hardware virtualization (ImgTec P5600/P6600/I6400 and Cavium Octeon III) * PPC: in-kernel acceleration for VFIO * s390: support for guests without storage keys; adapter interruption suppression * x86: usual range of nVMX improvements, notably nested EPT support for accessed and dirty bits; emulation of CPL3 CPUID faulting * generic: first part of VCPU thread request API; kvm_stat improvements -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJZEHUkAAoJEL/70l94x66DBeYH/09wrpJ2FjU4Rqv7FxmqgWfH 9WGi4wvn/Z+XzQSyfMJiu2SfZVzU69/Y67OMHudy7vBT6knB+ziM7Ntoiu/hUfbG 0g5KsDX79FW15HuvuuGh9kSjUsj7qsQdyPZwP4FW/6ZoDArV9mibSvdjSmiUSMV/ 2wxaoLzjoShdOuCe9EABaPhKK0XCrOYkygT6Paz1pItDxaSn8iW3ulaCuWMprUfG Niq+dFemK464E4yn6HVD88xg5j2eUM6bfuXB3qR3eTR76mHLgtwejBzZdDjLG9fk 32PNYKhJNomBxHVqtksJ9/7cSR6iNPs7neQ1XHemKWTuYqwYQMlPj1NDy0aslQU= =IsiZ -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "ARM: - HYP mode stub supports kexec/kdump on 32-bit - improved PMU support - virtual interrupt controller performance improvements - support for userspace virtual interrupt controller (slower, but necessary for KVM on the weird Broadcom SoCs used by the Raspberry Pi 3) MIPS: - basic support for hardware virtualization (ImgTec P5600/P6600/I6400 and Cavium Octeon III) PPC: - in-kernel acceleration for VFIO s390: - support for guests without storage keys - adapter interruption suppression x86: - usual range of nVMX improvements, notably nested EPT support for accessed and dirty bits - emulation of CPL3 CPUID faulting generic: - first part of VCPU thread request API - kvm_stat improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) kvm: nVMX: Don't validate disabled secondary controls KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick Revert "KVM: Support vCPU-based gfn->hva cache" tools/kvm: fix top level makefile KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING KVM: Documentation: remove VM mmap documentation kvm: nVMX: Remove superfluous VMX instruction fault checks KVM: x86: fix emulation of RSM and IRET instructions KVM: mark requests that need synchronization KVM: return if kvm_vcpu_wake_up() did wake up the VCPU KVM: add explicit barrier to kvm_vcpu_kick KVM: perform a wake_up in kvm_make_all_cpus_request KVM: mark requests that do not need a wakeup KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up KVM: x86: always use kvm_make_request instead of set_bit KVM: add kvm_{test,clear}_request to replace {test,clear}_bit s390: kvm: Cpu model support for msa6, msa7 and msa8 KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK kvm: better MWAIT emulation for guests KVM: x86: virtualize cpuid faulting ...	2017-05-08 12:37:56 -07:00
Xunlei Pang	8638100c52	x86/kexec/64: Use gbpages for identity mappings if available Kexec sets up all identity mappings before booting into the new kernel, and this will cause extra memory consumption for paging structures which is quite considerable on modern machines with huge memory sizes. E.g. on a 32TB machine that is kdumping, it could waste around 128MB (around 4MB/TB) from the reserved memory after kexec sets all the identity mappings using the current 2MB page. Add to that the memory needed for the loaded kdump kernel, initramfs, etc., and it causes a kexec syscall -NOMEM failure. As a result, we had to enlarge reserved memory via "crashkernel=X" to work around this problem. This causes some trouble for distributions that use policies to evaluate the proper "crashkernel=X" value for users. So enable gbpages for kexec mappings. Signed-off-by: Xunlei Pang <xlpang@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: akpm@linux-foundation.org Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1493862171-8799-2-git-send-email-xlpang@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-08 08:28:44 +02:00
Xunlei Pang	66aad4fdf2	x86/mm: Add support for gbpages to kernel_ident_mapping_init() Kernel identity mappings on x86-64 kernels are created in two ways: by the early x86 boot code, or by kernel_ident_mapping_init(). Native kernels (which is the dominant usecase) use the former, but the kexec and the hibernation code uses kernel_ident_mapping_init(). There's a subtle difference between these two ways of how identity mappings are created, the current kernel_ident_mapping_init() code creates identity mappings always using 2MB page(PMD level) - while the native kernel boot path also utilizes gbpages where available. This difference is suboptimal both for performance and for memory usage: kernel_ident_mapping_init() needs to allocate pages for the page tables when creating the new identity mappings. This patch adds 1GB page(PUD level) support to kernel_ident_mapping_init() to address these concerns. The primary advantage would be better TLB coverage/performance, because we'd utilize 1GB TLBs instead of 2MB ones. It is also useful for machines with large number of memory to save paging structure allocations(around 4MB/TB using 2MB page) when setting identity mappings for all the memory, after using 1GB page it will consume only 8KB/TB. ( Note that this change alone does not activate gbpages in kexec, we are doing that in a separate patch. ) Signed-off-by: Xunlei Pang <xlpang@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: akpm@linux-foundation.org Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-08 08:28:40 +02:00
Ingo Molnar	415812f2d6	Merge branch 'linus' into x86/urgent, to pick up dependent commits We are going to fix a bug introduced by a more recent commit, so refresh the tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-05 08:21:03 +02:00
Linus Torvalds	af82455f7d	char/misc patches for 4.12-rc1 Here is the big set of new char/misc driver drivers and features for 4.12-rc1. There's lots of new drivers added this time around, new firmware drivers from Google, more auxdisplay drivers, extcon drivers, fpga drivers, and a bunch of other driver updates. Nothing major, except if you happen to have the hardware for these drivers, and then you will be happy :) All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWQvAgg8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+yknsACgzkAeyz16Z97J3UTaeejbR7nKUCAAoKY4WEHY 8O9f9pr9gj8GMBwxeZQa =OIfB -----END PGP SIGNATURE----- Merge tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big set of new char/misc driver drivers and features for 4.12-rc1. There's lots of new drivers added this time around, new firmware drivers from Google, more auxdisplay drivers, extcon drivers, fpga drivers, and a bunch of other driver updates. Nothing major, except if you happen to have the hardware for these drivers, and then you will be happy :) All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits) firmware: google memconsole: Fix return value check in platform_memconsole_init() firmware: Google VPD: Fix return value check in vpd_platform_init() goldfish_pipe: fix build warning about using too much stack. goldfish_pipe: An implementation of more parallel pipe fpga fr br: update supported version numbers fpga: region: release FPGA region reference in error path fpga altera-hps2fpga: disable/unprepare clock on error in alt_fpga_bridge_probe() mei: drop the TODO from samples firmware: Google VPD sysfs driver firmware: Google VPD: import lib_vpd source files misc: lkdtm: Add volatile to intentional NULL pointer reference eeprom: idt_89hpesx: Add OF device ID table misc: ds1682: Add OF device ID table misc: tsl2550: Add OF device ID table w1: Remove unneeded use of assert() and remove w1_log.h w1: Use kernel common min() implementation uio_mf624: Align memory regions to page size and set correct offsets uio_mf624: Refactor memory info initialization uio: Allow handling of non page-aligned memory regions hangcheck-timer: Fix typo in comment ...	2017-05-04 19:15:35 -07:00
Linus Torvalds	a96480723c	xen: fixes and featrues for 4.12 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABAgAGBQJZChTBAAoJELDendYovxMvkXEIAJDpK5UKMsL1Ihgc0DL0OujQ UGxLfWJueSA1X7i8BgL/8vfgKxSEB9SUiM+ooHOKXS6oDhyk2RP4MuCe5+lhUbbv ZMK5KxHMlVUOD9EjYif8DhhiwRowBbWYEwr8XgY12s0Ya0a9TQLVC+noGsuzqNiH 1UyzeeWlBae4nulUMMim6urPNq5AEPVeQKNX3S8rlnDp74IKVZuoISMM62b2KRSr +R8FVBshXR/HO53YNY0+AfmmUa8T1+dyjL50Eo/QnsG0i+3igOqNrzSKSc6T+nBt Zl3KDUE5W3/OlxuR+CIdZZ1KKtjzoAiR3cvVlHs2z7MIio87bJcYJforAqe6Evo= =k6in -----END PGP SIGNATURE----- Merge tag 'for-linus-4.12b-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "Xen fixes and featrues for 4.12. The main changes are: - enable building the kernel with Xen support but without enabling paravirtualized mode (Vitaly Kuznetsov) - add a new 9pfs xen frontend driver (Stefano Stabellini) - simplify Xen's cpuid handling by making use of cpu capabilities (Juergen Gross) - add/modify some headers for new Xen paravirtualized devices (Oleksandr Andrushchenko) - EFI reset_system support under Xen (Julien Grall) - and the usual cleanups and corrections" * tag 'for-linus-4.12b-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (57 commits) xen: Move xen_have_vector_callback definition to enlighten.c xen: Implement EFI reset_system callback arm/xen: Consolidate calls to shutdown hypercall in a single helper xen: Export xen_reboot xen/x86: Call xen_smp_intr_init_pv() on BSP xen: Revert commits `da72ff5bfc` and `72a9b18629` xen/pvh: Do not fill kernel's e820 map in init_pvh_bootparams() xen/scsifront: use offset_in_page() macro xen/arm,arm64: rename __generic_dma_ops to xen_get_dma_ops xen/arm,arm64: fix xen_dma_ops after `815dd18` "Consolidate get_dma_ops..." xen/9pfs: select CONFIG_XEN_XENBUS_FRONTEND x86/cpu: remove hypervisor specific set_cpu_features vmware: set cpu capabilities during platform initialization x86/xen: use capabilities instead of fake cpuid values for xsave x86/xen: use capabilities instead of fake cpuid values for x2apic x86/xen: use capabilities instead of fake cpuid values for mwait x86/xen: use capabilities instead of fake cpuid values for acpi x86/xen: use capabilities instead of fake cpuid values for acc x86/xen: use capabilities instead of fake cpuid values for mtrr x86/xen: use capabilities instead of fake cpuid values for aperf ...	2017-05-04 11:37:09 -07:00
Joerg Roedel	2c0248d688	Merge branches 'arm/exynos', 'arm/omap', 'arm/rockchip', 'arm/mediatek', 'arm/smmu', 'arm/core', 'x86/vt-d', 'x86/amd' and 'core' into next	2017-05-04 18:06:17 +02:00
Linus Torvalds	4c174688ee	New features for this release: o Pretty much a full rewrite of the processing of function plugins. i.e. echo do_IRQ:stacktrace > set_ftrace_filter o The rewrite was needed to add plugins to be unique to tracing instances. i.e. mkdir instance/foo; cd instances/foo; echo do_IRQ:stacktrace > set_ftrace_filter The old way was written very hacky. This removes a lot of those hacks. o New "function-fork" tracing option. When set, pids in the set_ftrace_pid will have their children added when the processes with their pids listed in the set_ftrace_pid file forks. o Exposure of "maxactive" for kretprobe in kprobe_events o Allow for builtin init functions to be traced by the function tracer (via the kernel command line). Module init function tracing will come in the next release. o Added more selftests, and have selftests also test in an instance. -----BEGIN PGP SIGNATURE----- iQExBAABCAAbBQJZCRchFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L zuIH/RsLUb8Hj6GmhAvn/tblUDzWyqlXX2h79VVlo/XrWayHYNHnKOmua1WwMZC6 xESXb/AffAc89VWTkKsrwaK7yfRPG6+w8zTZOcFuXSBpqSGG/oey9Fxj5Wqqpche oJ2UY7ngxANAipkP5GxdYTafFSoWhGZGfUUtW+5tAHoFHzqO2lOjO8olbXP69sON kVX/b461S20cVvRe5H/F0klXLSc37Tlp5YznXy4H4V4HcJSN1Fb6/uozOXALZ4se SBpVMWmVVoGJorzj+ic7gVOeohvC8RnR400HbeMVwaI0Lj50noidDj/5Hv8F7T+D h1B8vATNZLFAFUOSHINCBIu6Vj0= =t8mg -----END PGP SIGNATURE----- Merge tag 'trace-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New features for this release: - Pretty much a full rewrite of the processing of function plugins. i.e. echo do_IRQ:stacktrace > set_ftrace_filter - The rewrite was needed to add plugins to be unique to tracing instances. i.e. mkdir instance/foo; cd instances/foo; echo do_IRQ:stacktrace > set_ftrace_filter The old way was written very hacky. This removes a lot of those hacks. - New "function-fork" tracing option. When set, pids in the set_ftrace_pid will have their children added when the processes with their pids listed in the set_ftrace_pid file forks. - Exposure of "maxactive" for kretprobe in kprobe_events - Allow for builtin init functions to be traced by the function tracer (via the kernel command line). Module init function tracing will come in the next release. - Added more selftests, and have selftests also test in an instance" * tag 'trace-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (60 commits) ring-buffer: Return reader page back into existing ring buffer selftests: ftrace: Allow some event trigger tests to run in an instance selftests: ftrace: Have some basic tests run in a tracing instance too selftests: ftrace: Have event tests also run in an tracing instance selftests: ftrace: Make func_event_triggers and func_traceonoff_triggers tests do instances selftests: ftrace: Allow some tests to be run in a tracing instance tracing/ftrace: Allow for instances to trigger their own stacktrace probes tracing/ftrace: Allow for the traceonoff probe be unique to instances tracing/ftrace: Enable snapshot function trigger to work with instances tracing/ftrace: Allow instances to have their own function probes tracing/ftrace: Add a better way to pass data via the probe functions ftrace: Dynamically create the probe ftrace_ops for the trace_array tracing: Pass the trace_array into ftrace_probe_ops functions tracing: Have the trace_array hold the list of registered func probes ftrace: If the hash for a probe fails to update then free what was initialized ftrace: Have the function probes call their own function ftrace: Have each function probe use its own ftrace_ops ftrace: Have unregister_ftrace_function_probe_func() return a value ftrace: Add helper function ftrace_hash_move_and_update_ops() ftrace: Remove data field from ftrace_func_probe structure ...	2017-05-03 18:41:21 -07:00
Linus Torvalds	2f34c1231b	main drm pull request for 4.12 kernel -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZCTzvAAoJEAx081l5xIa+9kcQAJsQiija4/7QGx6IzakOMqjx WulJ3zYG/cU/HLwCBcuWRDF6wAj+7iWNeLCPmolHwEazcI8tQVdgMlWtbdMbDh8U ckzD3FBXsEVfIfab+u6tyoUkm3l/VDhMXbjkUK7NTo/+dkRqe5LuFfZPCGN09jft Y+5salkRXzDhXPSFsqmjfzhx1v7PTgf0a5HUenKWEWOv+sJQaW4/iPvcDSIcg5qR l9WjAqro1NpFYhUodnh6DkLeledL1U5whdtp/yvrUAck8y+WP/jwGYmQ7pZ0UkQm f0M3kV6K67ox9eqN++jsGX5o8sB1qF01Uh95kBAnyzYzsw4ZlMCx6pV7PDX+J88M UBNMEqX10hrLkNJA9lGjPWx+/6fudcwg9anKvTRO3Uyx7MbYoJAgjzAM+yBqqtV0 8Otxa4Bw0V2pmUD+0lqJDERRvE77VCXkLb8SaI5lQo0MHpQqT2cZA+GD+B+rZHO6 Ie5LDFY87vM2GG1IECufG+xOa3v6sn2FfQ1ouu1KNGKOAMBKcQCQyQx3kGVuNW2i HDACVXALJgXdRlVLm4jydOCZdRoguX7AWmRjtdwxgaO+lBcGfLhkXdjLQ7Ho+29p 32ArJfkZPfA53vMB6lHxAfbtrs1q2RzyVnPHj/KqeJnGZbABKTsF2HQ5BQc4Xq/J mqXoz6Oubdvk4Pwyx7Ne =UxFF -----END PGP SIGNATURE----- Merge tag 'drm-for-v4.12' of git://people.freedesktop.org/~airlied/linux Pull drm u pdates from Dave Airlie: "This is the main drm pull request for v4.12. Apart from two fixes pulls, everything should have been in drm-next for at least 2 weeks. The biggest thing in here is AMD released the public headers for their upcoming VEGA GPUs. These as always are quite a sizeable chunk of header files. They've also added initial non-display support for those GPUs, though they aren't available in production yet. Otherwise it's pretty much normal. New bridge drivers: - megachips-stdpxxxx-ge-b850v3-fw LVDS->DP++ - generic LVDS bridge support. Core: - Displayport link train failure reporting to userspace - debugfs interface cleaned up - subsystem TODO in kerneldoc now - Extended fbdev support (flipping and vblank wait) - drm_platform removed - EDP CRC support in helper - HF-VSDB SCDC support in EDID parser - Lots of code cleanups and header extraction - Thunderbolt external GPU awareness - Atomic helper improvements - Documentation improvements panel: - Sitronix and Samsung new panel support amdgpu: - Preliminary vega10 support - Multi-level page table support - GPU sensor support for userspace - PRT support for sparse buffers - SR-IOV improvements - Non-contig VRAM CPU mapping i915: - Atomic modesetting enabled by default on Gen5+ - LSPCON improvements - Atomic state handling for cdclk - GPU reset improvements - In-kernel unit tests - Geminilake improvements and color manager support - Designware i2c fixes - vblank evasion improvements - Hotplug safe connector iterators - GVT scheduler QoS support - GVT Kabylake support nouveau: - Acceleration support for Pascal (GP10x). - Rearchitecture of code handling proprietary signed firmware - Fix GTX 970 with odd MMU configuration - GP10B support - GP107 acceleration support vmwgfx: - Atomic modesetting support for vmwgfx omapdrm: - Support for render nodes - Refactor omapdss code - Fix some probe ordering issues - Fix too dark RGB565 rendering sunxi: - prelim rework for multiple pipes. mali-dp: - Color management support - Plane scaling - Power management improvements imx-drm: - Prefetch Resolve Engine/Gasket on i.MX6QP - Deferred plane disabling - Separate alpha support mediatek: - Mediatek SoC MT2701 support rcar-du: - Gen3 HDMI support msm: - 4k support for newer chips - OPP bindings for gpu - prep work for per-process pagetables vc4: - HDMI audio support - fixes qxl: - minor fixes. dw-hdmi: - PHY improvements - CSC fixes - Amlogic GX SoC support" * tag 'drm-for-v4.12' of git://people.freedesktop.org/~airlied/linux: (1778 commits) drm/nouveau/fb/gf100-: Fix 32 bit wraparound in new ram detection drm/nouveau/secboot/gm20b: fix the error return code in gm20b_secboot_tegra_read_wpr() drm/nouveau/kms: Increase max retries in scanout position queries. drm/nouveau/bios/bitP: check that table is long enough for optional pointers drm/nouveau/fifo/nv40: no ctxsw for pre-nv44 mpeg engine drm: mali-dp: use div_u64 for expensive 64-bit divisions drm/i915: Confirm the request is still active before adding it to the await drm/i915: Avoid busy-spinning on VLV_GLTC_PW_STATUS mmio drm/i915/selftests: Allocate inode/file dynamically drm/i915: Fix system hang with EI UP masked on Haswell drm/i915: checking for NULL instead of IS_ERR() in mock selftests drm/i915: Perform link quality check unconditionally during long pulse drm/i915: Fix use after free in lpe_audio_platdev_destroy() drm/i915: Use the right mapping_gfp_mask for final shmem allocation drm/i915: Make legacy cursor updates more unsynced drm/i915: Apply a cond_resched() to the saturated signaler drm/i915: Park the signaler before sleeping drm: mali-dp: Check the mclk rate and allow up/down scaling drm: mali-dp: Enable image enhancement when scaling drm: mali-dp: Add plane upscaling support ...	2017-05-03 11:44:24 -07:00
Linus Torvalds	76f1948a79	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatch updates from Jiri Kosina: - a per-task consistency model is being added for architectures that support reliable stack dumping (extending this, currently rather trivial set, is currently in the works). This extends the nature of the types of patches that can be applied by live patching infrastructure. The code stems from the design proposal made [1] back in November 2014. It's a hybrid of SUSE's kGraft and RH's kpatch, combining advantages of both: it uses kGraft's per-task consistency and syscall barrier switching combined with kpatch's stack trace switching. There are also a number of fallback options which make it quite flexible. Most of the heavy lifting done by Josh Poimboeuf with help from Miroslav Benes and Petr Mladek [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz - module load time patch optimization from Zhou Chengming - a few assorted small fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: add missing printk newlines livepatch: Cancel transition a safe way for immediate patches livepatch: Reduce the time of finding module symbols livepatch: make klp_mutex proper part of API livepatch: allow removal of a disabled patch livepatch: add /proc/<pid>/patch_state livepatch: change to a per-task consistency model livepatch: store function sizes livepatch: use kstrtobool() in enabled_store() livepatch: move patching functions into patch.c livepatch: remove unnecessary object loaded check livepatch: separate enabled and patched states livepatch/s390: add TIF_PATCH_PENDING thread flag livepatch/s390: reorganize TIF thread flag bits livepatch/powerpc: add TIF_PATCH_PENDING thread flag livepatch/x86: add TIF_PATCH_PENDING thread flag livepatch: create temporary klp_update_patch_state() stub x86/entry: define _TIF_ALLWORK_MASK flags explicitly stacktrace/x86: add function for detecting reliable stack traces	2017-05-02 18:24:16 -07:00
Juergen Gross	65f9d65443	x86/cpu: remove hypervisor specific set_cpu_features There is no user of x86_hyper->set_cpu_features() any more. Remove it. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>	2017-05-02 11:14:30 +02:00
Juergen Gross	d40342a2ac	vmware: set cpu capabilities during platform initialization There is no need to set the same capabilities for each cpu individually. This can be done for all cpus in platform initialization. Cc: Alok Kataria <akataria@vmware.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Alok Kataria <akataria@vmware.com> Signed-off-by: Juergen Gross <jgross@suse.com>	2017-05-02 11:14:24 +02:00
Vitaly Kuznetsov	5e57f1d607	x86/xen: add CONFIG_XEN_PV to Kconfig All code to support Xen PV will get under this new option. For the beginning, check for it in the common code. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>	2017-05-02 10:50:19 +02:00
Vitaly Kuznetsov	0991d22d5e	x86/xen: separate PV and HVM hypervisors As a preparation to splitting the code we need to untangle it: x86_hyper_xen -> x86_hyper_xen_hvm and x86_hyper_xen_pv xen_platform() -> xen_platform_hvm() and xen_platform_pv() xen_cpu_up_prepare() -> xen_cpu_up_prepare_pv() and xen_cpu_up_prepare_hvm() xen_cpu_dead() -> xen_cpu_dead_pv() and xen_cpu_dead_pv_hvm() Add two parameters to xen_cpuhp_setup() to pass proper cpu_up_prepare and cpu_dead hooks. xen_set_cpu_features() is now PV-only so the redundant xen_pv_domain() check can be dropped. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>	2017-05-02 10:49:44 +02:00
Linus Torvalds	d3b5d35290	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main x86 MM changes in this cycle were: - continued native kernel PCID support preparation patches to the TLB flushing code (Andy Lutomirski) - various fixes related to 32-bit compat syscall returning address over 4Gb in applications, launched from 64-bit binaries - motivated by C/R frameworks such as Virtuozzo. (Dmitry Safonov) - continued Intel 5-level paging enablement: in particular the conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov) - x86/mpx ABI corner case fixes/enhancements (Joerg Roedel) - ... plus misc updates, fixes and cleanups" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits) mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash x86/mm: Fix flush_tlb_page() on Xen x86/mm: Make flush_tlb_mm_range() more predictable x86/mm: Remove flush_tlb() and flush_tlb_current_task() x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly() x86/mm/64: Fix crash in remove_pagetable() Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" x86/boot/e820: Remove a redundant self assignment x86/mm: Fix dump pagetables for 4 levels of page tables x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()" x86/espfix: Add support for 5-level paging x86/kasan: Extend KASAN to support 5-level paging x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y x86/paravirt: Add 5-level support to the paravirt code x86/mm: Define virtual memory map for 5-level paging x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert x86/boot: Detect 5-level paging support x86/mm/numa: Remove numa_nodemask_from_meminfo() ...	2017-05-01 23:54:56 -07:00
Linus Torvalds	888411be09	Merge branch 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 irq update from Ingo Molnar: "A single commit that micro-optimizes an IRQ vectors code path in the CPU offlining code" * 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Optimize free vector check in the CPU offline path	2017-05-01 23:03:17 -07:00
Linus Torvalds	7d6a31c394	Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 debug updates from Ingo Molnar: "The biggest update is the addition of USB3 debug port based early-console. Greg was fine with the USB changes and with the routing of these patches: https://www.spinics.net/lists/linux-usb/msg155093.html" * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: usb/doc: Add document for USB3 debug port usage usb/serial: Add DBC debug device support to usb_debug x86/earlyprintk: Add support for earlyprintk via USB3 debug port usb/early: Add driver for xhci debug capability x86/timers: Add simple udelay calibration	2017-05-01 23:00:21 -07:00
Linus Torvalds	2cc12e2e8c	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "A handful of small cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Remove a redundant #ifdef directive x86/smp: Remove the redundant #ifdef CONFIG_SMP directive x86/smp: Reduce code duplication x86/pci-calgary: Use setup_timer() instead of open coding it.	2017-05-01 22:34:52 -07:00
Linus Torvalds	3fb9268e43	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "The main changes in this cycle were: - unwinder fixes and enhancements - improve ftrace interaction with the unwinder - optimize the code footprint of WARN() and related debugging constructs - ... plus misc updates, cleanups and fixes" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/unwind: Dump all stacks in unwind_dump() x86/unwind: Silence more entry-code related warnings x86/ftrace: Fix ebp in ftrace_regs_caller that screws up unwinder x86/unwind: Remove unused 'sp' parameter in unwind_dump() x86/unwind: Prepend hex mask value with '0x' in unwind_dump() x86/unwind: Properly zero-pad 32-bit values in unwind_dump() x86/unwind: Ensure stack pointer is aligned debug: Avoid setting BUGFLAG_WARNING twice x86/unwind: Silence entry-related warnings x86/unwind: Read stack return address in update_stack_state() x86/unwind: Move common code into update_stack_state() debug: Fix __bug_table[] in arch linker scripts debug: Add _ONCE() logic to report_bug() x86/debug: Define BUG() again for !CONFIG_BUG x86/debug: Implement __WARN() using UD0 x86/ftrace: Use Makefile logic instead of #ifdef for compiling ftrace_*.o x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set x86/ftrace: Clean up ftrace_regs_caller x86/ftrace: Add stack frame pointer to ftrace_caller x86/ftrace: Move the ftrace specific code out of entry_32.S ...	2017-05-01 22:07:51 -07:00
Linus Torvalds	12ca7c8db3	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Ingo Molnar: "Two small cleanups" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Fix a comment in init_apic_mappings() x86/apic: Remove the SET_APIC_ID(x) macro	2017-05-01 21:41:07 -07:00
Linus Torvalds	a52bbaf4a3	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: "The biggest changes are an extension of the Intel RDT code to extend it with Intel Memory Bandwidth Allocation CPU support: MBA allows bandwidth allocation between cores, while CBM (already upstream) allows CPU cache partitioning. There's also misc smaller fixes and updates" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/intel_rdt: Return error for incorrect resource names in schemata x86/intel_rdt: Trim whitespace while parsing schemata input x86/intel_rdt: Fix padding when resource is enabled via mount x86/intel_rdt: Get rid of anon union x86/cpu: Keep model defines sorted by model number x86/intel_rdt/mba: Add schemata file support for MBA x86/intel_rdt: Make schemata file parsers resource specific x86/intel_rdt/mba: Add info directory files for Memory Bandwidth Allocation x86/intel_rdt: Make information files resource specific x86/intel_rdt/mba: Add primary support for Memory Bandwidth Allocation (MBA) x86/intel_rdt/mba: Memory bandwith allocation feature detect x86/intel_rdt: Add resource specific msr update function x86/intel_rdt: Move CBM specific data into a struct x86/intel_rdt: Cleanup namespace to support multiple resource types Documentation, x86: Intel Memory bandwidth allocation x86/intel_rdt: Organize code properly x86/intel_rdt: Init padding only if a device exists x86/intel_rdt: Add cpus_list rdtgroup file x86/intel_rdt: Cleanup kernel-doc x86/intel_rdt: Update schemata read to show data in tabular format ...	2017-05-01 21:15:50 -07:00
Linus Torvalds	16b76293c5	Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "The biggest changes in this cycle were: - reworking of the e820 code: separate in-kernel and boot-ABI data structures and apply a whole range of cleanups to the kernel side. No change in functionality. - enable KASLR by default: it's used by all major distros and it's out of the experimental stage as well. - ... misc fixes and cleanups" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits) x86/KASLR: Fix kexec kernel boot crash when KASLR randomization fails x86/reboot: Turn off KVM when halting a CPU x86/boot: Fix BSS corruption/overwrite bug in early x86 kernel startup x86: Enable KASLR by default boot/param: Move next_arg() function to lib/cmdline.c for later reuse x86/boot: Fix Sparse warning by including required header file x86/boot/64: Rename start_cpu() x86/xen: Update e820 table handling to the new core x86 E820 code x86/boot: Fix pr_debug() API braindamage xen, x86/headers: Add <linux/device.h> dependency to <asm/xen/page.h> x86/boot/e820: Simplify e820__update_table() x86/boot/e820: Separate the E820 ABI structures from the in-kernel structures x86/boot/e820: Fix and clean up e820_type switch() statements x86/boot/e820: Rename the remaining E820 APIs to the e820__() prefix x86/boot/e820: Remove unnecessary #include's x86/boot/e820: Rename e820_mark_nosave_regions() to e820__register_nosave_regions() x86/boot/e820: Rename e820_reserve_resources() to e820__reserve_resources*() x86/boot/e820: Use bool in query APIs x86/boot/e820: Document e820__reserve_setup_data() x86/boot/e820: Clean up __e820__update_table() et al ...	2017-05-01 20:51:12 -07:00
Linus Torvalds	3dee9fb2a4	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "The main changes in this cycle were: - add the 'Corrected Errors Collector' kernel feature which collect and monitor correctable errors statistics and will preemptively (soft-)offline physical pages that have a suspiciously high error count. - handle MCE errors during kexec() more gracefully - factor out and deprecate the /dev/mcelog driver - ... plus misc fixes and cleanpus" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Check MCi_STATUS[MISCV] for usable addr on Intel only ACPI/APEI: Use setup_deferrable_timer() x86/mce: Update notifier priority check x86/mce: Enable PPIN for Knights Landing/Mill x86/mce: Do not register notifiers with invalid prio x86/mce: Factor out and deprecate the /dev/mcelog driver RAS: Add a Corrected Errors Collector x86/mce: Rename mce_log to mce_log_buffer x86/mce: Rename mce_log()'s argument x86/mce: Init some CPU features early x86/mce: Handle broadcasted MCE gracefully with kexec	2017-05-01 20:48:33 -07:00
Linus Torvalds	7c8c03bfc7	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main changes in this cycle were: Kernel side changes: - Kprobes and uprobes changes: - Make their trampolines read-only while they are used - Make UPROBES_EVENTS default-y which is the distro practice - Apply misc fixes and robustization to probe point insertion. - add support for AMD IOMMU events - extend hw events on Intel Goldmont CPUs - ... plus misc fixes and updates. Tooling side changes: - support s390 jump instructions in perf annotate (Christian Borntraeger) - vendor hardware events updates (Andi Kleen) - add argument support for SDT events in powerpc (Ravi Bangoria) - beautify the statx syscall arguments in 'perf trace' (Arnaldo Carvalho de Melo) - handle inline functions in callchains (Jin Yao) - enable sorting by srcline as key (Milian Wolff) - add 'brstackinsn' field in 'perf script' to reuse the x86 instruction decoder used in the Intel PT code to study hot paths to samples (Andi Kleen) - add PERF_RECORD_NAMESPACES so that the kernel can record information required to associate samples to namespaces, helping in container problem characterization. (Hari Bathini) - allow sorting by symbol_size in 'perf report' and 'perf top' (Charles Baylis) - in perf stat, make system wide (-a) the default option if no target was specified and one of following conditions is met: - no workload specified (current behaviour) - a workload is specified but all requested events are system wide ones, like uncore ones. (Jiri Olsa) - ... plus lots of other updates, enhancements, cleanups and fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (235 commits) perf tools: Fix the code to strip command name tools arch x86: Sync cpufeatures.h tools arch: Sync arch/x86/lib/memcpy_64.S with the kernel tools: Update asm-generic/mman-common.h copy from the kernel perf tools: Use just forward declarations for struct thread where possible perf tools: Add the right header to obtain PERF_ALIGN() perf tools: Remove poll.h and wait.h from util.h perf tools: Remove string.h, unistd.h and sys/stat.h from util.h perf tools: Remove stale prototypes from builtin.h perf tools: Remove string.h from util.h perf tools: Remove sys/ioctl.h from util.h perf tools: Remove a few more needless includes from util.h perf tools: Include sys/param.h where needed perf callchain: Move callchain specific routines from util.[ch] perf tools: Add compress.h for the *_decompress_to_file() headers perf mem: Fix display of data source snoop indication perf debug: Move dump_stack() and sighandler_dump_stack() to debug.h perf kvm: Make function only used by 'perf kvm' static perf tools: Move timestamp routines from util.h to time-utils.h perf tools: Move units conversion/formatting routines to separate object ...	2017-05-01 20:23:17 -07:00
Linus Torvalds	6dc2cce932	Merge branch 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pul x86/process updates from Ingo Molnar: "The main change in this cycle was to add the ARCH_[GET\|SET]_CPUID prctl() ABI extension to control the availability of the CPUID instruction, analogously to the existing PR_GET\|SET_TSC ABI that controls RDTSC. Motivation: the 'rr' user-space record-and-replay execution debugger would like to trap and emulate the CPUID instruction - which instruction is normally unprivileged. Trapping CPUID is possible on IvyBridge and later Intel CPUs - expose this hardware capability" * 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/syscalls/32: Ignore arch_prctl for other architectures um/arch_prctl: Fix fallout from x86 arch_prctl() rework x86/arch_prctl: Add ARCH_[GET\|SET]_CPUID x86/cpufeature: Detect CPUID faulting support x86/syscalls/32: Wire up arch_prctl on x86-32 x86/arch_prctl: Add do_arch_prctl_common() x86/arch_prctl/64: Rename do_arch_prctl() to do_arch_prctl_64() x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl() x86/arch_prctl: Rename 'code' argument to 'option' x86/msr: Rename MISC_FEATURE_ENABLES to MISC_FEATURES_ENABLES x86/process: Optimize TIF_NOTSC switch x86/process: Correct and optimize TIF_BLOCKSTEP switch x86/process: Optimize TIF checks in __switch_to_xtra()	2017-05-01 19:57:58 -07:00
Linus Torvalds	3527d3e951	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - another round of rq-clock handling debugging, robustization and fixes - PELT accounting improvements - CPU hotplug related ->cpus_allowed affinity handling fixes all around the tree - ... plus misc fixes, cleanups and updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) sched/x86: Update reschedule warning text crypto: N2 - Replace racy task affinity logic cpufreq/sparc-us2e: Replace racy task affinity logic cpufreq/sparc-us3: Replace racy task affinity logic cpufreq/sh: Replace racy task affinity logic cpufreq/ia64: Replace racy task affinity logic ACPI/processor: Replace racy task affinity logic ACPI/processor: Fix error handling in __acpi_processor_start() sparc/sysfs: Replace racy task affinity logic powerpc/smp: Replace open coded task affinity logic ia64/sn/hwperf: Replace racy task affinity logic ia64/salinfo: Replace racy task affinity logic workqueue: Provide work_on_cpu_safe() ia64/topology: Remove cpus_allowed manipulation sched/fair: Move the PELT constants into a generated header sched/fair: Increase PELT accuracy for small tasks sched/fair: Fix comments sched/Documentation: Add 'sched-pelt' tool sched/fair: Fix corner case in __accumulate_sum() sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags() ...	2017-05-01 19:12:53 -07:00
Linus Torvalds	3711c94fd6	Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes in this cycle were: - move BGRT handling to drivers/acpi so it can be shared between x86 and ARM - bring the EFI stub's initrd and FDT allocation logic in line with the latest changes to the arm64 boot protocol - improvements and fixes to the EFI stub's command line parsing routines - randomize the virtual mapping of the UEFI runtime services on ARM/arm64 - ... and other misc enhancements, cleanups and fixes" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub/arm: Don't use TASK_SIZE when randomizing the RT space ef/libstub/arm/arm64: Randomize the base of the UEFI rt services region efi/libstub/arm/arm64: Disable debug prints on 'quiet' cmdline arg efi/libstub: Unify command line param parsing efi/libstub: Fix harmless command line parsing bug efi/arm32-stub: Allow boot-time allocations in the vmlinux region x86/efi: Clean up a minor mistake in comment efi/pstore: Return error code (if any) from efi_pstore_write() efi/bgrt: Enable ACPI BGRT handling on arm64 x86/efi/bgrt: Move efi-bgrt handling out of arch/x86 efi/arm-stub: Round up FDT allocation to mapping size efi/arm-stub: Correct FDT and initrd allocation rules for arm64	2017-05-01 18:20:03 -07:00
Linus Torvalds	174ddfd5df	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timer departement delivers: - more year 2038 rework - a massive rework of the arm achitected timer - preparatory patches to allow NTP correction of clock event devices to avoid early expiry - the usual pile of fixes and enhancements all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits) timer/sysclt: Restrict timer migration sysctl values to 0 and 1 arm64/arch_timer: Mark errata handlers as __maybe_unused Clocksource/mips-gic: Remove redundant non devicetree init MIPS/Malta: Probe gic-timer via devicetree clocksource: Use GENMASK_ULL in definition of CLOCKSOURCE_MASK acpi/arm64: Add SBSA Generic Watchdog support in GTDT driver clocksource: arm_arch_timer: add GTDT support for memory-mapped timer acpi/arm64: Add memory-mapped timer support in GTDT driver clocksource: arm_arch_timer: simplify ACPI support code. acpi/arm64: Add GTDT table parse driver clocksource: arm_arch_timer: split MMIO timer probing. clocksource: arm_arch_timer: add structs to describe MMIO timer clocksource: arm_arch_timer: move arch_timer_needs_of_probing into DT init call clocksource: arm_arch_timer: refactor arch_timer_needs_probing clocksource: arm_arch_timer: split dt-only rate handling x86/uv/time: Set ->min_delta_ticks and ->max_delta_ticks unicore32/time: Set ->min_delta_ticks and ->max_delta_ticks um/time: Set ->min_delta_ticks and ->max_delta_ticks tile/time: Set ->min_delta_ticks and ->max_delta_ticks score/time: Set ->min_delta_ticks and ->max_delta_ticks ...	2017-05-01 16:15:18 -07:00
Linus Torvalds	89d1cf89c8	* An EDAC driver for Cavium ThunderX RAS IP (Sergey Temerkhanov) * Removal of DRAM error reporting through PCI SERR NMI (Borislav Petkov) * Misc small fixes (Jan Glauber, Thor Thayer) -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlkGtO0ACgkQEsHwGGHe VUoXfA/+JLgcHpI04KcvJtTMpNWE3p04xLdzw7hvgvWPLg+JDHF1jXxA4HRy7usI BAsEZIcpyk/9tzYjKm4zc/8nhlrjx/ic9cU+hZa8zCy/47uArX9HlrsxAUpgVxcx YmWzZ2gyo9Jsi/44wZwnp4dNWibvyG5ECrgis7AFOihT1qyi74YajNfqJWWUbG/H W3DkCVs2JVzelue3rI9J8f9MSZk5sL3C9vfFWxk6ifiqr+rlUphoSNFdF+mRnBdr dvk555G4Xmmz97ZiBAOM12M1trn+4lCkyfuQuMw0cZYt7F/nS7ZdLqAKK8H1KIoE mGl29p85svZRhIM25Cd759LSharAetqpNyxicjAwONwLcKiXVf2UuR5NohVj3y1f Dbrh4zRx0OVJctaAKzLEHhW3Re/VA6lU8JUuvjBytKV5fr64jBpqSXFDL8J4y7p2 RJnKNbPkoXB75LukNqxDgpL+YEnJjzlslqxLqgPVgHFtrsUjpNHAJ9rKDeJQoW3b wC2wVBZmwx+4ShyHjJePJC7C6a/gDktbDos2/XW11DHa4w8ZbZ2Q4ep9oYegBKcd szliytm0LWlUTUDVNoc9DW/ka0NAh43kjvCqcmUcfC+4lhMO28eajvj35PP7fcic hmCAQnJz6M8t1VgxO7xvWi4jAwhvbzXM5IV1O3tIDMYHJQhrLBw= =vGf1 -----END PGP SIGNATURE----- Merge tag 'edac_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: - an EDAC driver for Cavium ThunderX RAS IP (Sergey Temerkhanov) - removal of DRAM error reporting through PCI SERR NMI (Borislav Petkov) - misc small fixes (Jan Glauber, Thor Thayer) * tag 'edac_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC, ghes: Do not enable it by default EDAC: Rename report status accessors EDAC: Delete edac_stub.c EDAC: Update Kconfig help text EDAC: Remove EDAC_MM_EDAC EDAC: Issue tracepoint only when it is defined ACPI/extlog: Add EDAC dependency EDAC: Move edac_op_state to edac_mc.c EDAC: Remove edac_err_assert EDAC: Get rid of edac_handlers x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI EDAC, highbank: Align Makefile directives EDAC, thunderx: Remove unused code EDAC, thunderx: Change LMC index calculation EDAC, altera: Fix peripheral warnings for Cyclone5 EDAC, thunderx: Fix L2C MCI interrupt disable EDAC, thunderx: Add Cavium ThunderX EDAC driver	2017-05-01 11:36:00 -07:00
Ingo Molnar	bfb8c6e495	Merge branch 'x86/microcode' into x86/urgent, to pick up cleanup Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-05-01 13:40:23 +02:00
Linus Torvalds	97ce89f8a4	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "The final fixes for 4.11: - prevent a triple fault with function graph tracing triggered via suspend to ram - prevent optimizing for size when function graph tracing is enabled and the compiler does not support -mfentry - prevent mwaitx() being called with a zero timeout as mwaitx() might never return. Observed on the new Ryzen CPUs" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Prevent timer value 0 for MWAITX x86/build: convert function graph '-Os' error to warning ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram	2017-04-30 11:44:18 -07:00
Shaohua Li	bfd20f1cc8	x86, iommu/vt-d: Add an option to disable Intel IOMMU force on IOMMU harms performance signficantly when we run very fast networking workloads. It's 40GB networking doing XDP test. Software overhead is almost unaware, but it's the IOTLB miss (based on our analysis) which kills the performance. We observed the same performance issue even with software passthrough (identity mapping), only the hardware passthrough survives. The pps with iommu (with software passthrough) is only about ~30% of that without it. This is a limitation in hardware based on our observation, so we'd like to disable the IOMMU force on, but we do want to use TBOOT and we can sacrifice the DMA security bought by IOMMU. I must admit I know nothing about TBOOT, but TBOOT guys (cc-ed) think not eabling IOMMU is totally ok. So introduce a new boot option to disable the force on. It's kind of silly we need to run into intel_iommu_init even without force on, but we need to disable TBOOT PMR registers. For system without the boot option, nothing is changed. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2017-04-26 23:57:53 +02:00
Andy Lutomirski	9ccee2373f	x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly() mark_screen_rdonly() is the last remaining caller of flush_tlb(). flush_tlb_mm_range() is potentially faster and isn't obsolete. Compile-tested only because I don't know whether software that uses this mechanism even exists. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/791a644076fc3577ba7f7b7cafd643cc089baa7d.1492844372.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-26 10:02:06 +02:00
Josh Poimboeuf	262fa734a0	x86/unwind: Dump all stacks in unwind_dump() Currently unwind_dump() dumps only the most recently accessed stack. But it has a few issues. In some cases, 'first_sp' can get out of sync with 'stack_info', causing unwind_dump() to start from the wrong address, flood the printk buffer, and eventually read a bad address. In other cases, dumping only the most recently accessed stack doesn't give enough data to diagnose the error. Fix both issues by dumping all stacks involved in the trace, not just the last one. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `8b5e99f022` ("x86/unwind: Dump stack data on warnings") Link: http://lkml.kernel.org/r/016d6a9810d7d1bfc87ef8c0e6ee041c6744c909.1493171120.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-26 08:19:05 +02:00
Josh Poimboeuf	b0d50c7b5d	x86/unwind: Silence more entry-code related warnings Borislav Petkov reported the following unwinder warning: WARNING: kernel stack regs at ffffc9000024fea8 in udevadm:92 has bad 'bp' value 00007fffc4614d30 unwind stack type:0 next_sp: (null) mask:0x6 graph_idx:0 ffffc9000024fea8: 000055a6100e9b38 (0x55a6100e9b38) ffffc9000024feb0: 000055a6100e9b35 (0x55a6100e9b35) ffffc9000024feb8: 000055a6100e9f68 (0x55a6100e9f68) ffffc9000024fec0: 000055a6100e9f50 (0x55a6100e9f50) ffffc9000024fec8: 00007fffc4614d30 (0x7fffc4614d30) ffffc9000024fed0: 000055a6100eaf50 (0x55a6100eaf50) ffffc9000024fed8: 0000000000000000 ... ffffc9000024fee0: 0000000000000100 (0x100) ffffc9000024fee8: ffff8801187df488 (0xffff8801187df488) ffffc9000024fef0: 00007ffffffff000 (0x7ffffffff000) ffffc9000024fef8: 0000000000000000 ... ffffc9000024ff10: ffffc9000024fe98 (0xffffc9000024fe98) ffffc9000024ff18: 00007fffc4614d00 (0x7fffc4614d00) ffffc9000024ff20: ffffffffffffff10 (0xffffffffffffff10) ffffc9000024ff28: ffffffff811c6c1f (SyS_newlstat+0xf/0x10) ffffc9000024ff30: 0000000000000010 (0x10) ffffc9000024ff38: 0000000000000296 (0x296) ffffc9000024ff40: ffffc9000024ff50 (0xffffc9000024ff50) ffffc9000024ff48: 0000000000000018 (0x18) ffffc9000024ff50: ffffffff816b2e6a (entry_SYSCALL_64_fastpath+0x18/0xa8) ... It unwinded from an interrupt which came in right after entry code called into a C syscall handler, before it had a chance to set up the frame pointer, so regs->bp still had its user space value. Add a check to silence warnings in such a case, where an interrupt has occurred and regs->sp is almost at the end of the stack. Reported-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `c32c47c68a` ("x86/unwind: Warn on bad frame pointer") Link: http://lkml.kernel.org/r/c695f0d0d4c2cfe6542b90e2d0520e11eb901eb5.1493171120.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-26 08:19:05 +02:00
Paolo Bonzini	8afd74c296	Merge branch 'x86/process' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD Required for KVM support of the CPUID faulting feature. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-04-21 11:55:06 +02:00
Steven Rostedt (VMware)	dc912c3035	x86/ftrace: Fix ebp in ftrace_regs_caller that screws up unwinder Fengguang Wu's zero day bot triggered a stack unwinder dump. This can be easily triggered when CONFIG_FRAME_POINTERS is enabled and -mfentry is in use on x86_32. ># cd /sys/kernel/debug/tracing ># echo 'p:schedule schedule' > kprobe_events ># echo stacktrace > events/kprobes/schedule/trigger This is because the code that implemented fentry in the ftrace_regs_caller tried to use the least amount of #ifdefs, and modified ebp when CC_USE_FENTRY was defined to point to the parent ip as it does when CC_USE_FENTRY is not defined. But when CONFIG_FRAME_POINTERS is set, it corrupts the ebp register for this frame while doing the tracing. NOTE, it does not corrupt ebp in any other way. It is just a bad frame pointer when calling into the tracing infrastructure. The original ebp is restored before returning from the fentry call. But if a stack trace is performed inside the tracing, the unwinder will notice the bad ebp. Instead of toying with ebp with CC_USING_FENTRY, just slap the parent ip into the second parameter (%edx), and have an #else that does it the original way. The unwinder will unfortunately miss the function being traced, as the stack frame is not set up yet for it, as it is for x86_64. But fixing that is a bit more complex and did not work before anyway. This has been tested with and without FRAME_POINTERS being set while using -mfentry, as well as using an older compiler that uses mcount. Analyzed-by: Josh Poimboeuf <jpoimboe@redhat.com> Fixes: `644e0e8dc7` ("x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set") Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lists.01.org/pipermail/lkp/2017-April/006165.html Link: http://lkml.kernel.org/r/20170420172236.7af7f6e5@gandalf.local.home Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-21 09:48:16 +02:00
Vikas Shivappa	4797b7dfdf	x86/intel_rdt: Return error for incorrect resource names in schemata When schemata parses the resource names it does not return an error if it detects incorrect resource names and fails quietly. This happens because for_each_enabled_rdt_resource(r) leaves "r" pointing beyond the end of the rdt_resources_all[] array, and the check for !r->name results in an out of bounds access. Split the resource parsing part into a helper function to avoid the issue. [ tglx: Made it readable by splitting the parser loop out into a function ] Reported-by: Prakhya, Sai Praneeth <sai.praneeth.prakhya@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Tested-by: Prakhya, Sai Praneeth <sai.praneeth.prakhya@intel.com> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1492645804-17465-4-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-20 15:57:59 +02:00
Vikas Shivappa	634b0e0491	x86/intel_rdt: Trim whitespace while parsing schemata input Schemata is displayed in tabular format which introduces some whitespace to show data in a tabular format. Writing back the same data fails as the parser does not handle the whitespace. Trim the leading and trailing whitespace before parsing. Reported-by: Prakhya, Sai Praneeth <sai.praneeth.prakhya@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Tested-by: Prakhya, Sai Praneeth <sai.praneeth.prakhya@intel.com> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1492645804-17465-3-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-20 15:57:59 +02:00
Vikas Shivappa	adcbdd7030	x86/intel_rdt: Fix padding when resource is enabled via mount Currently max width of 'resource name' and 'resource data' is being initialized based on 'enabled resources' during boot. But the mount can enable different capable resources at a later time which upsets the tabular format of schemata. Fix this to be based on 'all capable' resources. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Tested-by: Prakhya, Sai Praneeth <sai.praneeth.prakhya@intel.com> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1492645804-17465-2-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-20 15:57:59 +02:00
Chen Yu	c0edbd4a16	x86/irq: Optimize free vector check in the CPU offline path Before offlining a CPU its required to check whether there are enough free irq vectors available so interrupts can be migrated away from the CPU. This check is executed whether its required or not and neither stops searching when the number of required free vectors are reached. Bypass the free vector check if the current CPU has no irq to migrate and leave the for_each_online_cpu() loop when the free vector count reaches the number of required vectors. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Len Brown <lenb@kernel.orq> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Link: http://lkml.kernel.org/r/1492357410-510-1-git-send-email-yu.c.chen@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-20 15:25:09 +02:00
Prarit Bhargava	21173d0b4d	sched/x86: Update reschedule warning text Modify the reschedule warning to output the offline CPU number and use a better debug message. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1492518305-3808-1-git-send-email-prarit@redhat.com [ Tweaked the warning message. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-20 10:14:30 +02:00
Ingo Molnar	afa7a17f3a	Merge branch 'WIP.x86/process' into perf/core	2017-04-20 10:07:12 +02:00
Tiantian Feng	fba4f472b3	x86/reboot: Turn off KVM when halting a CPU A CPU in VMX root mode will ignore INIT signals and will fail to bring up the APs after reboot. Therefore, on a panic we disable VMX on all CPUs before rebooting or triggering kdump. Do this when halting the machine as well, in case a firmware-level reboot does not perform a cold reset for all processors. Without doing this, rebooting the host may hang. Signed-off-by: Tiantian Feng <fengtiantian@huawei.com> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> [ Rewritten commit message. ] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Link: http://lkml.kernel.org/r/20170419161839.30550-1-pbonzini@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-20 10:06:57 +02:00
Borislav Petkov	c6a9583fb4	x86/mce: Check MCi_STATUS[MISCV] for usable addr on Intel only mce_usable_address() does a bunch of basic sanity checks to verify whether the address reported with the error is usable for further processing. However, we do check MCi_STATUS[MISCV] and that is not needed on AMD as that bit says that there's additional information about the logged error in the MCi_MISCj banks. But we don't need that to know whether the address is usable - we only need to know whether the physical address is valid - i.e., ADDRV. On Intel the MISCV bit is needed to perform additional checks to determine whether the reported address is a physical one, etc. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170418183924.6agjkebilwqj26or@pd.tnic Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-19 12:04:46 +02:00
Josh Poimboeuf	aa4f853461	x86/unwind: Remove unused 'sp' parameter in unwind_dump() The 'sp' parameter to unwind_dump() is unused. Remove it. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/08cb36b004629f6bbcf44c267ae4a609242ebd0b.1492520933.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-19 09:59:47 +02:00
Josh Poimboeuf	4ea3d7410c	x86/unwind: Prepend hex mask value with '0x' in unwind_dump() In unwind_dump(), the stack mask value is printed in hex, but is confusingly not prepended with '0x'. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e7fe41be19d73c9f99f53082486473febfe08ffa.1492520933.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-19 09:59:47 +02:00
Josh Poimboeuf	9b135b2346	x86/unwind: Properly zero-pad 32-bit values in unwind_dump() On x86-32, 32-bit stack values printed by unwind_dump() are confusingly zero-padded to 16 characters (64 bits): unwind stack type:0 next_sp: (null) mask:a graph_idx:0 f50cdebc: 00000000f50cdec4 (0xf50cdec4) f50cdec0: 00000000c40489b7 (irq_exit+0x87/0xa0) ... Instead, base the field width on the size of a long integer so that it looks right on both x86-32 and x86-64. x86-32: unwind stack type:1 next_sp: (null) mask:0x2 graph_idx:0 c0ee9d98: c0ee9de0 (init_thread_union+0x1de0/0x2000) c0ee9d9c: c043fd90 (__save_stack_trace+0x50/0xe0) ... x86-64: unwind stack type:1 next_sp: (null) mask:0x2 graph_idx:0 ffffffff81e03b88: ffffffff81e03c10 (init_thread_union+0x3c10/0x4000) ffffffff81e03b90: ffffffff81048f8e (__save_stack_trace+0x5e/0x100) ... Reported-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/36b743812e7eb291d74af4e5067736736622daad.1492520933.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-19 09:59:47 +02:00
Josh Poimboeuf	a5859c6d7b	x86/build: convert function graph '-Os' error to warning For pre-4.6.0 versions of GCC, which don't have '-mfentry', the '-maccumulate-outgoing-args' option is required for function graph tracing in order to avoid GCC bug 42109. However, GCC ignores '-maccumulate-outgoing-args' when '-Os' is also set. Currently we force a build error to prevent that scenario, but that breaks randconfigs. So change the error to a warning which also disables CONFIG_CC_OPTIMIZE_FOR_SIZE. Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kbuild test robot <fengguang.wu@intel.com> Cc: kbuild-all@01.org Link: http://lkml.kernel.org/r/20170418214429.o7fbwbmf4nqosezy@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-19 09:57:23 +02:00
Dave Airlie	856ee92e86	Linux 4.11-rc7 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJY881cAAoJEHm+PkMAQRiGG4UH+wa2z6Qet36Uc4nXFZuSMYrO ErUWs1QpTDDv4a+LE4fgyMvM3j9XqtpfQLy1n70jfD14IqPBhHe4gytasAf+8lg1 YvddFx0Yl3sygVu3dDBNigWeVDbfwepW59coN0vI5nrMo+wrei8aVIWcFKOxdMuO n72u9vuhrkEnLJuQk7SF+t4OQob9McXE3s7QgyRopmlKhKo7mh8On7K2BRI5uluL t0j5kZM0a43EUT5rq9xR8f5pgtyfTMG/FO2MuzZn43MJcZcyfmnOP/cTSIvAKA5U 1i12lxlokYhURNUe+S6jm8A47TrqSRSJxaQJZRlfGJksZ0LJa8eUaLDCviBQEoE= =6QWZ -----END PGP SIGNATURE----- Merge tag 'v4.11-rc7' into drm-next Backmerge Linux 4.11-rc7 from Linus tree, to fix some conflicts that were causing problems with the rerere cache in drm-tip.	2017-04-19 11:07:14 +10:00
Vishal Verma	0dc9c639e6	x86/mce: Make the MCE notifier a blocking one The NFIT MCE handler callback (for handling media errors on NVDIMMs) takes a mutex to add the location of a memory error to a list. But since the notifier call chain for machine checks (x86_mce_decoder_chain) is atomic, we get a lockdep splat like: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620 in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0 [..] Call Trace: dump_stack ___might_sleep __might_sleep mutex_lock_nested ? __lock_acquire nfit_handle_mce notifier_call_chain atomic_notifier_call_chain ? atomic_notifier_call_chain mce_gen_pool_process Convert the notifier to a blocking one which gets to run only in process context. Boris: remove the notifier call in atomic context in print_mce(). For now, let's print the MCE on the atomic path so that we can make sure they go out and get logged at least. Fixes: `6839a6d96f` ("nfit: do an ARS scrub on hitting a latent media error") Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20170411224457.24777-1-vishal.l.verma@intel.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-18 22:23:48 +02:00
Josh Poimboeuf	e335bb51cc	x86/unwind: Ensure stack pointer is aligned With frame pointers disabled, on some older versions of GCC (like 4.8.3), it's possible for the stack pointer to get aligned at a half-word boundary: 00000000000004d0 <fib_table_lookup>: 4d0: 41 57 push %r15 4d2: 41 56 push %r14 4d4: 41 55 push %r13 4d6: 41 54 push %r12 4d8: 55 push %rbp 4d9: 53 push %rbx 4da: 48 83 ec 24 sub $0x24,%rsp In such a case, the unwinder ends up reading the entire stack at the wrong alignment. Then the last read goes past the end of the stack, hitting the stack guard page: BUG: stack guard page was hit at ffffc900217c4000 (stack is ffffc900217c0000..ffffc900217c3fff) kernel stack overflow (page fault): 0000 [#1] SMP ... Fix it by ensuring the stack pointer is properly aligned before unwinding. Reported-by: Jirka Hladky <jhladky@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: `7c7900f897` ("x86/unwind: Add new unwind interface and implementations") Link: http://lkml.kernel.org/r/cff33847cc9b02fa548625aa23268ac574460d8d.1492436590.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-18 10:30:23 +02:00
Borislav Petkov	415601b191	x86/mce: Update notifier priority check Update the check which enforces the registration of MCE decoder notifier callbacks with valid priority only, to include mcelog's priority. Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lkp@01.org Link: http://lkml.kernel.org/r/20170418073820.i6kl5tggcntwlisa@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-18 10:27:52 +02:00
Dou Liyang	0ccecd95e7	x86/irq: Remove a redundant #ifdef directive The call to irq_ctx_init() is wrapped in #ifdef CONFIG_X86_32. The declaration of irq_ctx_init in irq.h provides already a stub inline for the X86_32=n case. Remove the redundant #ifdef in the code. [ tglx: Massaged changelog ] Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/1491811500-30307-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 22:43:01 +02:00
Nicolai Stange	747d04b30e	x86/apic/timer: Set ->min_delta_ticks and ->max_delta_ticks In preparation for making the clockevents core NTP correction aware, all clockevent device drivers must set ->min_delta_ticks and ->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a clockevent device's rate is going to change dynamically and thus, the ratio of ns to ticks ceases to stay invariant. Make the x86 arch's apic clockevent driver initialize these fields properly. This patch alone doesn't introduce any change in functionality as the clockevents core still looks exclusively at the (untouched) ->min_delta_ns and ->max_delta_ns. As soon as this has changed, a followup patch will purge the initialization of ->min_delta_ns and ->max_delta_ns from this driver. Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org CC: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2017-04-14 13:11:17 -07:00
Vikas Shivappa	64e8ed3d4a	x86/intel_rdt/mba: Add schemata file support for MBA Add support to update the MBA bandwidth values for the domains via the schemata file. - Verify that the bandwidth value is valid - Round to the next control step depending on the bandwidth granularity of the hardware - Convert the bandwidth to delay values and write the delay values to the corresponding domain PQOS_MSRs. [ tglx: Massaged changelog ] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-9-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 16:10:09 +02:00
Vikas Shivappa	c6ea67de52	x86/intel_rdt: Make schemata file parsers resource specific The schemata files are the user space interface to update resource controls. The parser is hardwired to support only cache resources, which do not fit the requirements of memory resources. Add a function pointer for a parser to the struct rdt_resource and switch the cache parsing over. [ tglx: Massaged changelog ] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-8-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 16:10:09 +02:00
Vikas Shivappa	db69ef6563	x86/intel_rdt/mba: Add info directory files for Memory Bandwidth Allocation The files in the info directory for MBA are as follows: num_closids The maximum number of CLOSids available for MBA min_bandwidth The minimum memory bandwidth percentage value bandwidth_gran The granularity of the bandwidth control in percent for the particular CPU SKU. Intermediate values entered are rounded off to the previous control step available. Available bandwidth control steps are minimum_bandwidth + N * bandwidth_gran. delay_linear When set, the OS writes a linear percentage based value to the control MSRs ranging from minimum_bandwidth to 100 percent. This value is informational and has no influence on the values written to the schemata files. The values written to the schemata are always bandwidth percentage that is requested. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-7-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 16:10:08 +02:00
Vikas Shivappa	6a507a6ad8	x86/intel_rdt: Make information files resource specific Cache allocation and memory bandwidth allocation require different information files in the resctrl/info directory, but the current implementation does not allow to have files per resource. Add the necessary fields to the resource struct and assign the files dynamically depending on the resource type. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-6-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 16:10:08 +02:00
Vikas Shivappa	05b93417ce	x86/intel_rdt/mba: Add primary support for Memory Bandwidth Allocation (MBA) The MBA feature details like minimum bandwidth supported, bandwidth granularity etc are obtained via executing CPUID with EAX=10H ,ECX=3. Setup and initialize the MBA specific extensions to data structures like global list of RDT resources, RDT resource structure and RDT domain structure. [ tglx: Split out the seperate structure and the CBM related parts ] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-5-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 16:10:08 +02:00
Vikas Shivappa	ab66a33b03	x86/intel_rdt/mba: Memory bandwith allocation feature detect Detect MBA feature if CPUID.(EAX=10H, ECX=0):EBX.L2[bit 3] = 1. Add supporting data structures to detect feature details which is done in later patch using CPUID with EAX=10H, ECX= 3. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-4-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 16:10:07 +02:00
Thomas Gleixner	0921c54769	x86/intel_rdt: Add resource specific msr update function Updating of Cache and Memory bandwidth QOS MSRs is different. Add a function pointer to struct rdt_resource and convert the cache part over. Based on Vikas all in one patch^Wmess. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com	2017-04-14 16:10:07 +02:00
Thomas Gleixner	d3e11b4d6f	x86/intel_rdt: Move CBM specific data into a struct Memory bandwidth allocation requires different information than cache allocation. To avoid a lump of data in struct rdt_resource, move all cache related information into a seperate structure and add that to struct rdt_resource. Sanitize the data types while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com	2017-04-14 16:10:07 +02:00
Vikas Shivappa	2545e9f51e	x86/intel_rdt: Cleanup namespace to support multiple resource types Lot of data structures and functions are named after cache specific resources(named after cbm, cache etc). In many cases other non cache resources may need to share the same data structures/functions. Generalize such naming to prepare to add more resources like memory bandwidth. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1491611637-20417-3-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 16:10:07 +02:00
Thomas Gleixner	70a1ee9256	x86/intel_rdt: Organize code properly Having init functions at random places in the middle of the code is unintuitive. Move them close to the init routine and mark them __init. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com	2017-04-14 16:10:06 +02:00
Thomas Gleixner	06b57e4550	x86/intel_rdt: Init padding only if a device exists If no device exists it's pointless to calculate the padding data for the schemata files. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: vikas.shivappa@intel.com	2017-04-14 16:10:06 +02:00
Hans de Goede	7ee06cb2f8	x86: i8259: export legacy_pic symbol The classic PC rtc-coms driver has a workaround for broken ACPI device nodes for it which lack an irq resource. This workaround used to unconditionally hardcode the irq to 8 in these cases. This was causing irq conflict problems on systems without a legacy-pic so a recent patch added an if (nr_legacy_irqs()) guard to the workaround to avoid this irq conflict. nr_legacy_irqs() uses the legacy_pic symbol under the hood causing an undefined symbol error if the rtc-cmos code is build as a module. This commit exports the legacy_pic symbol to fix this. Cc: rtc-linux@googlegroups.com Cc: alexandre.belloni@free-electrons.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2017-04-14 12:08:51 +02:00
Josh Poimboeuf	34a477e529	ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram On x86-32, with CONFIG_FIRMWARE and multiple CPUs, if you enable function graph tracing and then suspend to RAM, it will triple fault and reboot when it resumes. The first fault happens when booting a secondary CPU: startup_32_smp() load_ucode_ap() prepare_ftrace_return() ftrace_graph_is_dead() (accesses 'kill_ftrace_graph') The early head_32.S code calls into load_ucode_ap(), which has an an ftrace hook, so it calls prepare_ftrace_return(), which calls ftrace_graph_is_dead(), which tries to access the global 'kill_ftrace_graph' variable with a virtual address, causing a fault because the CPU is still in real mode. The fix is to add a check in prepare_ftrace_return() to make sure it's running in protected mode before continuing. The check makes sure the stack pointer is a virtual kernel address. It's a bit of a hack, but it's not very intrusive and it works well enough. For reference, here are a few other (more difficult) ways this could have potentially been fixed: - Move startup_32_smp()'s call to load_ucode_ap() down to after paging is enabled. (No idea what that would break.) - Track down load_ucode_ap()'s entire callee tree and mark all the functions 'notrace'. (Probably not realistic.) - Pause graph tracing in ftrace_suspend_notifier_call() or bringup_cpu() or __cpu_up(), and ensure that the pause facility can be queried from real mode. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net> Cc: linux-acpi@vger.kernel.org Cc: Borislav Petkov <bp@alien8.de> Cc: stable@kernel.org Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/5c1272269a580660703ed2eccf44308e790c7a98.1492123841.git.jpoimboe@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 11:48:51 +02:00
Colin King	ace2fb5a8b	x86/boot/e820: Remove a redundant self assignment Remove a redundant self assignment of table->nr_entries, it does nothing and is an artifact of code simplification re-work. Detected by CoverityScan, CID#1428450 ("Self assignment") Fixes: `441ac2f33d` ("x86/boot/e820: Simplify e820__update_table()") Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: kernel-janitors@vger.kernel.org Cc: Denys Vlasenko <dvlasenk@redhat.com> Link: http://lkml.kernel.org/r/20170413155912.12078-1-colin.king@canonical.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 11:43:21 +02:00
Piotr Luc	9ea74f7c70	x86/mce: Enable PPIN for Knights Landing/Mill Intel Xeon Phi processors (KNL and KNM) support PPIN as well, so add their CPUIDs to the whitelist of supported processors. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170408172004.8463-1-piotr.luc@intel.com Link: http://lkml.kernel.org/r/20170413201056.10525-1-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 10:46:12 +02:00
Josh Poimboeuf	a8b7a92318	x86/unwind: Silence entry-related warnings A few people have reported unwinder warnings like the following: WARNING: kernel stack frame pointer at ffffc90000fe7ff0 in rsync:1157 has bad value (null) unwind stack type:0 next_sp: (null) mask:2 graph_idx:0 ffffc90000fe7f98: ffffc90000fe7ff0 (0xffffc90000fe7ff0) ffffc90000fe7fa0: ffffffffb7000f56 (trace_hardirqs_off_thunk+0x1a/0x1c) ffffc90000fe7fa8: 0000000000000246 (0x246) ffffc90000fe7fb0: 0000000000000000 ... ffffc90000fe7fc0: 00007ffe3af639bc (0x7ffe3af639bc) ffffc90000fe7fc8: 0000000000000006 (0x6) ffffc90000fe7fd0: 00007f80af433fc5 (0x7f80af433fc5) ffffc90000fe7fd8: 00007ffe3af638e0 (0x7ffe3af638e0) ffffc90000fe7fe0: 00007ffe3af638e0 (0x7ffe3af638e0) ffffc90000fe7fe8: 00007ffe3af63970 (0x7ffe3af63970) ffffc90000fe7ff0: 0000000000000000 ... ffffc90000fe7ff8: ffffffffb7b74b9a (entry_SYSCALL_64_after_swapgs+0x17/0x4f) This warning can happen when unwinding a code path where an interrupt occurred in x86 entry code before it set up the first stack frame. Silently ignore any warnings for this case. Reported-by: Daniel Borkmann <daniel@iogearbox.net> Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: `c32c47c68a` ("x86/unwind: Warn on bad frame pointer") Link: http://lkml.kernel.org/r/dbd6838826466a60dc23a52098185bc973ce2f1e.1492020577.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-14 10:20:06 +02:00
Josh Poimboeuf	6bcdf9d51b	x86/unwind: Read stack return address in update_stack_state() Instead of reading the return address when unwind_get_return_address() is called, read it from update_stack_state() and store it in the unwind state. This enables the next patch to check the return address from unwind_next_frame() so it can detect an entry code frame. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/af0c5e4560c49c0343dca486ea26c4fa92bc4e35.1492020577.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-14 10:19:59 +02:00
Josh Poimboeuf	5ed8d8bb38	x86/unwind: Move common code into update_stack_state() The __unwind_start() and unwind_next_frame() functions have some duplicated functionality. They both call decode_frame_pointer() and set state->regs and state->bp accordingly. Move that functionality to a common place in update_stack_state(). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/a2ee4801113f6d2300d58f08f6b69f85edf4eb43.1492020577.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-14 10:19:49 +02:00
Wanpeng Li	5a48a62271	x86/kvm: virt_xxx memory barriers instead of mandatory barriers virt_xxx memory barriers are implemented trivially using the low-level __smp_xxx macros, __smp_xxx is equal to a compiler barrier for strong TSO memory model, however, mandatory barriers will unconditional add memory barriers, this patch replaces the rmb() in kvm_steal_clock() by virt_rmb(). Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-04-12 20:17:38 +02:00
Masami Hiramatsu	a8d11cd071	kprobes/x86: Consolidate insn decoder users for copying code Consolidate x86 instruction decoder users on the path of copying original code for kprobes. Kprobes decodes the same instruction a maximum of 3 times when preparing the instruction buffer: - The first time for getting the length of the instruction, - the 2nd for adjusting displacement, - and the 3rd for checking whether the instruction is boostable or not. For each time, the actual decoding target address is slightly different (1st is original address or recovered instruction buffer, 2nd and 3rd are pointing to the copied buffer), but all have the same instruction. Thus, this patch also changes the target address to the copied buffer at first and reuses the decoded "insn" for displacement adjusting and checking boostability. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David S . Miller <davem@davemloft.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ye Xiaolong <xiaolong.ye@intel.com> Link: http://lkml.kernel.org/r/149076389643.22469.13151892839998777373.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-12 09:23:47 +02:00
Masami Hiramatsu	ea1e34fc36	kprobes/x86: Use probe_kernel_read() instead of memcpy() Use probe_kernel_read() for avoiding unexpected faults while copying kernel text in __recover_probed_insn(), __recover_optprobed_insn() and __copy_instruction(). Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David S . Miller <davem@davemloft.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ye Xiaolong <xiaolong.ye@intel.com> Link: http://lkml.kernel.org/r/149076382624.22469.10091613887942958518.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-12 09:23:47 +02:00
Masami Hiramatsu	d0381c81c2	kprobes/x86: Set kprobes pages read-only Set the pages which is used for kprobes' singlestep buffer and optprobe's trampoline instruction buffer to readonly. This can prevent unexpected (or unintended) instruction modification. This also passes rodata_test as below. Without this patch, rodata_test shows a warning: WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x7a9/0xa20 x86/mm: Found insecure W+X mapping at address ffffffffa0000000/0xffffffffa0000000 With this fix, no W+X pages are found: x86/mm: Checked W+X mappings: passed, no W+X pages found. rodata_test: all tests were successful Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David S . Miller <davem@davemloft.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ye Xiaolong <xiaolong.ye@intel.com> Link: http://lkml.kernel.org/r/149076375592.22469.14174394514338612247.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-12 09:23:46 +02:00
Masami Hiramatsu	490154bc68	kprobes/x86: Make boostable flag boolean Make arch_specific_insn.boostable to boolean, since it has only 2 states, boostable or not. So it is better to use boolean from the viewpoint of code readability. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David S . Miller <davem@davemloft.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ye Xiaolong <xiaolong.ye@intel.com> Link: http://lkml.kernel.org/r/149076368566.22469.6322906866458231844.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-12 09:23:46 +02:00
Masami Hiramatsu	804dec5bda	kprobes/x86: Do not modify singlestep buffer while resuming Do not modify singlestep execution buffer (kprobe.ainsn.insn) while resuming from single-stepping, instead, modifies the buffer to add a jump back instruction at preparing buffer. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David S . Miller <davem@davemloft.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ye Xiaolong <xiaolong.ye@intel.com> Link: http://lkml.kernel.org/r/149076361560.22469.1610155860343077495.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-12 09:23:45 +02:00
Masami Hiramatsu	17880e4d57	kprobes/x86: Use instruction decoder for booster Use x86 instruction decoder for checking whether the probed instruction is able to boost or not, instead of hand-written code. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David S . Miller <davem@davemloft.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ye Xiaolong <xiaolong.ye@intel.com> Link: http://lkml.kernel.org/r/149076354563.22469.13379472209338986858.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-12 09:23:45 +02:00
Masami Hiramatsu	129d17e8e8	kprobes/x86: Fix the description of __copy_instruction() Fix the description comment of __copy_instruction() function since it has already been changed to return the length of the copied instruction. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David S . Miller <davem@davemloft.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ye Xiaolong <xiaolong.ye@intel.com> Link: http://lkml.kernel.org/r/149076347582.22469.3775133607244923462.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-12 09:23:45 +02:00
Masami Hiramatsu	bd0b90676c	kprobes/x86: Fix kprobe-booster not to boost far call instructions Fix the kprobe-booster not to boost far call instruction, because a call may store the address in the single-step execution buffer to the stack, which should be modified after single stepping. Currently, this instruction will be filtered as not boostable in resume_execution(), so this is not a critical issue. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David S . Miller <davem@davemloft.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ye Xiaolong <xiaolong.ye@intel.com> Link: http://lkml.kernel.org/r/149076340615.22469.14066273186134229909.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-12 09:23:44 +02:00
Ingo Molnar	b6466d53af	Merge branch 'x86/urgent' into x86/cpu, to resolve conflict Conflicts: arch/x86/kernel/cpu/intel_rdt_schemata.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-11 10:47:28 +02:00
Jiri Olsa	7f00f38871	x86/intel_rdt: Fix locking in rdtgroup_schemata_write() The schemata lock is released before freeing the resource's temporary tmp_cbms allocation. That's racy versus another write which allocates and uses new temporary storage, resulting in memory leaks, freeing in use memory, double a free or any combination of those. Move the unlock after the release code. Fixes: `60ec2440c6` ("x86/intel_rdt: Add schemata file") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Shaohua Li <shli@fb.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170411071446.15241-1-jolsa@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-11 09:48:12 +02:00
Markus Trippelsdorf	1c99a68741	x86/debug: Fix the printk() debug output of signal_fault(), do_trap() and do_general_protection() Since commit: `4bcc595ccd` "printk: reinstate KERN_CONT for printing" ... the debug output of signal_fault(), do_trap() and do_general_protection() looks garbled, e.g.: traps: conftest[9335] trap invalid opcode ip:400428 sp:7ffeaba1b0d8 error:0 in conftest[400000+1000] (note the unintended line break.) Fix the bug by adding KERN_CONTs. Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-11 09:11:13 +02:00
Ingo Molnar	e5185a76a2	Merge branch 'x86/boot' into x86/mm, to avoid conflict There's a conflict between ongoing level-5 paging support and the E820 rewrite. Since the E820 rewrite is essentially ready, merge it into x86/mm to reduce tree conflicts. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-11 08:56:05 +02:00
Ingo Molnar	4729277156	Merge branch 'WIP.x86/boot' into x86/boot, to pick up ready branch The E820 rework in WIP.x86/boot has gone through a couple of weeks of exposure in -tip, merge it in a wider fashion. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-11 08:49:31 +02:00
Dave Airlie	b769fefb68	Linux 4.11-rc6 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJY6mY1AAoJEHm+PkMAQRiGB14IAImsH28JPjxJVDasMIRPBxVc euPPlZgoBieu7sNt+kEsEqdkXuu0MLk6gln0IGxWLeoB2S+u3Tz5LMa2YArVqV9Z tWzOnI9auE73P2Pz/tUMOdyMs5tO0PolQxX3uljbULBozOHjHRh13fsXchX2yQvl mFeFCDqpPV0KhWRH/ciA8uIHdvYPhMpkKgRtmR8jXL0yzqLp6+2J+Bs8nHG4NNng HMVxZPC8jOE/TgWq6k/GmXgxh3H/AideFdHFbLKYnIFJW41ZGOI8a262zq3NmjPd lywpVU7O7RMhSITY5PnuR3LpNV8ftw1hz2y6t35unyFK1P02adOSj5GJ3hGdhaQ= =Xz5O -----END PGP SIGNATURE----- Backmerge tag 'v4.11-rc6' into drm-next Linux 4.11-rc6 drm-misc needs 4.11-rc5, may as well fix conflicts with rc6.	2017-04-11 07:40:42 +10:00
Jiri Olsa	4ffa3c977b	x86/intel_rdt: Add cpus_list rdtgroup file The resource control filesystem provides only a bitmask based cpus file for assigning CPUs to a resource group. That's cumbersome with large cpumasks and non-intuitive when modifying the file from the command line. Range based cpu lists are commonly used along with bitmask based cpu files in various subsystems throughout the kernel. Add 'cpus_list' file which is CPU range based. # cd /sys/fs/resctrl/ # echo 1-10 > krava/cpus_list # cat krava/cpus_list 1-10 # cat krava/cpus 0007fe # cat cpus fffff9 # cat cpus_list 0,3-23 [ tglx: Massaged changelog and replaced "bitmask lists" by "CPU ranges" ] Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/20170410145232.GF25354@krava Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-10 19:10:25 +02:00
Borislav Petkov	db47d5f856	x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI Apparently, some machines used to report DRAM errors through a PCI SERR NMI. This is why we have a call into EDAC in the NMI handler. See `c0d1217202` ("drivers/edac: add new nmi rescan"). From looking at the patch above, that's two drivers: e752x_edac.c and e7xxx_edac.c. Now, I wanna say those are old machines which are probably decommissioned already. Tony says that "[t]the newest CPU supported by either of those drivers is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some folks are still using these ... but people that hold onto h/w for 7 years generally cling to old s/w too ... so I'd guess it unlikely that we will get complaints for breaking these in upstream." So even if there is a small number still in use, we did load EDAC with edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact) which means a default EDAC setup without any parameters supplied on the command line or otherwise would never even log the error in the NMI handler because we're polling by default: inline int edac_handler_set(void) { if (edac_op_state == EDAC_OPSTATE_POLL) return 0; return atomic_read(&edac_handlers); } So, long story short, I'd like to get rid of that nastiness called edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If we ever have to do stuff like that again, it should be notifiers we're using and not some insanity like this one. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com>	2017-04-10 17:13:48 +02:00
K. Y. Srinivasan	a33fd4c27b	Drivers: hv: Issue explicit EOI when autoeoi is not enabled When auto EOI is not enabled; issue an explicit EOI for hyper-v interrupts. Fixes: `6c248aad81` ("Drivers: hv: Base autoeoi enablement based on hypervisor hints") Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-04-08 18:07:51 +02:00
Vikas Shivappa	de016df88f	x86/intel_rdt: Update schemata read to show data in tabular format The schemata file displays data from different resources on all domains. Its cumbersome to read since they are not tabular and data/names could be of different widths. Make the schemata file to display data in a tabular format thereby making it nice and simple to read. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: vikas.shivappa@intel.com Cc: h.peter.anvin@intel.com Link: http://lkml.kernel.org/r/1491255857-17213-4-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-05 17:22:31 +02:00
Tony Luck	c4026b7b95	x86/intel_rdt: Implement "update" mode when writing schemata file The schemata file can have multiple lines and it is cumbersome to update all lines. Remove code that requires that the user provides values for every resource (in the right order). If the user provides values for just a few resources, update them and leave the rest unchanged. Side benefit: we now check which values were updated and only send IPIs to cpus that actually have updates. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: ravi.v.shankar@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: vikas.shivappa@intel.com Cc: h.peter.anvin@intel.com Link: http://lkml.kernel.org/r/1491255857-17213-3-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-05 17:22:31 +02:00
Bhupesh Sharma	6e7300cff1	efi/bgrt: Enable ACPI BGRT handling on arm64 Now that the ACPI BGRT handling code has been made generic, we can enable it for arm64. Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> [ Updated commit log to reflect that BGRT is only enabled for arm64, and added missing 'return' statement to the dummy acpi_parse_bgrt() function. ] Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170404160245.27812-8-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-05 12:27:25 +02:00
Joerg Roedel	cfac6dfa42	x86/signals: Fix lower/upper bound reporting in compat siginfo Put the right values from the original siginfo into the userspace compat-siginfo. This fixes the 32-bit MPX "tabletest" testcase on 64-bit kernels. Signed-off-by: Joerg Roedel <jroedel@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.8+ Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `a4455082dc` ('x86/signals: Add missing signal_compat code for x86 features') Link: http://lkml.kernel.org/r/1491322501-5054-1-git-send-email-joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-05 10:16:43 +02:00
Kirill A. Shutemov	1d33b21956	x86/espfix: Add support for 5-level paging We don't need extra virtual address space for ESPFIX, so it stays within one PUD page table for both 4- and 5-level paging. Redefining ESPFIX_BASE_ADDR using P4D_SHIFT instead of PGDIR_SHIFT would make it stay in the same place regarding of paging mode. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-8-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-04 08:22:34 +02:00
Kirill A. Shutemov	335437fbf7	x86/paravirt: Add 5-level support to the paravirt code Add operations to allocate/release p4ds. Xen requires more work. We will need to come back to it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-04 08:22:34 +02:00
Linus Torvalds	3ccfcdc9ef	Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fix from Thomas Gleixner: "Prevent dmesg from being spammed when MCE logging is active" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Don't print MCEs when mcelog is active	2017-04-03 08:36:24 -07:00
Ingo Molnar	7f75540ff2	Linux 4.11-rc5 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJY4ZYkAAoJEHm+PkMAQRiGsq4H/R4PMXDoe2XhSSk7IoT97pXV /A8np/scAPjzEgYUidbb54OSqWwsPRuPGWONTFeSrE2u0L4wln/REI91jg7QetLq IisncExlYeJ/XQ+iO0ZZh9fLbqwIlEJFdSXmyIFr3m/TBxe8a61C8j93oNgM1tHT yuwzlq7c3sLq2hsmUG2HyL2kJsEfRasv4Rk0yhFuti12zVsBoTW4qmZuMauq+gdf f7cSYgiHhPTdb2o+azg5O7uYNHaQQBxdUMlIuhhYtVOUq+pFDO23SLHSFIW2NwOm Zn5R6CFSrLsCw0Bx0v8Xlc151QUbaRK4h9lhUhkBr6d3uNShU1NQ9JojpSvYwBo= =vP6E -----END PGP SIGNATURE----- Merge tag 'v4.11-rc5' into x86/mm, to refresh the branch Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-03 16:36:32 +02:00
Peter Zijlstra	b5effd3815	debug: Fix __bug_table[] in arch linker scripts The kbuild test robot reported this build failure on a number of architectures: > make.cross ARCH=arm > lib/lib.a(bug.o): In function `find_bug': > >> lib/bug.c:135: undefined reference to `__start___bug_table' > >> lib/bug.c:135: undefined reference to `__stop___bug_table' Caused by: `19d436268d` ("debug: Add _ONCE() logic to report_bug()") Which moved the BUG_TABLE from RO_DATA_SECTION() to RW_DATA_SECTION(), but a number of architectures don't use RW_DATA_SECTION(), so they ended up with no __bug_table[] ... Ideally all those would use RW_DATA_SECTION() in their linker scripts, but that's for another day. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kbuild test robot <fengguang.wu@intel.com> Cc: kbuild-all@01.org Cc: tipbuild@zytor.com Link: http://lkml.kernel.org/r/20170330154927.o6qmgfp4bdhrajbm@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-03 10:22:40 +02:00
Linus Torvalds	496dcc5091	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "This update provides: - prevent KASLR from randomizing EFI regions - restrict the usage of -maccumulate-outgoing-args and document when and why it is required. - make the Global Physical Address calculation for UV4 systems work correctly. - address a copy->paste->forgot-edit problem in the MCE exception table entries. - assign a name to AMD MCA bank 3, so the sysfs file registration works. - add a missing include in the boot code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Include missing header file x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs x86/build: Mostly disable '-maccumulate-outgoing-args' x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization x86/mce: Fix copy/paste error in exception table entries x86/platform/uv: Fix calculation of Global Physical Address	2017-04-02 09:27:02 -07:00
Dmitry Safonov	ada26481df	x86/mm: Make in_compat_syscall() work during exec The x86 mmap() code selects the mmap base for an allocation depending on the bitness of the syscall. For 64bit sycalls it select mm->mmap_base and for 32bit mm->mmap_compat_base. On execve the registers of the task invoking exec() are copied to the child pt_regs. So child->pt_regs->orig_ax contains the execve syscall number of the parent. exec() calls mmap() which in turn uses in_compat_syscall() to check whether the mapping is for a 32bit or a 64bit task. The decision is made on the following criteria: ia32 child->thread.status & TS_COMPAT x32 child->pt_regs.orig_ax & __X32_SYSCALL_BIT ia64 !ia32 && !x32 child->thread.status is corretly set up in set_personality_(), but the syscall number in child->pt_regs.orig_ax is left unmodified. Therefore the parent/child combinations work or fail in the following way: Parent Child Child->thread_status child->pt_regs.orig_ax in_compat() Works ia64 ia64 TS_COMPAT == 0 __X32_SYSCALL_BIT == 0 false Y ia64 ia32 TS_COMPAT == 1 __X32_SYSCALL_BIT == 0 true Y ia64 x32 TS_COMPAT == 0 __X32_SYSCALL_BIT == 0 false N ia32 ia64 TS_COMPAT == 0 __X32_SYSCALL_BIT == 0 false Y ia32 ia32 TS_COMPAT == 1 __X32_SYSCALL_BIT == 0 true Y ia32 x32 TS_COMPAT == 0 __X32_SYSCALL_BIT == 0 false N x32 ia64 TS_COMPAT == 0 __X32_SYSCALL_BIT == 1 true N x32 ia32 TS_COMPAT == 1 __X32_SYSCALL_BIT == 1 true Y x32 x32 TS_COMPAT == 0 __X32_SYSCALL_BIT == 1 true Y Make set_personality_() store the syscall number incl. __X32_SYSCALL_BIT which corresponds to the newly started ELF executable in the childs pt_regs, i.e. pretend that the exec was invoked from a task with the same executable format. So both thread.status and pt_regs.orig_ax correspond to the new ELF format and in_compat_syscall() returns the correct result. [ tglx: Rewrote changelog ] Fixes: commit `1b028f784e` ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()") Reported-by: Adam Borowski <kilobyte@angband.pl> Suggested-by: H. Peter Anvin <hpa@zytor.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: 0x7f454c46@gmail.com Cc: linux-mm@kvack.org Cc: Andrei Vagin <avagin@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Borislav Petkov <bp@suse.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170331111137.28170-1-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-31 16:53:02 +02:00
Geliang Tang	352ef03ca0	x86/pci-calgary: Use setup_timer() instead of open coding it. Use setup_timer() instead of init_timer() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Cc: iommu@lists.linux-foundation.org Cc: Jon Mason <jdmason@kudzu.us> Cc: Muli Ben-Yehuda <mulix@mulix.org> Link: http://lkml.kernel.org/r/e4f1888b9e4a87f6a6345f86ed23071483763b22.1490340972.git.geliangtang@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-31 10:21:04 +02:00
Yazen Ghannam	29f72ce3e4	x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs MCA bank 3 is reserved on systems pre-Fam17h, so it didn't have a name. However, MCA bank 3 is defined on Fam17h systems and can be accessed using legacy MSRs. Without a name we get a stack trace on Fam17h systems when trying to register sysfs files for bank 3 on kernels that don't recognize Scalable MCA. Call MCA bank 3 "decode_unit" since this is what it represents on Fam17h. This will allow kernels without SMCA support to see this bank on Fam17h+ and prevent the stack trace. This will not affect older systems since this bank is reserved on them, i.e. it'll be ignored. Tested on AMD Fam15h and Fam17h systems. WARNING: CPU: 26 PID: 1 at lib/kobject.c:210 kobject_add_internal kobject: (ffff88085bb256c0): attempted to be registered with empty name! ... Call Trace: kobject_add_internal kobject_add kobject_create_and_add threshold_create_device threshold_init_device Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1490102285-3659-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-31 10:09:44 +02:00
Josh Poimboeuf	3f135e57a4	x86/build: Mostly disable '-maccumulate-outgoing-args' The GCC '-maccumulate-outgoing-args' flag is enabled for most configs, mostly because of issues which are no longer relevant. For most configs, and with most recent versions of GCC, it's no longer needed. Clarify which cases need it, and only enable it for those cases. Also produce a compile-time error for the ftrace graph + mcount + '-Os' case, which will otherwise cause runtime failures. The main benefit of '-maccumulate-outgoing-args' is that it prevents an ugly prologue for functions which have aligned stacks. But removing the option also has some benefits: more readable argument saves, smaller text size, and (presumably) slightly improved performance. Here are the object size savings for 32-bit and 64-bit defconfig kernels: text data bss dec hex filename 10006710 3543328 1773568 15323606 e9d1d6 vmlinux.x86-32.before 9706358 3547424 1773568 15027350 e54c96 vmlinux.x86-32.after text data bss dec hex filename 10652105 4537576 843776 16033457 f4a6b1 vmlinux.x86-64.before 10639629 4537576 843776 16020981 f475f5 vmlinux.x86-64.after That comes out to a 3% text size improvement on x86-32 and a 0.1% text size improvement on x86-64. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170316193133.zrj6gug53766m6nn@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-30 11:53:04 +02:00
Ingo Molnar	73fa1362a7	Merge branch 'x86/cpu' into x86/mm, before applying dependent patch Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-30 09:07:54 +02:00
Steven Rostedt (VMware)	2b87965a1e	ftrace/x86: Do no run CPU sync when there is only one CPU online Moving enabling of function tracing to early boot, even before scheduling is enabled, means that it is not safe to enable interrupts. When function tracing was enabled at boot up, it use to happen after scheduling and the other CPUs were brought up. That required running a sync across all CPUs when modifying the function hook locations in the code. To do the synchronization, interrupts had to be enabled. Now function tracing can be started before the other CPUs are brought up, and enabling interrupts in that case is dangerous. As only tho boot CPU is active, there is no reason to run the synchronization. If the online CPU count is one, do not bother doing the synchronization. This removes the need to enable interrupts. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-03-28 12:35:16 -04:00
Borislav Petkov	32b40a82e8	x86/mce: Do not register notifiers with invalid prio This is just a defensive precaution: do not register notifiers with a priority which would disrupt the error handling in the notifiers with prio higher than MCE_PRIO_EDAC. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170327093304.10683-7-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-28 08:55:15 +02:00
Tony Luck	5de97c9f6d	x86/mce: Factor out and deprecate the /dev/mcelog driver Move all code relating to /dev/mcelog to a separate source file. /dev/mcelog driver can now operate from the machine check notifier with lowest prio. Signed-off-by: Tony Luck <tony.luck@intel.com> [ Move the mce_helper and trigger functionality behind CONFIG_X86_MCELOG_LEGACY. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170327093304.10683-6-bp@alien8.de [ Renamed CONFIG_X86_MCELOG to CONFIG_X86_MCELOG_LEGACY. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-28 08:55:01 +02:00
Borislav Petkov	011d826111	RAS: Add a Corrected Errors Collector Introduce a simple data structure for collecting correctable errors along with accessors. More detailed description in the code itself. The error decoding is done with the decoding chain now and mce_first_notifier() gets to see the error first and the CEC decides whether to log it and then the rest of the chain doesn't hear about it - basically the main reason for the CE collector - or to continue running the notifiers. When the CEC hits the action threshold, it will try to soft-offine the page containing the ECC and then the whole decoding chain gets to see the error. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170327093304.10683-5-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-28 08:54:48 +02:00
Borislav Petkov	e64edfcce9	x86/mce: Rename mce_log to mce_log_buffer It is confusing when staring at "struct mce_log mcelog" and then there's also a function called mce_log(). So call the buffer what it is. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170327093304.10683-4-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-28 08:54:42 +02:00
Borislav Petkov	fe3ed20fdd	x86/mce: Rename mce_log()'s argument We call it everywhere "struct mce *m". Adjust that here too to avoid confusion. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170327093304.10683-3-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-28 08:54:38 +02:00
Ingo Molnar	4a96d1a5f0	Merge branch 'ras/urgent' into ras/core, to pick up fix Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-28 08:54:25 +02:00
Andi Kleen	cc66afea58	x86/mce: Don't print MCEs when mcelog is active Since: `cd9c57cad3` ("x86/MCE: Dump MCE to dmesg if no consumers") all MCEs are printed even when mcelog is running. Fix the regression to not print to dmesg when mcelog is running as it is a consumer too. Signed-off-by: Andi Kleen <ak@linux.intel.com> [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: stable@vger.kernel.org # 4.10.. Fixes: `cd9c57cad3` ("x86/MCE: Dump MCE to dmesg if no consumers") Link: http://lkml.kernel.org/r/20170327093304.10683-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-28 08:53:52 +02:00
Peter Zijlstra	9a93848fe7	x86/debug: Implement __WARN() using UD0 By using "UD0" for WARN()s we remove the function call and its possible __FILE__ and __LINE__ immediate arguments from the instruction stream. Total image size will not change much, what we win in the instruction stream we'll lose because of the __bug_table entries. Still, saves on I$ footprint and the total image size does go down a bit. text data filename 10702123 4530992 defconfig-build/vmlinux.orig 10682460 4530992 defconfig-build/vmlinux.patched (UML didn't seem to use GENERIC_BUG at all, so remove it) Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-27 10:20:28 +02:00
Kirill A. Shutemov	f2a6a70501	x86: Convert the rest of the code to support p4d_t This patch converts x86 to use proper folding of a new (fifth) page table level with <asm-generic/pgtable-nop4d.h>. That's a bit of a kitchen sink patch, but I don't see how to split it further without hurting bisectability. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170317185515.8636-7-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-27 08:56:58 +02:00
Kirill A. Shutemov	7f68904182	x86/kexec: Add 5-level paging support Handle additional page table level in the kexec code. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170317185515.8636-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-27 08:56:13 +02:00
Steven Rostedt (VMware)	1fa9d67a2f	x86/ftrace: Use Makefile logic instead of #ifdef for compiling ftrace_*.o Currently ftrace_32.S and ftrace_64.S are compiled even when CONFIG_FUNCTION_TRACER is not set. This means there's an unnecessary #ifdef to protect the code. Instead of using preprocessor directives, only compile those files when FUNCTION_TRACER is defined. Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20170316210043.peycxdxktwwn6cid@treble Link: http://lkml.kernel.org/r/20170323143446.217684991@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-24 10:14:08 +01:00
Steven Rostedt (VMware)	644e0e8dc7	x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set x86_64 has had fentry support for some time. I did not add support to x86_32 as I was unsure if it will be used much in the future. It is still very much used, and there's issues with function graph tracing with gcc playing around with the mcount frames, causing function graph to panic. The fentry code does not have this issue, and is able to cope as there is no frame to mess up. Note, this only adds support for fentry when DYNAMIC_FTRACE is set. There's really no reason to not have that set, because the performance of the machine drops significantly when it's not enabled. Keep !DYNAMIC_FTRACE around to test it off, as there's still some archs that have FTRACE but not DYNAMIC_FTRACE. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20170323143446.052202377@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-24 10:14:07 +01:00
Steven Rostedt (VMware)	ff04b440d2	x86/ftrace: Clean up ftrace_regs_caller When ftrace_regs_caller was created, it was designed to preserve flags as much as possible as it needed to act just like a breakpoint triggered on the same location. But the design is over complicated as it treated all operations as modifying flags. But push, mov and lea do not modify flags. This means the code can become more simplified by allowing flags to be stored further down. Making ftrace_regs_caller simpler will also be useful in implementing fentry logic. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20170316135328.36123c3e@gandalf.local.home Link: http://lkml.kernel.org/r/20170323143445.917292592@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-24 10:14:07 +01:00
Steven Rostedt (VMware)	e6928e58d4	x86/ftrace: Add stack frame pointer to ftrace_caller The function hook ftrace_caller does not create its own stack frame, and this causes the ftrace stack trace to miss the first function when doing stack traces. # echo schedule:stacktrace > /sys/kernel/tracing/set_ftrace_filter Before: <idle>-0 [002] .N.. 29.865807: <stack trace> => cpu_startup_entry => start_secondary => startup_32_smp <...>-7 [001] .... 29.866509: <stack trace> => kthread => ret_from_fork <...>-1 [000] .... 29.865377: <stack trace> => poll_schedule_timeout => do_select => core_sys_select => SyS_select => do_fast_syscall_32 => entry_SYSENTER_32 After: <idle>-0 [002] .N.. 31.234853: <stack trace> => do_idle => cpu_startup_entry => start_secondary => startup_32_smp <...>-7 [003] .... 31.235140: <stack trace> => rcu_gp_kthread => kthread => ret_from_fork <...>-1819 [000] .... 31.264172: <stack trace> => schedule_hrtimeout_range => poll_schedule_timeout => do_sys_poll => SyS_ppoll => do_fast_syscall_32 => entry_SYSENTER_32 Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20170323143445.771707773@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-24 10:14:07 +01:00
Steven Rostedt (VMware)	3d82c59c6e	x86/ftrace: Move the ftrace specific code out of entry_32.S The function tracing hook code for ftrace is not an entry point from userspace and does not belong in the entry_*.S files. It has already been moved out of entry_64.S. Move it out of entry_32.S into its own ftrace_32.S file. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20170323143445.645218946@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-24 10:14:07 +01:00
Steven Rostedt (VMware)	db65d7b6dc	x86/ftrace: Rename mcount_64.S to ftrace_64.S With the advent of -mfentry that uses the new "fentry" hook over mcount, the mcount name is obsolete. Having the code file that ftrace hooks into called "mcount*.S" is rather misleading. Rename it to ftrace_64.S and remove the file name reference. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20170323143445.490601451@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-24 10:14:06 +01:00
Ingo Molnar	1f9ca18404	Merge branch 'x86/process' into x86/mm, to create new base for further patches Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-23 08:28:19 +01:00
Andy Lutomirski	b23adb7d3f	x86/xen/gdt: Use X86_FEATURE_XENPV instead of globals for the GDT fixup Xen imposes special requirements on the GDT. Rather than using a global variable for the pgprot, just use an explicit special case for Xen -- this makes it clearer what's going on. It also debloats 64-bit kernels very slightly. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e9ea96abbfd6a8c87753849171bb5987ecfeb523.1490218061.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-23 08:25:08 +01:00
Andy Lutomirski	23b2a4ddeb	x86/boot/32: Defer resyncing initial_page_table until per-cpu is set up The x86 smpboot trampoline expects initial_page_table to have the GDT mapped. If the GDT ends up in a virtually mapped per-cpu page, then it won't be in the page tables at all until perc-pu areas are set up. The result will be a triple fault the first time that the CPU attempts to access the GDT after LGDT loads the perc-pu GDT. This appears to be an old bug, but somehow the GDT fixmap rework is triggering it. This seems to have something to do with the memory layout. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/a553264a5972c6a86f9b5caac237470a0c74a720.1490218061.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-23 08:25:08 +01:00
Andy Lutomirski	aa4ea67552	x86/gdt: Fix setup_fixmap_gdt() to use the correct PA __pa() cannot be used on percpu pointers because they may be virtually mapped. Use per_cpu_ptr_to_phys() instead. This fixes a boot crash on a some 32-bit configurations. I assume this is related to which allocation strategy is chosen by the percpu core. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `69218e4799` x86: ("Remap GDT tables in the fixmap section") Link: http://lkml.kernel.org/r/22e0069c29fba31998f193201e359eebfdac4960.1490218061.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-23 08:25:07 +01:00
Peter Zijlstra	698eff6355	sched/clock, x86/perf: Fix "perf test tsc" People reported that commit: `5680d8094f` ("sched/clock: Provide better clock continuity") broke "perf test tsc". That commit added another offset to the reported clock value; so take that into account when computing the provided offset values. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `5680d8094f` ("sched/clock: Provide better clock continuity") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-23 07:31:49 +01:00
Dave Airlie	65d1086c44	Linux 4.11-rc3 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYzznuAAoJEHm+PkMAQRiGAzMIAJDBo5otTMMLhg8eKj8Cnab4 2NyaoWDN6mtU427rzEKEfZlTtp3gIBVdFex5x442weIdw6BgRQW0dvF/uwEn08yI 9Wx7VJmIUyH9M8VmhDtkUTFrhwUGr29qb3JhENMd7tv/CiJaehGRHCT3xqo5BDdu xiyPcwSkwP/NH24TS91G87gV6r0I0oKLSAxu+KifEFESrb8gaZaduslzpEj3m/Ds o9EPpfzaiGAdW5EdNfPtviYbBk7ZOXwtxdMV+zlvsLcaqtYnFEsJZd2WyZL0zGML VXBVxaYtlyTeA7Mt8YYUL+rDHELSOtCeN5zLfxUvYt+Yc0Y6LFBLDOE5h8b3eCw= =uKUo -----END PGP SIGNATURE----- BackMerge tag 'v4.11-rc3' into drm-next Linux 4.11-rc3 as requested by Daniel	2017-03-23 12:05:13 +10:00
Mike Travis	ad4830051a	x86/platform/uv: Fix calculation of Global Physical Address The calculation of the global physical address (GPA) on UV4 is incorrect. The gnode_extra/upper global offset should only be applied for fixed address space systems (UV1..3). Tested-by: John Estabrook <john.estabrook@hpe.com> Signed-off-by: Mike Travis <mike.travis@hpe.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: John Estabrook <estabrook@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170321231646.667689538@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-22 07:41:10 +01:00
Lu Baolu	1b5aeebf3a	x86/earlyprintk: Add support for earlyprintk via USB3 debug port Add support for earlyprintk by writing debug messages to the USB3 debug port. Users can use this type of early printk by specifying the kernel parameter of "earlyprintk=xdbc". This gives users a chance of providing debugging output. The hardware for USB3 debug port requires DMA memory blocks. This requires to delay setting up debugging hardware and registering boot console until the memblocks are filled. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathias Nyman <mathias.nyman@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-usb@vger.kernel.org Link: http://lkml.kernel.org/r/1490083293-3792-4-git-send-email-baolu.lu@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-21 12:30:16 +01:00
Lu Baolu	dd759d93f4	x86/timers: Add simple udelay calibration Add a simple udelay calibration in x86 architecture-specific boot-time initializations. This will get a workable estimate for loops_per_jiffy. Hence, udelay() could be used after this initialization. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathias Nyman <mathias.nyman@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-usb@vger.kernel.org Link: http://lkml.kernel.org/r/1490083293-3792-2-git-send-email-baolu.lu@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-21 12:28:45 +01:00
Kyle Huey	e9ea1e7f53	x86/arch_prctl: Add ARCH_[GET\|SET]_CPUID Intel supports faulting on the CPUID instruction beginning with Ivy Bridge. When enabled, the processor will fault on attempts to execute the CPUID instruction with CPL>0. Exposing this feature to userspace will allow a ptracer to trap and emulate the CPUID instruction. When supported, this feature is controlled by toggling bit 0 of MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of https://bugzilla.kernel.org/attachment.cgi?id=243991 Implement a new pair of arch_prctls, available on both x86-32 and x86-64. ARCH_GET_CPUID: Returns the current CPUID state, either 0 if CPUID faulting is enabled (and thus the CPUID instruction is not available) or 1 if CPUID faulting is not enabled. ARCH_SET_CPUID: Set the CPUID state to the second argument. If cpuid_enabled is 0 CPUID faulting will be activated, otherwise it will be deactivated. Returns ENODEV if CPUID faulting is not supported on this system. The state of the CPUID faulting flag is propagated across forks, but reset upon exec. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-9-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-20 16:10:34 +01:00
Kyle Huey	90218ac77d	x86/cpufeature: Detect CPUID faulting support Intel supports faulting on the CPUID instruction beginning with Ivy Bridge. When enabled, the processor will fault on attempts to execute the CPUID instruction with CPL>0. This will allow a ptracer to emulate the CPUID instruction. Bit 31 of MSR_PLATFORM_INFO advertises support for this feature. It is documented in detail in Section 2.3.2 of https://bugzilla.kernel.org/attachment.cgi?id=243991 Detect support for this feature and expose it as X86_FEATURE_CPUID_FAULT. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-8-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-20 16:10:34 +01:00
Kyle Huey	79170fda31	x86/syscalls/32: Wire up arch_prctl on x86-32 Hook up arch_prctl to call do_arch_prctl() on x86-32, and in 32 bit compat mode on x86-64. This allows to have arch_prctls that are not specific to 64 bits. On UML, simply stub out this syscall. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-7-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-20 16:10:33 +01:00
Kyle Huey	b0b9b01401	x86/arch_prctl: Add do_arch_prctl_common() Add do_arch_prctl_common() to handle arch_prctls that are not specific to 64 bit mode. Call it from the syscall entry point, but not any of the other callsites in the kernel, which all want one of the existing 64 bit only arch_prctls. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-6-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-20 16:10:33 +01:00
Kyle Huey	17a6e1b8e8	x86/arch_prctl/64: Rename do_arch_prctl() to do_arch_prctl_64() In order to introduce new arch_prctls that are not 64 bit only, rename the existing 64 bit implementation to do_arch_prctl_64(). Also rename the second argument of that function from 'addr' to 'arg2', because it will no longer always be an address. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-5-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-20 16:10:32 +01:00
Kyle Huey	ff3f097eef	x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl() Use the SYSCALL_DEFINE2 macro instead of manually defining it. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-4-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-20 16:10:32 +01:00
Kyle Huey	dd93938a92	x86/arch_prctl: Rename 'code' argument to 'option' The x86 specific arch_prctl() arbitrarily changed prctl's 'option' to 'code'. Before adding new options, rename it. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-3-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-20 16:10:32 +01:00
Kyle Huey	ab6d946863	x86/msr: Rename MISC_FEATURE_ENABLES to MISC_FEATURES_ENABLES This matches the only public Intel documentation of this MSR, in the "Virtualization Technology FlexMigration Application Note" (preserved at https://bugzilla.kernel.org/attachment.cgi?id=243991) Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-2-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-20 16:10:32 +01:00
Andy Lutomirski	5b781c7e31	x86/tls: Forcibly set the accessed bit in TLS segments For mysterious historical reasons, struct user_desc doesn't indicate whether segments are accessed. set_thread_area() has always programmed segments as non-accessed, so the first write will set the accessed bit. This will fault if the GDT is read-only. Fix it by making TLS segments start out accessed. If this ends up breaking something, we could, in principle, leave TLS segments non-accessed and fix them up when we get the page fault. I'd be surprised, though -- AFAIK all the nasty legacy segmented programs (DOSEMU, Wine, things that run on DOSEMU and Wine, etc.) do their nasty segmented things using the LDT and not the GDT. I assume this is mainly because old OSes (Linux and otherwise) didn't historically provide APIs to do nasty things in the GDT. Fixes: `45fc8757d1` ("x86: Make the GDT remapping read-only on 64-bit") Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Garnier <thgarnie@google.com> Link: http://lkml.kernel.org/r/62b7748542df0164af7e0a5231283b9b13858c45.1489900519.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-19 12:14:35 +01:00
Yazen Ghannam	5204bf1703	x86/mce: Init some CPU features early When the MCA banks in __mcheck_cpu_init_generic() are polled for leftover errors logged during boot or from the previous boot, its required to have CPU features detected sufficiently so that the reading out and handling of those early errors is done correctly. If those features are not available, the decoding may miss some information and get incomplete errors logged. For example, on SMCA systems the MCA_IPID and MCA_SYND registers are not logged and MCA_ADDR is not masked appropriately. To cure that, do a subset of the basic feature detection early while the rest happens in its usual place in __mcheck_cpu_init_vendor(). Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1489599055-20756-1-git-send-email-Yazen.Ghannam@amd.com [ Massage commit message and simplify. ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-18 13:03:44 +01:00
Colin Ian King	cf8178f786	x86/microcode/AMD: Remove redundant NULL check on mc mc is a pointer to the static u8 array amd_ucode_patch and therefore can never be null, so the check is redundant. Remove it. Detected by CoverityScan, CID#1372871 ("Logically Dead Code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: kernel-janitors@vger.kernel.org Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20170315171010.17536-1-colin.king@canonical.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-18 12:31:58 +01:00
Thomas Garnier	f991376e44	x86/mm: Correct fixmap header usage on adaptable MODULES_END This patch removes fixmap header usage on non-x86 code that was introduced by the adaptable MODULE_END change. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170317175034.4701-1-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-18 09:48:00 +01:00
Linus Torvalds	eab60d4e5b	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "An assorted pile of fixes along with some hardware enablement: - a fix for a KASAN / branch profiling related boot failure - some more fallout of the PUD rework - a fix for the Always Running Timer which is not initialized when the TSC frequency is known at boot time (via MSR/CPUID) - a resource leak fix for the RDT filesystem - another unwinder corner case fixup - removal of the warning for duplicate NMI handlers because there are legitimate cases where more than one handler can be registered at the last level - make a function static - found by sparse - a set of updates for the Intel MID platform which got delayed due to merge ordering constraints. It's hardware enablement for a non mainstream platform, so there is no risk" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mpx: Make unnecessarily global function static x86/intel_rdt: Put group node in rdtgroup_kn_unlock x86/unwind: Fix last frame check for aligned function stacks mm, x86: Fix native_pud_clear build error x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y x86/platform/intel-mid: Add power button support for Merrifield x86/platform/intel-mid: Use common power off sequence x86/platform: Remove warning message for duplicate NMI handlers x86/tsc: Fix ART for TSC_KNOWN_FREQ x86/platform/intel-mid: Correct MSI IRQ line for watchdog device	2017-03-17 14:05:03 -07:00
Linus Torvalds	ae13373319	Merge branch 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 acpi fixes from Thomas Gleixner: "This update deals with the fallout of the recent work to make cpuid/node mappings persistent. It turned out that the boot time ACPI based mapping tripped over ACPI inconsistencies and caused regressions. It's partially reverted and the fragile part replaced by an implementation which makes the mapping persistent when a CPU goes online for the first time" * 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: acpi/processor: Check for duplicate processor ids at hotplug time acpi/processor: Implement DEVICE operator for processor enumeration x86/acpi: Restore the order of CPU IDs Revert"x86/acpi: Enable MADT APIs to return disabled apicids" Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"	2017-03-17 14:01:40 -07:00
Thomas Garnier	45fc8757d1	x86: Make the GDT remapping read-only on 64-bit This patch makes the GDT remapped pages read-only, to prevent accidental (or intentional) corruption of this key data structure. This change is done only on 64-bit, because 32-bit needs it to be writable for TSS switches. The native_load_tr_desc function was adapted to correctly handle a read-only GDT. The LTR instruction always writes to the GDT TSS entry. This generates a page fault if the GDT is read-only. This change checks if the current GDT is a remap and swap GDTs as needed. This function was tested by booting multiple machines and checking hibernation works properly. KVM SVM and VMX were adapted to use the writeable GDT. On VMX, the per-cpu variable was removed for functions to fetch the original GDT. Instead of reloading the previous GDT, VMX will reload the fixmap GDT as expected. For testing, VMs were started and restored on multiple configurations. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-3-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-16 09:06:35 +01:00
Thomas Garnier	69218e4799	x86: Remap GDT tables in the fixmap section Each processor holds a GDT in its per-cpu structure. The sgdt instruction gives the base address of the current GDT. This address can be used to bypass KASLR memory randomization. With another bug, an attacker could target other per-cpu structures or deduce the base of the main memory section (PAGE_OFFSET). This patch relocates the GDT table for each processor inside the fixmap section. The space is reserved based on number of supported processors. For consistency, the remapping is done by default on 32 and 64-bit. Each processor switches to its remapped GDT at the end of initialization. For hibernation, the main processor returns with the original GDT and switches back to the remapping at completion. This patch was tested on both architectures. Hibernation and KVM were both tested specially for their usage of the GDT. Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and recommending changes for Xen support. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-16 09:06:35 +01:00
Thomas Garnier	f06bdd4001	x86/mm: Adapt MODULES_END based on fixmap section size This patch aligns MODULES_END to the beginning of the fixmap section. It optimizes the space available for both sections. The address is pre-computed based on the number of pages required by the fixmap section. It will allow GDT remapping in the fixmap section. The current MODULES_END static address does not provide enough space for the kernel to support a large number of processors. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-1-thgarnie@google.com [ Small build fix. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-16 09:06:24 +01:00
Jiri Olsa	49ec8f5b6a	x86/intel_rdt: Put group node in rdtgroup_kn_unlock The rdtgroup_kn_unlock waits for the last user to release and put its node. But it's calling kernfs_put on the node which calls the rdtgroup_kn_unlock, which might not be the group's directory node, but another group's file node. This race could be easily reproduced by running 2 instances of following script: mount -t resctrl resctrl /sys/fs/resctrl/ pushd /sys/fs/resctrl/ mkdir krava echo "krava" > krava/schemata rmdir krava popd umount /sys/fs/resctrl It triggers the slub debug error message with following command line config: slub_debug=,kernfs_node_cache. Call kernfs_put on the group's node to fix it. Fixes: `60cf5e101f` ("x86/intel_rdt: Add mkdir to resctrl file system") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Shaohua Li <shli@fb.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1489501253-20248-1-git-send-email-jolsa@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-14 21:51:58 +01:00
Josh Poimboeuf	87a6b2975f	x86/unwind: Fix last frame check for aligned function stacks Pavel Machek reported the following warning on x86-32: WARNING: kernel stack frame pointer at f50cdf98 in swapper/2:0 has bad value (null) The warning is caused by the unwinder not realizing that it reached the end of the stack, due to an unusual prologue which gcc sometimes generates for aligned stacks. The prologue is based on a gcc feature called the Dynamic Realign Argument Pointer (DRAP). It's almost always enabled for aligned stacks when -maccumulate-outgoing-args isn't set. This issue is similar to the one fixed by the following commit: `8023e0e2a4` ("x86/unwind: Adjust last frame check for aligned function stacks") ... but that fix was specific to x86-64. Make the fix more generic to cover x86-32 as well, and also ensure that the return address referred to by the frame pointer is a copy of the original return address. Fixes: `acb4608ad1` ("x86/unwind: Create stack frames for saved syscall registers") Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/50d4924db716c264b14f1633037385ec80bf89d2.1489465609.git.jpoimboe@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-14 21:51:57 +01:00
Dmitry Safonov	e13b73dd9c	x86/hugetlb: Adjust to the new native/compat mmap bases Commit `1b028f784e` introduced two mmap() bases for 32-bit syscalls and for 64-bit syscalls. The mmap() code in x86 was modified to handle the separation, but the patch series missed to update the hugetlb code. As a consequence a 32bit application mapping a file on hugetlbfs uses the 64-bit mmap base for address space allocation, which fails. Adjust the hugetlb mapping code to use the proper bases depending on the syscall invocation mode (64-bit or compat). [ tglx: Massaged changelog and switched from asm/compat.h to linux/compat.h ] Fixes: commit `1b028f784e` ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()") Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: 0x7f454c46@gmail.com Cc: linux-mm@kvack.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Borislav Petkov <bp@suse.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170314114126.9280-1-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-14 16:29:16 +01:00
Kirill A. Shutemov	e0c4f6750e	x86/mm: Convert trivial cases of page table walk to 5-level paging This patch only covers simple cases. Less trivial cases will be converted with separate patches. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170313143309.16020-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-14 08:45:08 +01:00
Andrey Ryabinin	be3606ff73	x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y The kernel doesn't boot with both PROFILE_ANNOTATED_BRANCHES=y and KASAN=y options selected. With branch profiling enabled we end up calling ftrace_likely_update() before kasan_early_init(). ftrace_likely_update() is built with KASAN instrumentation, so calling it before kasan has been initialized leads to crash. Use DISABLE_BRANCH_PROFILING define to make sure that we don't call ftrace_likely_update() from early code before kasan_early_init(). Fixes: `ef7f0d6a6c` ("x86_64: add KASan support") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: kasan-dev@googlegroups.com Cc: Alexander Potapenko <glider@google.com> Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: lkp@01.org Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/20170313163337.1704-1-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-14 00:00:55 +01:00
Dou Liyang	5ba039a554	x86/apic: Fix a comment in init_apic_mappings() commit `c0104d38a7` ("x86, apic: Unify identical register_lapic_address() functions") renames acpi_register_lapic_address to register_lapic_address. But acpi_register_lapic_address remains in a comment, and renaming it to register_lapic_address is not suitable for this comment. Remove acpi_register_lapic_address and rewrite the comment. [ tglx: LAPIC address can be registered either by ACPI/MADT or MP info ] Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: yinghai@kernel.org Link: http://lkml.kernel.org/r/1488805690-5055-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-13 21:42:11 +01:00
Dou Liyang	5d64d209c4	x86/apic: Remove the SET_APIC_ID(x) macro The SET_APIC_ID() macro obfusates the code. Remove it to increase readability and add a comment to the apic struct to document that the callback is required on 64-bit. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/1488971270-14359-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-13 21:28:38 +01:00
Mike Travis	0d443b70cc	x86/platform: Remove warning message for duplicate NMI handlers Remove the WARNING message associated with multiple NMI handlers as there are at least two that are legitimate. These are the KGDB and the UV handlers and both want to be called if the NMI has not been claimed by any other NMI handler. Use of the UNKNOWN NMI call chain dramatically lowers the NMI call rate when high frequency NMI tools are in use, notably the perf tools. It is required on systems that cannot sustain a high NMI call rate without adversely affecting the system operation. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Frank Ramsay <frank.ramsay@hpe.com> Cc: Tony Ernst <tony.ernst@hpe.com> Link: http://lkml.kernel.org/r/20170307210841.730959611@asylum.americas.sgi.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-13 20:45:18 +01:00
Xunlei Pang	5bc329503e	x86/mce: Handle broadcasted MCE gracefully with kexec When we are about to kexec a crash kernel and right then and there a broadcasted MCE fires while we're still in the first kernel and while the other CPUs remain in a holding pattern, the #MC handler of the first kernel will timeout and then panic due to never completing MCE synchronization. Handle this in a similar way as to when the CPUs are offlined when that broadcasted MCE happens. [ Boris: rewrote commit message and comments. ] Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Xunlei Pang <xlpang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: kexec@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1487857012-9059-1-git-send-email-xlpang@redhat.com Link: http://lkml.kernel.org/r/20170313095019.19351-1-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-13 20:18:07 +01:00
Peter Zijlstra	44fee88cea	x86/tsc: Fix ART for TSC_KNOWN_FREQ Subhransu reported that convert_art_to_tsc() isn't working for him. The ART to TSC relation is only set up for systems which use the refined TSC calibration. Systems with known TSC frequency (available via CPUID 15) are not using the refined calibration and therefor the ART to TSC relation is never established. Add the setup to the known frequency init path which skips ART calibration. The init code needs to be duplicated as for systems which use refined calibration the ART setup must be delayed until calibration has been done. The problem has been there since the ART support was introdduced, but only detected now because Subhransu tested the first time on hardware which has TSC frequency enumerated via CPUID 15. Note for stable: The conditional has changed from TSC_RELIABLE to TSC_KNOWN_FREQUENCY. [ tglx: Rewrote changelog and identified the proper 'Fixes' commit ] Fixes: `f9677e0f83` ("x86/tsc: Always Running Timer (ART) correlated clocksource") Reported-by: "Prusty, Subhransu S" <subhransu.s.prusty@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Cc: christopher.s.hall@intel.com Cc: kevin.b.stanton@intel.com Cc: john.stultz@linaro.org Cc: akataria@vmware.com Link: http://lkml.kernel.org/r/20170313145712.GI3312@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-13 19:50:23 +01:00
Dmitry Safonov	3e6ef9c809	x86/mm: Make mmap(MAP_32BIT) work correctly mmap(MAP_32BIT) is broken due to the dependency on the TIF_ADDR32 thread flag. For 64bit applications MAP_32BIT will force legacy bottom-up allocations and the 1GB address space restriction even if the application issued a compat syscall, which should not be subject of these restrictions. For 32bit applications, which issue 64bit syscalls the newly introduced mmap base separation into 64-bit and compat bases changed the behaviour because now a 64-bit mapping is returned, but due to the TIF_ADDR32 dependency MAP_32BIT is ignored. Before the separation a 32-bit mapping was returned, so the MAP_32BIT handling was irrelevant. Replace the check for TIF_ADDR32 with a check for the compat syscall. That solves both the 64-bit issuing a compat syscall and the 32-bit issuing a 64-bit syscall problems. [ tglx: Massaged changelog ] Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: 0x7f454c46@gmail.com Cc: linux-mm@kvack.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Borislav Petkov <bp@suse.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170306141721.9188-5-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-13 14:59:23 +01:00
Dmitry Safonov	1b028f784e	x86/mm: Introduce mmap_compat_base() for 32-bit mmap() mmap() uses a base address, from which it starts to look for a free space for allocation. The base address is stored in mm->mmap_base, which is calculated during exec(). The address depends on task's size, set rlimit for stack, ASLR randomization. The base depends on the task size and the number of random bits which are different for 64-bit and 32bit applications. Due to the fact, that the base address is fixed, its mmap() from a compat (32bit) syscall issued by a 64bit task will return a address which is based on the 64bit base address and does not fit into the 32bit address space (4GB). The returned pointer is truncated to 32bit, which results in an invalid address. To solve store a seperate compat address base plus a compat legacy address base in mm_struct. These bases are calculated at exec() time and can be used later to address the 32bit compat mmap() issued by 64 bit applications. As a consequence of this change 32-bit applications issuing a 64-bit syscall (after doing a long jump) will get a 64-bit mapping now. Before this change 32-bit applications always got a 32bit mapping. [ tglx: Massaged changelog and added a comment ] Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: 0x7f454c46@gmail.com Cc: linux-mm@kvack.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Borislav Petkov <bp@suse.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170306141721.9188-4-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-13 14:59:22 +01:00
Linus Torvalds	5a45a5a881	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - a fix for the kexec/purgatory regression which was introduced in the merge window via an innocent sparse fix. We could have reverted that commit, but on deeper inspection it turned out that the whole machinery is neither documented nor robust. So a proper cleanup was done instead - the fix for the TLB flush issue which was discovered recently - a simple typo fix for a reboot quirk * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tlb: Fix tlb flushing when lguest clears PGE kexec, x86/purgatory: Unbreak it and clean it up x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirk	2017-03-12 14:18:49 -07:00
Dou Liyang	2b85b3d229	x86/acpi: Restore the order of CPU IDs The following commits: `f7c28833c2` ("x86/acpi: Enable acpi to register all possible cpus at boot time") and `8f54969dc8` ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") ... registered all the possible CPUs at boot time via ACPI tables to make the mapping of cpuid <-> apicid fixed. Both enabled and disabled CPUs could have a logical CPU ID after boot time. But, ACPI tables are unreliable. the number amd order of Local APIC entries which depends on the firmware is often inconsistent with the physical devices. Even if they are consistent, The disabled CPUs which take up some logical CPU IDs will also make the order discontinuous. Revert the part of disabled CPUs registration, keep the allocation logic of logical CPU IDs and also keep some code location changes. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: guzheng1@huawei.com Cc: izumi.taku@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1488528147-2279-4-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-11 14:41:19 +01:00
Dou Liyang	c962cff17d	Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting" Revert: `dc6db24d24` ("x86/acpi: Set persistent cpuid <-> nodeid mapping when booting") The mapping of "cpuid <-> nodeid" is established at boot time via ACPI tables to keep associations of workqueues and other node related items consistent across cpu hotplug. But, ACPI tables are unreliable and failures with that boot time mapping have been reported on machines where the ACPI table and the physical information which is retrieved at actual hotplug is inconsistent. Revert the mapping implementation so it can be replaced with a less error prone approach. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: guzheng1@huawei.com Cc: izumi.taku@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1488528147-2279-2-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-11 14:41:18 +01:00
Mathias Krause	6415813bae	x86/cpu: Drop wp_works_ok member of struct cpuinfo_x86 Remove the wp_works_ok member of struct cpuinfo_x86. It's an optimization back from Linux v0.99 times where we had no fixup support yet and did the CR0.WP test via special code in the page fault handler. The < 0 test was an optimization to not do the special casing for each NULL ptr access violation but just for the first one doing the WP test. Today it serves no real purpose as the test no longer needs special code in the page fault handler and the only call side -- mem_init() -- calls it just once, anyway. However, Xen pre-initializes it to 1, to skip the test. Doing the test again for Xen should be no issue at all, as even the commit introducing skipping the test (commit `d560bc6157` ("x86, xen: Suppress WP test on Xen")) mentioned it being ban aid only. And, in fact, testing the patch on Xen showed nothing breaks. The pre-fixup times are long gone and with the removal of the fallback handling code in commit `a5c2a893db` ("x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK") the kernel requires a working CR0.WP anyway. So just get rid of the "optimization" and do the test unconditionally. Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Arnd Hannemann <hannemann@nets.rwth-aachen.de> Cc: Mikael Starvik <starvik@axis.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Link: http://lkml.kernel.org/r/1486933932-585-3-git-send-email-minipli@googlemail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-11 14:30:24 +01:00
Thomas Gleixner	5a920155e3	x86/process: Optimize TIF_NOTSC switch Provide and use a toggle helper instead of doing it with a branch. x86_64: arch/x86/kernel/process.o text data bss dec hex 3008 8577 16 11601 2d51 Before 2976 8577 16 11569 2d31 After i386: arch/x86/kernel/process.o text data bss dec hex 2925 8673 8 11606 2d56 Before 2893 8673 8 11574 2d36 After Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/20170214081104.9244-4-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-11 12:45:18 +01:00
Kyle Huey	b9894a2f5b	x86/process: Correct and optimize TIF_BLOCKSTEP switch The debug control MSR is "highly magical" as the blockstep bit can be cleared by hardware under not well documented circumstances. So a task switch relying on the bit set by the previous task (according to the previous tasks thread flags) can trip over this and not update the flag for the next task. To fix this its required to handle DEBUGCTLMSR_BTF when either the previous or the next or both tasks have the TIF_BLOCKSTEP flag set. While at it avoid branching within the TIF_BLOCKSTEP case and evaluating boot_cpu_data twice in kernels without CONFIG_X86_DEBUGCTLMSR. x86_64: arch/x86/kernel/process.o text data bss dec hex 3024 8577 16 11617 2d61 Before 3008 8577 16 11601 2d51 After i386: No change [ tglx: Made the shift value explicit, use a local variable to make the code readable and massaged changelog] Originally-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/20170214081104.9244-3-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-11 12:45:18 +01:00
Kyle Huey	af8b3cd393	x86/process: Optimize TIF checks in __switch_to_xtra() Help the compiler to avoid reevaluating the thread flags for each checked bit by reordering the bit checks and providing an explicit xor for evaluation. With default defconfigs for each arch, x86_64: arch/x86/kernel/process.o text data bss dec hex 3056 8577 16 11649 2d81 Before 3024 8577 16 11617 2d61 After i386: arch/x86/kernel/process.o text data bss dec hex 2957 8673 8 11638 2d76 Before 2925 8673 8 11606 2d56 After Originally-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/20170214081104.9244-2-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-11 12:45:17 +01:00
Thomas Gleixner	40c50c1fec	kexec, x86/purgatory: Unbreak it and clean it up The purgatory code defines global variables which are referenced via a symbol lookup in the kexec code (core and arch). A recent commit addressing sparse warnings made these static and thereby broke kexec_file. Why did this happen? Simply because the whole machinery is undocumented and lacks any form of forward declarations. The variable names are unspecific and lack a prefix, so adding forward declarations creates shadow variables in the core code. Aside of that the code relies on magic constants and duplicate struct definitions with no way to ensure that these things stay in sync. The section placement of the purgatory variables happened by chance and not by design. Unbreak kexec and cleanup the mess: - Add proper forward declarations and document the usage - Use common struct definition - Use the proper common defines instead of magic constants - Add a purgatory_ prefix to have a proper name space - Use ARRAY_SIZE() instead of a homebrewn reimplementation - Add proper sections to the purgatory variables [ From Mike ] Fixes: `72042a8c7b` ("x86/purgatory: Make functions and variables static") Reported-by: Mike Galbraith <<efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Nicholas Mc Guire <der.herr@hofr.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Tobin C. Harding" <me@tobin.cc> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703101315140.3681@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-10 20:55:09 +01:00
Matjaz Hegedic	bba8376aea	x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirk The reboot quirk for ASUS EeeBook X205TA contains a typo in DMI_PRODUCT_NAME, improperly referring to X205TAW instead of X205TA, which prevents the quirk from being triggered. The model X205TAW already has a reboot quirk of its own. This fix simply removes the inappropriate final letter W. Fixes: `90b28ded88` ("x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk") Signed-off-by: Matjaz Hegedic <matjaz.hegedic@gmail.com> Link: http://lkml.kernel.org/r/1489064417-7445-1-git-send-email-matjaz.hegedic@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-10 11:58:33 +01:00
Masahiro Yamada	8a1115ff6b	scripts/spelling.txt: add "disble(d)" pattern and fix typo instances Fix typos and add the following to the scripts/spelling.txt: disble\|\|disable disbled\|\|disabled I kept the TSL2563_INT_DISBLED in /drivers/iio/light/tsl2563.c untouched. The macro is not referenced at all, but this commit is touching only comment blocks just in case. Link: http://lkml.kernel.org/r/1481573103-11329-20-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-03-09 17:01:09 -08:00
Josh Poimboeuf	af085d9084	stacktrace/x86: add function for detecting reliable stack traces For live patching and possibly other use cases, a stack trace is only useful if it can be assured that it's completely reliable. Add a new save_stack_trace_tsk_reliable() function to achieve that. Note that if the target task isn't the current task, and the target task is allowed to run, then it could be writing the stack while the unwinder is reading it, resulting in possible corruption. So the caller of save_stack_trace_tsk_reliable() must ensure that the task is either 'current' or inactive. save_stack_trace_tsk_reliable() relies on the x86 unwinder's detection of pt_regs on the stack. If the pt_regs are not user-mode registers from a syscall, then they indicate an in-kernel interrupt or exception (e.g. preemption or a page fault), in which case the stack is considered unreliable due to the nature of frame pointers. It also relies on the x86 unwinder's detection of other issues, such as: - corrupted stack data - stack grows the wrong way - stack walk doesn't reach the bottom - user didn't provide a large enough entries array Such issues are reported by checking unwind_error() and !unwind_done(). Also add CONFIG_HAVE_RELIABLE_STACKTRACE so arch-independent code can determine at build time whether the function is implemented. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Ingo Molnar <mingo@kernel.org> # for the x86 changes Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2017-03-08 09:18:02 +01:00
Dave Airlie	2e16101780	Merge tag 'drm-intel-next-2017-03-06' of git://anongit.freedesktop.org/git/drm-intel into drm-next 4 weeks worth of stuff since I was traveling&lazy: - lspcon improvements (Imre) - proper atomic state for cdclk handling (Ville) - gpu reset improvements (Chris) - lots and lots of polish around fences, requests, waiting and everything related all over (both gem and modeset code), from Chris - atomic by default on gen5+ minus byt/bsw (Maarten did the patch to flip the default, really this is a massive joint team effort) - moar power domains, now 64bit (Ander) - big pile of in-kernel unit tests for various gem subsystems (Chris), including simple mock objects for i915 device and and the ggtt manager. - i915_gpu_info in debugfs, for taking a snapshot of the current gpu state. Same thing as i915_error_state, but useful if the kernel didn't notice something is stick. From Chris. - bxt dsi fixes (Umar Shankar) - bxt w/a updates (Jani) - no more struct_mutex for gem object unreference (Chris) - some execlist refactoring (Tvrtko) - color manager support for glk (Ander) - improve the power-well sync code to better take over from the firmware (Imre) - gem tracepoint polish (Tvrtko) - lots of glk fixes all around (Ander) - ctx switch improvements (Chris) - glk dsi support&fixes (Deepak M) - dsi fixes for vlv and clanups, lots of them (Hans de Goede) - switch to i915.ko types in lots of our internal modeset code (Ander) - byt/bsw atomic wm update code, yay (Ville) * tag 'drm-intel-next-2017-03-06' of git://anongit.freedesktop.org/git/drm-intel: (432 commits) drm/i915: Update DRIVER_DATE to 20170306 drm/i915: Don't use enums for hardware engine id drm/i915: Split breadcrumbs spinlock into two drm/i915: Refactor wakeup of the next breadcrumb waiter drm/i915: Take reference for signaling the request from hardirq drm/i915: Add FIFO underrun tracepoints drm/i915: Add cxsr toggle tracepoint drm/i915: Add VLV/CHV watermark/FIFO programming tracepoints drm/i915: Add plane update/disable tracepoints drm/i915: Kill level 0 wm hack for VLV/CHV drm/i915: Workaround VLV/CHV sprite1->sprite0 enable underrun drm/i915: Sanitize VLV/CHV watermarks properly drm/i915: Only use update_wm_{pre,post} for pre-ilk platforms drm/i915: Nuke crtc->wm.cxsr_allowed drm/i915: Compute proper intermediate wms for vlv/cvh drm/i915: Skip useless watermark/FIFO related work on VLV/CHV when not needed drm/i915: Compute vlv/chv wms the atomic way drm/i915: Compute VLV/CHV FIFO sizes based on the PM2 watermarks drm/i915: Plop vlv/chv fifo sizes into crtc state drm/i915: Plop vlv wm state into crtc_state ...	2017-03-08 12:41:47 +10:00
Linus Torvalds	ec3b93ae0b	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes and minor updates all over the place: - an SGI/UV fix - a defconfig update - a build warning fix - move the boot_params file to the arch location in debugfs - a pkeys fix - selftests fix - boot message fixes - sparse fixes - a resume warning fix - ioapic hotplug fixes - reboot quirks ... plus various minor cleanups" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build/x86_64_defconfig: Enable CONFIG_R8169 x86/reboot/quirks: Add ASUS EeeBook X205TA/W reboot quirk x86/hpet: Prevent might sleep splat on resume x86/boot: Correct setup_header.start_sys name x86/purgatory: Fix sparse warning, symbol not declared x86/purgatory: Make functions and variables static x86/events: Remove last remnants of old filenames x86/pkeys: Check against max pkey to avoid overflows x86/ioapic: Split IOAPIC hot-removal into two steps x86/PCI: Implement pcibios_release_device to release IRQ from IOAPIC x86/intel_rdt: Remove duplicate inclusion of linux/cpu.h x86/vmware: Remove duplicate inclusion of asm/timer.h x86/hyperv: Hide unused label x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk x86/platform/uv/BAU: Fix HUB errors by remove initial write to sw-ack register x86/selftests: Add clobbers for int80 on x86_64 x86/apic: Simplify enable_IR_x2apic(), remove try_to_enable_IR() x86/apic: Fix a warning message in logical CPU IDs allocation x86/kdebugfs: Move boot params hierarchy under (debugfs)/x86/	2017-03-07 14:47:24 -08:00
Linus Torvalds	609b07b72d	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "A fix for KVM's scheduler clock which (erroneously) was always marked unstable, a fix for RT/DL load balancing, plus latency fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface sched/core: Fix pick_next_task() for RT,DL sched/fair: Make select_idle_cpu() more aggressive	2017-03-07 14:42:34 -08:00
Linus Torvalds	c3abcabe81	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This includes a fix for a crash if certain special addresses are kprobed, plus does a rename of two Kconfig variables that were a minor misnomer" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Rename CONFIG_[UK]PROBE_EVENT to CONFIG_[UK]PROBE_EVENTS kprobes/x86: Fix kernel panic when certain exception-handling addresses are probed	2017-03-07 14:38:16 -08:00
Borislav Petkov	79d243a042	x86/boot/64: Rename start_cpu() It doesn't really start a CPU but does a far jump to C code. So call it that. Eliminate the unconditional JMP to it from secondary_startup_64() but make the jump to C code piece part of secondary_startup_64() instead. Also, it doesn't need to be a global symbol either so make it a local label. One less needlessly global symbol in the symbol table. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170304095611.11355-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-07 13:57:25 +01:00
Matjaz Hegedic	3b3e78552d	x86/reboot/quirks: Add ASUS EeeBook X205TA/W reboot quirk Without the parameter reboot=a, ASUS EeeBook X205TA/W will hang when it should reboot. This adds the appropriate quirk, thus fixing the problem. Signed-off-by: Matjaz Hegedic <matjaz.hegedic@gmail.com> Link: http://lkml.kernel.org/r/1488737804-20681-1-git-send-email-matjaz.hegedic@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-06 11:47:43 +01:00
Linus Torvalds	2d62e0768d	Second batch of KVM changes for 4.11 merge window PPC: * correct assumption about ASDR on POWER9 * fix MMIO emulation on POWER9 x86: * add a simple test for ioperm * cleanup TSS (going through KVM tree as the whole undertaking was caused by VMX's use of TSS) * fix nVMX interrupt delivery * fix some performance counters in the guest And two cleanup patches. -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJYuu5qAAoJEED/6hsPKofoRAUH/jkx/KFDcw3FggixysWVgRai iLSbbAZemnSLFSOkOU/t7Bz0fXCUgB0tAcMJd9ow01Dg1zObiTpuUIo6qEPaYHdX gqtUzlHuyECZEcgK0RXS9kDYLrvw7EFocxnDWQfV91qCZSS6nBSSLF3ST1rNV69W mUvcZG+MciDcZUe1lTexoswVTh1m7avvozEnQ5OHnZR9yicoXiadBQjzL6yqWoqf Ml/29zRk5+MvloTudxjkAKm3mh7psW88jNMh37TXbAA7i+Xwl9cU6GLR9mFWstoP 7Ot7ecq9mNAUO3lTIQh7lqvB60LMFznS4IlYK7MbplC3kvJLkfzhTWaN1aGvh90= =cqHo -----END PGP SIGNATURE----- Merge tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull more KVM updates from Radim Krčmář: "Second batch of KVM changes for the 4.11 merge window: PPC: - correct assumption about ASDR on POWER9 - fix MMIO emulation on POWER9 x86: - add a simple test for ioperm - cleanup TSS (going through KVM tree as the whole undertaking was caused by VMX's use of TSS) - fix nVMX interrupt delivery - fix some performance counters in the guest ... and two cleanup patches" * tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: Fix pending events injection x86/kvm/vmx: remove unused variable in segment_base() selftests/x86: Add a basic selftest for ioperm x86/asm: Tidy up TSS limit code kvm: convert kvm.users_count from atomic_t to refcount_t KVM: x86: never specify a sample period for virtualized in_tx_cp counters KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9 KVM: PPC: Book3S HV: Fix software walk of guest process page tables	2017-03-04 11:36:19 -08:00
Ingo Molnar	68e21be291	sched/headers: Move task->mm handling methods to <linux/sched/mm.h> Move the following task->mm helper APIs into a new header file, <linux/sched/mm.h>, to further reduce the size and complexity of <linux/sched.h>. Here are how the APIs are used in various kernel files: # mm_alloc(): arch/arm/mach-rpc/ecard.c fs/exec.c include/linux/sched/mm.h kernel/fork.c # __mmdrop(): arch/arc/include/asm/mmu_context.h include/linux/sched/mm.h kernel/fork.c # mmdrop(): arch/arm/mach-rpc/ecard.c arch/m68k/sun3/mmu_emu.c arch/x86/mm/tlb.c drivers/gpu/drm/amd/amdkfd/kfd_process.c drivers/gpu/drm/i915/i915_gem_userptr.c drivers/infiniband/hw/hfi1/file_ops.c drivers/vfio/vfio_iommu_spapr_tce.c fs/exec.c fs/proc/base.c fs/proc/task_mmu.c fs/proc/task_nommu.c fs/userfaultfd.c include/linux/mmu_notifier.h include/linux/sched/mm.h kernel/fork.c kernel/futex.c kernel/sched/core.c mm/khugepaged.c mm/ksm.c mm/mmu_context.c mm/mmu_notifier.c mm/oom_kill.c virt/kvm/kvm_main.c # mmdrop_async_fn(): include/linux/sched/mm.h # mmdrop_async(): include/linux/sched/mm.h kernel/fork.c # mmget_not_zero(): fs/userfaultfd.c include/linux/sched/mm.h mm/oom_kill.c # mmput(): arch/arc/include/asm/mmu_context.h arch/arc/kernel/troubleshoot.c arch/frv/mm/mmu-context.c arch/powerpc/platforms/cell/spufs/context.c arch/sparc/include/asm/mmu_context_32.h drivers/android/binder.c drivers/gpu/drm/etnaviv/etnaviv_gem.c drivers/gpu/drm/i915/i915_gem_userptr.c drivers/infiniband/core/umem.c drivers/infiniband/core/umem_odp.c drivers/infiniband/core/uverbs_main.c drivers/infiniband/hw/mlx4/main.c drivers/infiniband/hw/mlx5/main.c drivers/infiniband/hw/usnic/usnic_uiom.c drivers/iommu/amd_iommu_v2.c drivers/iommu/intel-svm.c drivers/lguest/lguest_user.c drivers/misc/cxl/fault.c drivers/misc/mic/scif/scif_rma.c drivers/oprofile/buffer_sync.c drivers/vfio/vfio_iommu_type1.c drivers/vhost/vhost.c drivers/xen/gntdev.c fs/exec.c fs/proc/array.c fs/proc/base.c fs/proc/task_mmu.c fs/proc/task_nommu.c fs/userfaultfd.c include/linux/sched/mm.h kernel/cpuset.c kernel/events/core.c kernel/events/uprobes.c kernel/exit.c kernel/fork.c kernel/ptrace.c kernel/sys.c kernel/trace/trace_output.c kernel/tsacct.c mm/memcontrol.c mm/memory.c mm/mempolicy.c mm/migrate.c mm/mmu_notifier.c mm/nommu.c mm/oom_kill.c mm/process_vm_access.c mm/rmap.c mm/swapfile.c mm/util.c virt/kvm/async_pf.c # mmput_async(): include/linux/sched/mm.h kernel/fork.c mm/oom_kill.c # get_task_mm(): arch/arc/kernel/troubleshoot.c arch/powerpc/platforms/cell/spufs/context.c drivers/android/binder.c drivers/gpu/drm/etnaviv/etnaviv_gem.c drivers/infiniband/core/umem.c drivers/infiniband/core/umem_odp.c drivers/infiniband/hw/mlx4/main.c drivers/infiniband/hw/mlx5/main.c drivers/infiniband/hw/usnic/usnic_uiom.c drivers/iommu/amd_iommu_v2.c drivers/iommu/intel-svm.c drivers/lguest/lguest_user.c drivers/misc/cxl/fault.c drivers/misc/mic/scif/scif_rma.c drivers/oprofile/buffer_sync.c drivers/vfio/vfio_iommu_type1.c drivers/vhost/vhost.c drivers/xen/gntdev.c fs/proc/array.c fs/proc/base.c fs/proc/task_mmu.c include/linux/sched/mm.h kernel/cpuset.c kernel/events/core.c kernel/exit.c kernel/fork.c kernel/ptrace.c kernel/sys.c kernel/trace/trace_output.c kernel/tsacct.c mm/memcontrol.c mm/memory.c mm/mempolicy.c mm/migrate.c mm/mmu_notifier.c mm/nommu.c mm/util.c # mm_access(): fs/proc/base.c include/linux/sched/mm.h kernel/fork.c mm/process_vm_access.c # mm_release(): arch/arc/include/asm/mmu_context.h fs/exec.c include/linux/sched/mm.h include/uapi/linux/sched.h kernel/exit.c kernel/fork.c Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-03 01:43:28 +01:00
Thomas Gleixner	bb1a2c2616	x86/hpet: Prevent might sleep splat on resume Sergey reported a might sleep warning triggered from the hpet resume path. It's caused by the call to disable_irq() from interrupt disabled context. The problem with the low level resume code is that it is not accounted as a special system_state like we do during the boot process. Calling the same code during system boot would not trigger the warning. That's inconsistent at best. In this particular case it's trivial to replace the disable_irq() with disable_hardirq() because this particular code path is solely used from system resume and the involved hpet interrupts can never be force threaded. Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703012108460.3684@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-02 09:33:47 +01:00
Peter Zijlstra	f94c8d1169	sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface Wanpeng Li reported that since the following commit: `acb04058de` ("sched/clock: Fix hotplug crash") ... KVM always runs with unstable sched-clock even though KVM's kvm_clock _is_ stable. The problem is that we've tied clear_sched_clock_stable() to the TSC state, and overlooked that sched_clock() is a paravirt function. Solve this by doing two things: - tie the sched_clock() stable state more clearly to the TSC stable state for the normal (!paravirt) case. - only call clear_sched_clock_stable() when we mark TSC unstable when we use native_sched_clock(). The first means we can actually run with stable sched_clock in more situations then before, which is good. And since commit: `12907fbb1a` ("sched/clock, clocksource: Add optional cs::mark_unstable() method") ... this should be reliable. Since any detection of TSC fail now results in marking the TSC unstable. Reported-by: Wanpeng Li <kernellwp@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: `acb04058de` ("sched/clock: Fix hotplug crash") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:50:49 +01:00
Ingo Molnar	32ef5517c2	sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> Introduce a trivial, mostly empty <linux/sched/cputime.h> header to prepare for the moving of cputime functionality out of sched.h. Update all code that relies on these facilities. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:39 +01:00
Ingo Molnar	9164bb4a18	sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h> Update all usage sites first. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:38 +01:00
Ingo Molnar	68db0cf106	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task_stack.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:36 +01:00
Ingo Molnar	299300258d	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h> We are going to split <linux/sched/task.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:35 +01:00
Ingo Molnar	ef8bd77f33	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/hotplug.h> We are going to split <linux/sched/hotplug.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/hotplug.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:35 +01:00
Ingo Molnar	b17b01533b	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/debug.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:34 +01:00
Ingo Molnar	174cd4b1e5	sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:32 +01:00
Ingo Molnar	5b825c3af1	sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> Add #include <linux/cred.h> dependencies to all .c files rely on sched.h doing that for them. Note that even if the count where we need to add extra headers seems high, it's still a net win, because <linux/sched.h> is included in over 2,200 files ... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:31 +01:00
Ingo Molnar	010426079e	sched/headers: Prepare for new header dependencies before moving more code to <linux/sched/mm.h> We are going to split more MM APIs out of <linux/sched.h>, which will have to be picked up from a couple of .c files. The APIs that we are going to move are: arch_pick_mmap_layout() arch_get_unmapped_area() arch_get_unmapped_area_topdown() mm_update_next_owner() Include the header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:30 +01:00
Ingo Molnar	38b8d208a4	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/nmi.h> We are going to move softlockup APIs out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. <linux/nmi.h> already includes <linux/sched.h>. Include the <linux/nmi.h> header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:30 +01:00
Ingo Molnar	3f07c01441	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/signal.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:29 +01:00
Ingo Molnar	e601757102	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which will have to be picked up from other headers and .c files. Create a trivial placeholder <linux/sched/clock.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:27 +01:00
Ingo Molnar	4c822698cb	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/idle.h> We are going to split <linux/sched/idle.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/idle.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:26 +01:00
Ingo Molnar	105ab3d8ce	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/topology.h> We are going to split <linux/sched/topology.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/topology.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:26 +01:00
Ingo Molnar	9d020d33fc	Merge branch 'linus' into perf/urgent, to resolve conflict Conflicts: arch/powerpc/configs/85xx/kmp204x_defconfig Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:05:45 +01:00
Josh Poimboeuf	e390f9a968	objtool, modules: Discard objtool annotation sections for modules The '__unreachable' and '__func_stack_frame_non_standard' sections are only used at compile time. They're discarded for vmlinux but they should also be discarded for modules. Since this is a recurring pattern, prefix the section names with ".discard.". It's a nice convention and vmlinux.lds.h already discards such sections. Also remove the 'a' (allocatable) flag from the __unreachable section since it doesn't make sense for a discarded section. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jessica Yu <jeyu@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `d1091c7fa3` ("objtool: Improve detection of BUG() and other dead ends") Link: http://lkml.kernel.org/r/20170301180444.lhd53c5tibc4ns77@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-01 20:32:25 +01:00
Andy Lutomirski	b7ceaec112	x86/asm: Tidy up TSS limit code In an earlier version of the patch ("x86/kvm/vmx: Defer TR reload after VM exit") that introduced TSS limit validity tracking, I confused which helper was which. On reflection, the names I chose sucked. Rename the helpers to make it more obvious what's going on and add some comments. While I'm at it, clear __tss_limit_invalid when force-reloading as well as when contitionally reloading, since any TR reload fixes the limit. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-03-01 17:03:22 +01:00
Masanari Iida	e86a2d2d34	x86/intel_rdt: Remove duplicate inclusion of linux/cpu.h Signed-off-by: Masanari Iida <standby24x7@gmail.com> Cc: fenghua.yu@intel.com Link: http://lkml.kernel.org/r/20170227130703.26968-1-standby24x7@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-01 10:51:41 +01:00
Masanari Iida	aa5ec3f715	x86/vmware: Remove duplicate inclusion of asm/timer.h Signed-off-by: Masanari Iida <standby24x7@gmail.com> Cc: akataria@vmware.com Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20170227122922.26230-1-standby24x7@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-01 10:51:40 +01:00
Matjaz Hegedic	90b28ded88	x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk Without the parameter reboot=a, ASUS EeeBook X205TA will hang when it should reboot. This adds the appropriate quirk, thus fixing the problem. Signed-off-by: Matjaz Hegedic <matjaz.hegedic@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-01 10:29:20 +01:00
Dou Liyang	11277aabcb	x86/apic: Simplify enable_IR_x2apic(), remove try_to_enable_IR() The following commit: `2e63ad4bd5` ("x86/apic: Do not init irq remapping if ioapic is disabled") ... added a check for skipped IO-APIC setup to enable_IR_x2apic(), but this check is also duplicated in try_to_enable_IR() - and it will never succeed in calling irq_remapping_enable(). Remove the whole irq_remapping_enable() complication: if the IO-APIC is disabled we cannot enable IRQ remapping. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: nicstange@gmail.com Cc: wanpeng.li@hotmail.com Link: http://lkml.kernel.org/r/1487841401-1543-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-01 10:09:09 +01:00
Dou Liyang	bb3f0a5263	x86/apic: Fix a warning message in logical CPU IDs allocation The current warning message in allocate_logical_cpuid() is somewhat confusing: Only 1 processors supported.Processor 2/0x2 and the rest are ignored. As it might imply that there's only one CPU in the system - while what we ran into here is a kernel limitation. Fix the warning message to clarify all that: APIC: NR_CPUS/possible_cpus limit of 2 reached. Processor 2/0x2 and the rest are ignored. ( Also update the error return from -1 to -EINVAL, which is the more canonical return value. ) Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: nicstange@gmail.com Cc: wanpeng.li@hotmail.com Link: http://lkml.kernel.org/r/1488261052-25753-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-01 10:09:08 +01:00
Borislav Petkov	10bce84106	x86/kdebugfs: Move boot params hierarchy under (debugfs)/x86/ ... since this is all x86-specific data and it makes sense to have it under x86/ logically instead in the toplevel debugfs dir. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170227225058.27289-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-01 09:57:02 +01:00
Masami Hiramatsu	75013fb16f	kprobes/x86: Fix kernel panic when certain exception-handling addresses are probed Fix to the exception table entry check by using probed address instead of the address of copied instruction. This bug may cause unexpected kernel panic if user probe an address where an exception can happen which should be fixup by __ex_table (e.g. copy_from_user.) Unless user puts a kprobe on such address, this doesn't cause any problem. This bug has been introduced years ago, by commit: `464846888d` ("x86/kprobes: Fix a bug which can modify kernel code permanently"). Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `464846888d` ("x86/kprobes: Fix a bug which can modify kernel code permanently") Link: http://lkml.kernel.org/r/148829899399.28855.12581062400757221722.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-01 09:56:13 +01:00
Ingo Molnar	0871d5a66d	Merge branch 'linus' into WIP.x86/boot, to fix up conflicts and to pick up updates Conflicts: arch/x86/xen/setup.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-01 09:02:26 +01:00
Linus Torvalds	f89db789de	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two documentation updates, plus a debugging annotation fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/crash: Update the stale comment in reserve_crashkernel() x86/irq, trace: Add __irq_entry annotation to x86's platform IRQ handlers Documentation, x86, resctrl: Recommend locking for resctrlfs	2017-02-28 11:46:00 -08:00
Linus Torvalds	e72e58faa7	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar: "A handful of objtool fixes related to unreachable code, plus a build fix for out of tree modules" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Enclose contents of unreachable() macro in a block objtool: Prevent GCC from merging annotate_unreachable() objtool: Improve detection of BUG() and other dead ends objtool: Fix CONFIG_STACK_VALIDATION=y warning for out-of-tree modules	2017-02-28 10:15:59 -08:00
Jinbum Park	2959a5f726	mm: add arch-independent testcases for RODATA This patch makes arch-independent testcases for RODATA. Both x86 and x86_64 already have testcases for RODATA, But they are arch-specific because using inline assembly directly. And cacheflush.h is not a suitable location for rodata-test related things. Since they were in cacheflush.h, If someone change the state of CONFIG_DEBUG_RODATA_TEST, It cause overhead of kernel build. To solve the above issues, write arch-independent testcases and move it to shared location. [jinb.park7@gmail.com: fix config dependency] Link: http://lkml.kernel.org/r/20170209131625.GA16954@pjb1027-Latitude-E5410 Link: http://lkml.kernel.org/r/20170129105436.GA9303@pjb1027-Latitude-E5410 Signed-off-by: Jinbum Park <jinb.park7@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Valentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-27 18:43:48 -08:00
Vegard Nossum	f1f1007644	mm: add new mmgrab() helper Apart from adding the helper function itself, the rest of the kernel is converted mechanically using: git grep -l 'atomic_inc.mm_count' \| xargs sed -i 's/atomic_inc(&$.$->mm_count);/mmgrab$\1$;/' git grep -l 'atomic_inc.mm_count' \| xargs sed -i 's/atomic_inc(&$.$\.mm_count);/mmgrab$\&\1$;/' This is needed for a later patch that hooks into the helper, but might be a worthwhile cleanup on its own. (Michal Hocko provided most of the kerneldoc comment.) Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-27 18:43:48 -08:00
Daniel Vetter	c771633daf	Merge airlied/drm-next into drm-misc-next Backmerge the main pull request to sync up with all the newly landed drivers. Otherwise we'll have chaos even before 4.12 started in earnest. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2017-02-27 09:30:11 +01:00
Linus Torvalds	ac1820fb28	This is a tree wide change and has been kept separate for that reason. Bart Van Assche noted that the ib DMA mapping code was significantly similar enough to the core DMA mapping code that with a few changes it was possible to remove the IB DMA mapping code entirely and switch the RDMA stack to use the core DMA mapping code. This resulted in a nice set of cleanups, but touched the entire tree. This branch will be submitted separately to Linus at the end of the merge window as per normal practice for tree wide changes like this. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJYo06oAAoJELgmozMOVy/d9Z8QALedWHdu98St1L0u2c8sxnR9 2zo/4sF5Vb9u7FpmdIX32L4SQ9s9KhPE8Qp8NtZLf9v10zlDebIRJDpXknXtKooV CAXxX4sxBXV27/UrhbZEfXiPrmm6ccJFyIfRnMU6NlMqh2AtAsRa5AC2/RMp8oUD Med97PFiF0o6TD22/UH1VFbRpX1zjaKyqm7a3as5sJfzNA+UGIZAQ7Euz8000DKZ xCgVLTEwS0FmOujtBkCst7xa9TjuqR1HLOB4DdGvAhP6BHdz2yamM7Qmh9NN+NEX 0BtjsuXomtn6j6AszGC+bpipCZh3NUigcwoFAARXCYFHibBvo4DPdFeGsraFgXdy 1+KyR8CCeQG3Aly5Vwr264RFPGkGpwMj8PsBlXgQVtrlg4rriaCzOJNmIIbfdADw ftqhxBOzReZw77aH2s+9p2ILRfcAmPqhynLvFGFo9LBvsik8LVso7YgZN0xGxwcI IjI/XGC8UskPVsIZBIYA6sl2bYzgOjtBIHiXjRrPlW3uhduIXLrvKFfLPP/5XLAG ehLXK+J0bfsyY9ClmlNS8oH/WdLhXAyy/KNmnj5bRRm9qg6BRJR3bsOBhZJODuoC XgEXFfF6/7roNESWxowff7pK0rTkRg/m/Pa4VQpeO+6NWHE7kgZhL6kyIp5nKcwS 3e7mgpcwC+3XfA/6vU3F =e0Si -----END PGP SIGNATURE----- Merge tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma DMA mapping updates from Doug Ledford: "Drop IB DMA mapping code and use core DMA code instead. Bart Van Assche noted that the ib DMA mapping code was significantly similar enough to the core DMA mapping code that with a few changes it was possible to remove the IB DMA mapping code entirely and switch the RDMA stack to use the core DMA mapping code. This resulted in a nice set of cleanups, but touched the entire tree and has been kept separate for that reason." * tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits) IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it IB/core: Remove ib_device.dma_device nvme-rdma: Switch from dma_device to dev.parent RDS: net: Switch from dma_device to dev.parent IB/srpt: Modify a debug statement IB/srp: Switch from dma_device to dev.parent IB/iser: Switch from dma_device to dev.parent IB/IPoIB: Switch from dma_device to dev.parent IB/rxe: Switch from dma_device to dev.parent IB/vmw_pvrdma: Switch from dma_device to dev.parent IB/usnic: Switch from dma_device to dev.parent IB/qib: Switch from dma_device to dev.parent IB/qedr: Switch from dma_device to dev.parent IB/ocrdma: Switch from dma_device to dev.parent IB/nes: Remove a superfluous assignment statement IB/mthca: Switch from dma_device to dev.parent IB/mlx5: Switch from dma_device to dev.parent IB/mlx4: Switch from dma_device to dev.parent IB/i40iw: Remove a superfluous assignment statement IB/hns: Switch from dma_device to dev.parent ...	2017-02-25 13:45:43 -08:00
Lucas Stach	712c604dcd	mm: wire up GFP flag passing in dma_alloc_from_contiguous The callers of the DMA alloc functions already provide the proper context GFP flags. Make sure to pass them through to the CMA allocator, to make the CMA compaction context aware. Link: http://lkml.kernel.org/r/20170127172328.18574-3-l.stach@pengutronix.de Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alexander Graf <agraf@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:55 -08:00
Matthew Wilcox	a00cc7d9dd	mm, x86: add support for PUD-sized transparent hugepages The current transparent hugepage code only supports PMDs. This patch adds support for transparent use of PUDs with DAX. It does not include support for anonymous pages. x86 support code also added. Most of this patch simply parallels the work that was done for huge PMDs. The only major difference is how the new ->pud_entry method in mm_walk works. The ->pmd_entry method replaces the ->pte_entry method, whereas the ->pud_entry method works along with either ->pmd_entry or ->pte_entry. The pagewalk code takes care of locking the PUD before calling ->pud_walk, so handlers do not need to worry whether the PUD is stable. [dave.jiang@intel.com: fix SMP x86 32bit build for native_pud_clear()] Link: http://lkml.kernel.org/r/148719066814.31111.3239231168815337012.stgit@djiang5-desk3.ch.intel.com [dave.jiang@intel.com: native_pud_clear missing on i386 build] Link: http://lkml.kernel.org/r/148640375195.69754.3315433724330910314.stgit@djiang5-desk3.ch.intel.com Link: http://lkml.kernel.org/r/148545059381.17912.8602162635537598445.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Alexander Kapshuk <alexander.kapshuk@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jan Kara <jack@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:54 -08:00
Josh Poimboeuf	d1091c7fa3	objtool: Improve detection of BUG() and other dead ends The BUG() macro's use of __builtin_unreachable() via the unreachable() macro tells gcc that the instruction is a dead end, and that it's safe to assume the current code path will not execute past the previous instruction. On x86, the BUG() macro is implemented with the 'ud2' instruction. When objtool's branch analysis sees that instruction, it knows the current code path has come to a dead end. Peter Zijlstra has been working on a patch to change the WARN macros to use 'ud2'. That patch will break objtool's assumption that 'ud2' is always a dead end. Generally it's best for objtool to avoid making those kinds of assumptions anyway. The more ignorant it is of kernel code internals, the better. So create a more generic way for objtool to detect dead ends by adding an annotation to the unreachable() macro. The annotation stores a pointer to the end of the unreachable code path in an '__unreachable' section. Objtool can read that section to find the dead ends. Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/41a6d33971462ebd944a1c60ad4bf5be86c17b77.1487712920.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-24 09:10:52 +01:00
Linus Torvalds	60e8d3e116	pci-v4.11-changes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYrvYlAAoJEFmIoMA60/r8FuQQAMDpia3kacyCAJpa+zjmyMNF 1slytaoIvP37dFq9XF1em031lwGNr5sahZ7nP1EKgALz4odZUzait7BUABcfviIn Uesz2E1s/miMo4/0X1j9DqY9xV649DmmSIgk1yn3kvCkH/+Ix27dexu47auGzPEb H/sEfd1RZidjZ5EWaG0ww5FrHcuge+JHtcH6vFQtWsTOspcx++IhaVIGjC0JCpqK DnlQKilsJ38KUkvuDcxWjtFKxAc8De9jvCR4kX96OvbHahfAWwBO4AtUv7U3JpJN 2nyQk+I5kRagbfBucaXZISUtWM7h4peLiL+TGkvKg8eOVlOCedjYlrZW4SWkbAN+ 0qwcHRQ8lwhNmgp3VYq7pmnugIvW4P2Fh3uqaplCAIwlpODxWPDQP7HLM2kyzmvq gPGi0R4Yo2PdIXqfbilrzbFVeyqkIFECr287a6+5PekC0DxsqZvOG0uA1mWKLIaH pRQMT0FO2SCCSOpcxRExeIj+XxhXlDVOrIBP6eMiFXAMgzUAyU8fLSZVMtXAvsTS 02hVDOc/Fq2jKlCSoJRIiRp5aj1QDFS/DjBhOnW7pXuvUTCrfYBXY5NCdT9UV3Q7 W6qHWkizRmRDGxUzqSODRt5aU7VOKbWvZnp10eJyKt5s2Iawe6We5V1NX+u18UIS Scc1nbuPTL6u1n8PsaBG =4Owc -----END PGP SIGNATURE----- Merge tag 'pci-v4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - add ASPM L1 substate support - enable PCIe Extended Tags when supported - configure PCIe MPS settings on iProc, Versatile, X-Gene, and Xilinx - increase VPD access timeout - add ACS quirks for Intel Union Point, Qualcomm QDF2400 and QDF2432 - use new pci_irq_alloc_vectors() in more drivers - fix MSI affinity memory leak - remove unused MSI interfaces and update documentation - remove unused AER .link_reset() callback - avoid pci_lock / p->pi_lock deadlock seen with perf - serialize sysfs enable/disable num_vfs operations - move DesignWare IP from drivers/pci/host/ to drivers/pci/dwc/ and refactor so we can support both hosts and endpoints - add DT ECAM-like support for HiSilicon Hip06/Hip07 controllers - add Rockchip system power management support - add Thunder-X cn81xx and cn83xx support - add Exynos 5440 PCIe PHY support * tag 'pci-v4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (93 commits) PCI: dwc: Remove dependency of designware on CONFIG_PCI PCI: dwc: Add CONFIG_PCIE_DW_HOST to enable PCI dwc host PCI: dwc: Split pcie-designware.c into host and core files PCI: dwc: designware: Fix style errors in pcie-designware.c PCI: dwc: designware: Parse "num-lanes" property in dw_pcie_setup_rc() PCI: dwc: all: Split struct pcie_port into host-only and core structures PCI: dwc: designware: Get device pointer at the start of dw_pcie_host_init() PCI: dwc: all: Rename cfg_read/cfg_write to read/write PCI: dwc: all: Use platform_set_drvdata() to save private data PCI: dwc: designware: Move register defines to designware header file PCI: dwc: Use PTR_ERR_OR_ZERO to simplify code PCI: dra7xx: Group PHY API invocations PCI: dra7xx: Enable MSI and legacy interrupts simultaneously PCI: dra7xx: Add support to force RC to work in GEN1 mode PCI: dra7xx: Simplify probe code with devm_gpiod_get_optional() PCI: Move DesignWare IP support to new drivers/pci/dwc/ directory PCI: exynos: Support the PHY generic framework Documentation: binding: Modify the exynos5440 PCIe binding phy: phy-exynos-pcie: Add support for Exynos PCIe PHY Documentation: samsung-phy: Add exynos-pcie-phy binding ...	2017-02-23 11:53:22 -08:00
Linus Torvalds	fd7e9a8834	4.11 is going to be a relatively large release for KVM, with a little over 200 commits and noteworthy changes for most architectures. * ARM: - GICv3 save/restore - cache flushing fixes - working MSI injection for GICv3 ITS - physical timer emulation * MIPS: - various improvements under the hood - support for SMP guests - a large rewrite of MMU emulation. KVM MIPS can now use MMU notifiers to support copy-on-write, KSM, idle page tracking, swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is also supported, so that writes to some memory regions can be treated as MMIO. The new MMU also paves the way for hardware virtualization support. * PPC: - support for POWER9 using the radix-tree MMU for host and guest - resizable hashed page table - bugfixes. * s390: expose more features to the guest - more SIMD extensions - instruction execution protection - ESOP2 * x86: - improved hashing in the MMU - faster PageLRU tracking for Intel CPUs without EPT A/D bits - some refactoring of nested VMX entry/exit code, preparing for live migration support of nested hypervisors - expose yet another AVX512 CPUID bit - host-to-guest PTP support - refactoring of interrupt injection, with some optimizations thrown in and some duct tape removed. - remove lazy FPU handling - optimizations of user-mode exits - optimizations of vcpu_is_preempted() for KVM guests * generic: - alternative signaling mechanism that doesn't pound on tsk->sighand->siglock -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJYral1AAoJEL/70l94x66DbNgH/Rx8YXuidFq2fe3RWOvld3RK 85OM/D5g38cTLpBE0/sJpcvX34iYN8U/l5foCZwpxB+83GHEk2Cr57JyfTogdaAJ x8dBhHKQCA/HxSQUQLN6nFqRV+yT8WUR92Fhqx82+80BSen5Yzcfee/TDoW6T1IW g8CYgX9FrRaGOX066ImAuUfdAdUVjyssfs9VttDTX+HiusPeuBPx/wsRe1ZEEPlH vnltIJQb1ETV2GOZLUojKjzH6aZkjIl29XxjkYii9JTUornClG0DfW+5QT3uLrB5 gJ+G+Zmpsq8ZBx9jNDtAi7sFsoPY1Mzf+JPNCGXBra2sP2GrBAuXcxmgznRYltQ= =8IIp -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "4.11 is going to be a relatively large release for KVM, with a little over 200 commits and noteworthy changes for most architectures. ARM: - GICv3 save/restore - cache flushing fixes - working MSI injection for GICv3 ITS - physical timer emulation MIPS: - various improvements under the hood - support for SMP guests - a large rewrite of MMU emulation. KVM MIPS can now use MMU notifiers to support copy-on-write, KSM, idle page tracking, swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is also supported, so that writes to some memory regions can be treated as MMIO. The new MMU also paves the way for hardware virtualization support. PPC: - support for POWER9 using the radix-tree MMU for host and guest - resizable hashed page table - bugfixes. s390: - expose more features to the guest - more SIMD extensions - instruction execution protection - ESOP2 x86: - improved hashing in the MMU - faster PageLRU tracking for Intel CPUs without EPT A/D bits - some refactoring of nested VMX entry/exit code, preparing for live migration support of nested hypervisors - expose yet another AVX512 CPUID bit - host-to-guest PTP support - refactoring of interrupt injection, with some optimizations thrown in and some duct tape removed. - remove lazy FPU handling - optimizations of user-mode exits - optimizations of vcpu_is_preempted() for KVM guests generic: - alternative signaling mechanism that doesn't pound on tsk->sighand->siglock" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (195 commits) x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64 x86/paravirt: Change vcp_is_preempted() arg type to long KVM: VMX: use correct vmcs_read/write for guest segment selector/base x86/kvm/vmx: Defer TR reload after VM exit x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss x86/kvm/vmx: Simplify segment_base() x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels x86/kvm/vmx: Don't fetch the TSS base from the GDT x86/asm: Define the kernel TSS limit in a macro kvm: fix page struct leak in handle_vmon KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now KVM: Return an error code only as a constant in kvm_get_dirty_log() KVM: Return an error code only as a constant in kvm_get_dirty_log_protect() KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl() KVM: x86: remove code for lazy FPU handling KVM: race-free exit from KVM_RUN without POSIX signals KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug message KVM: PPC: Book3S PR: Ratelimit copy data failure error messages KVM: Support vCPU-based gfn->hva cache KVM: use separate generations for each address space ...	2017-02-22 18:22:53 -08:00
Linus Torvalds	e30aee9e10	char/misc driver patches for 4.11-rc1 Here is the big char/misc driver patchset for 4.11-rc1. Lots of different driver subsystems updated here. Rework for the hyperv subsystem to handle new platforms better, mei and w1 and extcon driver updates, as well as a number of other "minor" driver updates. Full details are in the shortlog below. All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWK2iRQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ynhFACguVE+/ixj5u5bT5DXQaZNai/6zIAAmgMWwd/t YTD2cwsJsGbTT1fY3SUe =CiSI -----END PGP SIGNATURE----- Merge tag 'char-misc-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big char/misc driver patchset for 4.11-rc1. Lots of different driver subsystems updated here: rework for the hyperv subsystem to handle new platforms better, mei and w1 and extcon driver updates, as well as a number of other "minor" driver updates. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (169 commits) goldfish: Sanitize the broken interrupt handler x86/platform/goldfish: Prevent unconditional loading vmbus: replace modulus operation with subtraction vmbus: constify parameters where possible vmbus: expose hv_begin/end_read vmbus: remove conditional locking of vmbus_write vmbus: add direct isr callback mode vmbus: change to per channel tasklet vmbus: put related per-cpu variable together vmbus: callback is in softirq not workqueue binder: Add support for file-descriptor arrays binder: Add support for scatter-gather binder: Add extra size to allocator binder: Refactor binder_transact() binder: Support multiple /dev instances binder: Deal with contexts in debugfs binder: Support multiple context managers binder: Split flat_binder_object auxdisplay: ht16k33: remove private workqueue auxdisplay: ht16k33: rework input device initialization ...	2017-02-22 11:38:22 -08:00
Waiman Long	dd0fd8bca1	x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64 It was found when running fio sequential write test with a XFS ramdisk on a KVM guest running on a 2-socket x86-64 system, the %CPU times as reported by perf were as follows: 69.75% 0.59% fio [k] down_write 69.15% 0.01% fio [k] call_rwsem_down_write_failed 67.12% 1.12% fio [k] rwsem_down_write_failed 63.48% 52.77% fio [k] osq_lock 9.46% 7.88% fio [k] __raw_callee_save___kvm_vcpu_is_preempt 3.93% 3.93% fio [k] __kvm_vcpu_is_preempted Making vcpu_is_preempted() a callee-save function has a relatively high cost on x86-64 primarily due to at least one more cacheline of data access from the saving and restoring of registers (8 of them) to and from stack as well as one more level of function call. To reduce this performance overhead, an optimized assembly version of the the __raw_callee_save___kvm_vcpu_is_preempt() function is provided for x86-64. With this patch applied on a KVM guest on a 2-socket 16-core 32-thread system with 16 parallel jobs (8 on each socket), the aggregrate bandwidth of the fio test on an XFS ramdisk were as follows: I/O Type w/o patch with patch -------- --------- ---------- random read 8141.2 MB/s 8497.1 MB/s seq read 8229.4 MB/s 8304.2 MB/s random write 1675.5 MB/s 1701.5 MB/s seq write 1681.3 MB/s 1699.9 MB/s There are some increases in the aggregated bandwidth because of the patch. The perf data now became: 70.78% 0.58% fio [k] down_write 70.20% 0.01% fio [k] call_rwsem_down_write_failed 69.70% 1.17% fio [k] rwsem_down_write_failed 59.91% 55.42% fio [k] osq_lock 10.14% 10.14% fio [k] __kvm_vcpu_is_preempted The assembly code was verified by using a test kernel module to compare the output of C __kvm_vcpu_is_preempted() and that of assembly __raw_callee_save___kvm_vcpu_is_preempt() to verify that they matched. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-02-21 12:48:35 +01:00
Waiman Long	6c62985d57	x86/paravirt: Change vcp_is_preempted() arg type to long The cpu argument in the function prototype of vcpu_is_preempted() is changed from int to long. That makes it easier to provide a better optimized assembly version of that function. For Xen, vcpu_is_preempted(long) calls xen_vcpu_stolen(int), the downcast from long to int is not a problem as vCPU number won't exceed 32 bits. Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-02-21 12:48:06 +01:00
Andy Lutomirski	b7ffc44d5b	x86/kvm/vmx: Defer TR reload after VM exit Intel's VMX is daft and resets the hidden TSS limit register to 0x67 on VMX reload, and the 0x67 is not configurable. KVM currently reloads TR using the LTR instruction on every exit, but this is quite slow because LTR is serializing. The 0x67 limit is entirely harmless unless ioperm() is in use, so defer the reload until a task using ioperm() is actually running. Here's some poorly done benchmarking using kvm-unit-tests: Before: cpuid 1313 vmcall 1195 mov_from_cr8 11 mov_to_cr8 17 inl_from_pmtimer 6770 inl_from_qemu 6856 inl_from_kernel 2435 outl_to_kernel 1402 After: cpuid 1291 vmcall 1181 mov_from_cr8 11 mov_to_cr8 16 inl_from_pmtimer 6457 inl_from_qemu 6209 inl_from_kernel 2339 outl_to_kernel 1391 Signed-off-by: Andy Lutomirski <luto@kernel.org> [Force-reload TR in invalidate_tss_limit. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-02-21 12:45:08 +01:00
Linus Torvalds	43e31e4047	ACPI updates for v4.11-rc1 - Update of the ACPICA code in the kernel to upstream revision 20170119 including: * Fixes related to the handling of the bit width and bit offset fields in Generic Address Structure (Lv Zheng). * ACPI resources handling fix related to invalid resource descriptors (Bob Moore). * Fix to enable implicit result conversion for several ASL library functions (Bob Moore). * Support for method invocations as target operands in AML (Bob Moore). * Fix to use a correct operand type for DeRefOf() in some situations (Bob Moore). * Utilities updates (Bob Moore, Lv Zheng). * Disassembler/debugger updates (David Box, Lv Zheng). * Build fixes (Colin Ian King, Lv Zheng). * Update of copyright notices in all files (Bob Moore). - Fix for modalias handling for SPI and I2C devices with DT-compatible identification strings (Dan O'Donovan). - Fixes for the ACPI EC and button drivers (Lv Zheng). - ACPI processor handling fix related to CPU hotplug (online/offline) on x86 (Vitaly Kuznetsov). - Suspend quirk to save/restore NVS memory over S3 transitions for Lenovo G50-45 (Zhang Rui). - Message formatting fix for the ACPI APEI code (Colin Ian King). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJYq3J5AAoJEILEb/54YlRx9UcP/0434BwmytZkmo5vKGtmzyuE G4RoVNgCegq6BX8KxbML6UHHb+z7XlSHgH3mTU+Csin3OOQ4w3rgDyhwUEK2mWBO 5bU1hwHRZfy4cpPGrAVDdAXSARJRaRBrl4Y8nZx2SD34WCVzMZJVEvBPPkjVFJP0 1XQuGvteORcuOD5Sc1XfEStsJUVo5Uim9IaF0tHrdXhkrlsNWgMTIxt9TIKdUOJ0 JtPK/qNQz5xK4DYo5ny9yLEAxhUFmHoQZzRLWST27eeIxtSZLAErk/Jp64sSQ1uK tsHD++7PrjfniHxp+uVPZKi3BexM1CyvQ7sv/amQILgH4cUhWBx7kNZtb85muwWw OlgkFZino19oKmdu0w/1KgLAQ71PDo+oMcc+yR1PFWwGhaYR3n/MEsjmQI8/VvcA PrCOOrsrW4CNZGf6nN9xunsXMMXacWMdQBV0TspXRRmtFnXdSixp7AurJl8UFg7u 7j8vUgn2HVOIvEnBxVQCOFT2nZLyEzRL+gXNjWxGs3WJsUlYGKjD7f/SGgo3ztQh 4VxX0aXWk1vSQ/X1sszhF4GWHIgeeYYY06gvH0cXImRZhI5X0hrLuJrNt5vxoP+u RzsXGuHZ5VA0YxEHOPq/o7EmG1va0JnbuyGFvdR3QUOsqIG1Z/+5DZzdJybm0chq E+/X0juoMuY/ZB0BXi6t =aJJ+ -----END PGP SIGNATURE----- Merge tag 'acpi-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These update the ACPICA code in the kernel to upstream revision 20170119, which among other things updates copyright notices in all of the ACPICA files, fix a couple of issues in the ACPI EC and button drivers, fix modalias handling for non-discoverable devices with DT-compatible identification strings, add a suspend quirk for one platform and fix a message in the APEI code. Specifics: - Update of the ACPICA code in the kernel to upstream revision 20170119 including: + Fixes related to the handling of the bit width and bit offset fields in Generic Address Structure (Lv Zheng) + ACPI resources handling fix related to invalid resource descriptors (Bob Moore) + Fix to enable implicit result conversion for several ASL library functions (Bob Moore) + Support for method invocations as target operands in AML (Bob Moore) + Fix to use a correct operand type for DeRefOf() in some situations (Bob Moore) + Utilities updates (Bob Moore, Lv Zheng) + Disassembler/debugger updates (David Box, Lv Zheng) + Build fixes (Colin Ian King, Lv Zheng) + Update of copyright notices in all files (Bob Moore) - Fix for modalias handling for SPI and I2C devices with DT-compatible identification strings (Dan O'Donovan) - Fixes for the ACPI EC and button drivers (Lv Zheng) - ACPI processor handling fix related to CPU hotplug (online/offline) on x86 (Vitaly Kuznetsov) - Suspend quirk to save/restore NVS memory over S3 transitions for Lenovo G50-45 (Zhang Rui) - Message formatting fix for the ACPI APEI code (Colin Ian King)" * tag 'acpi-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits) ACPICA: Update version to 20170119 ACPICA: Tools: Update common signon, remove compilation bit width ACPICA: Source tree: Update copyright notices to 2017 ACPICA: Linuxize: Restore and fix Intel compiler build x86/ACPI: keep x86_cpu_to_acpiid mapping valid on CPU hotplug spi: acpi: Initialize modalias from of_compatible i2c: acpi: Initialize info.type from of_compatible ACPI / bus: Introduce acpi_of_modalias() equiv of of_modalias_node() ACPI: save NVS memory for Lenovo G50-45 ACPI, APEI, EINJ: fix malformed newline escape ACPI / button: Remove lid_init_state=method mode ACPI / button: Change default behavior to lid_init_state=open ACPI / EC: Use busy polling mode when GPE is not enabled ACPI / EC: Remove old CLEAR_ON_RESUME quirk ACPICA: Update version to 20161222 ACPICA: Parser: Update parse info table for some operators ACPICA: Fix a problem with recent extra support for control method invocations ACPICA: Parser: Allow method invocations as target operands ACPICA: Fix for implicit result conversion for the ToXXX functions ACPICA: Resources: Not a valid resource if buffer length too long ..	2017-02-20 17:55:15 -08:00
Linus Torvalds	02c3de1105	Power management updates for v4.11-rc1 - Operating Performance Points (OPP) framework fixes, cleanups and switch over from RCU-based synchronization to reference counting using krefs (Viresh Kumar, Wei Yongjun, Dave Gerlach). - cpufreq core cleanups and documentation updates (Viresh Kumar, Rafael Wysocki). - New cpufreq driver for Broadcom BMIPS SoCs (Markus Mayer). - New cpufreq-dt sub-driver for TI SoCs requiring special handling, like in the AM335x, AM437x, DRA7x, and AM57x families, along with new DT bindings for it (Dave Gerlach, Paul Gortmaker). - ARM64 SoCs support for the qoriq cpufreq driver (Tang Yuantian). - intel_pstate driver updates including a new sysfs knob to control the driver's operation mode and fixes related to the no_turbo sysfs knob and the hardware-managed P-states feature support (Rafael Wysocki, Srinivas Pandruvada). - New interface to export ultra-turbo frequencies for the powernv cpufreq driver (Shilpasri Bhat). - Assorted fixes for cpufreq drivers (Arnd Bergmann, Dan Carpenter, Wei Yongjun). - devfreq core fixes, mostly related to the sysfs interface exported by it (Chanwoo Choi, Chris Diamand). - Updates of the exynos-bus and exynos-ppmu devfreq drivers (Chanwoo Choi). - Device PM QoS extension to support CPUs and support for per-CPU wakeup (device resume) latency constraints in the cpuidle menu governor (Alex Shi). - Wakeup IRQs framework fixes (Grygorii Strashko). - Generic power domains framework update including a fix to make it handle asynchronous invocations of noirq suspend/resume callbacks correctly (Ulf Hansson, Geert Uytterhoeven). - Assorted fixes and cleanups in the core suspend/hibernate code, PM QoS framework and x86 ACPI idle support code (Corentin Labbe, Geert Uytterhoeven, Geliang Tang, John Keeping, Nick Desaulniers). - Update of the analyze_suspend.py script is updated to version 4.5 offering multiple improvements (Todd Brandt). - New tool for intel_pstate diagnostics using the pstate_sample tracepoint (Doug Smythies). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJYq3IjAAoJEILEb/54YlRx/lYP+gNXhfETSzjd4kWSHy3FVEDb gc5rMiE2j0OYgVSXwBI7p4EqMPy56lSWBASvbF2o6v9CIxb880KLFEsMDCVHwn46 6xfEnIRxf1oeRqn7EG9ZPIcTgNsUyvK+gah7zgLXu/0KU7ceXxygvNk47qpeOZ8f dKYgIk/TOSGPC8H2nsg8VBKlK/ZOj5hID4F3MmFw6yDuWVCYuh2EokYXS4Nx0JwY UQGpWtz+FWWs71vhgVl33GbPXWvPqA7OMe0btZ3RCnhnz4tA/mH+jDWiaspCdS3J vKGeZyZptjIMJcufm3X7s7ghYjELheqQusMODDXk4AaWQ5nz8V5/h7NThYfa9J1b M93Tb0rMb2MqUhBpv/M6D3qQroZmhq55QKfQrul3QWSOiQUzTWJcbbpyeBQ7nkrI F1qNqQfuCnBL/r9y7HpW8P2iFg9kCHkwTtXMdp/lzGXdKzSGtAUSkYg5ohnUzQTp 2WCPTEk+5DxLVPjW5rDoZOotr5p1kdcdWBk6r3MEWRokZK6PJo7rJBcnTtXSo2mO lLRba006q+fTlI5wZtjAI0rOiS3JgtT6cRx7uPjZlze9TGjklJhdsCPJbM5gcOT+ YiOxvqD+9if5QRSxiEZNj3bQ43wYhXmpctfIanyxziq09BPIPxvgfRR/BkUzc34R ps4CIvImim5v5xc8Zsbk =57xJ -----END PGP SIGNATURE----- Merge tag 'pm-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "The majority of changes go into the Operating Performance Points (OPP) framework and cpufreq this time, followed by devfreq and some scattered updates all over. The OPP changes are mostly related to switching over from RCU-based synchronization, that turned out to be overly complicated and problematic, to reference counting using krefs. In the cpufreq land there are core cleanups, documentation updates, a new driver for Broadcom BMIPS SoCs, a new cpufreq-dt sub-driver for TI SoCs that require special handling, ARM64 SoCs support for the qoriq driver, intel_pstate updates, powernv driver update and assorted fixes. The devfreq changes are mostly fixes related to the sysfs interface and some Exynos drivers updates. Apart from that, the cpuidle menu governor will support per-CPU PM QoS constraints for the wakeup latency now, some bugs in the wakeup IRQs framework are fixed, the generic power domains framework should handle asynchronous invocations of noirq suspend/resume callbacks from now on, the analyze_suspend.py script is updated and there is a new tool for intel_pstate diagnostics. Specifics: - Operating Performance Points (OPP) framework fixes, cleanups and switch over from RCU-based synchronization to reference counting using krefs (Viresh Kumar, Wei Yongjun, Dave Gerlach) - cpufreq core cleanups and documentation updates (Viresh Kumar, Rafael Wysocki) - New cpufreq driver for Broadcom BMIPS SoCs (Markus Mayer) - New cpufreq-dt sub-driver for TI SoCs requiring special handling, like in the AM335x, AM437x, DRA7x, and AM57x families, along with new DT bindings for it (Dave Gerlach, Paul Gortmaker) - ARM64 SoCs support for the qoriq cpufreq driver (Tang Yuantian) - intel_pstate driver updates including a new sysfs knob to control the driver's operation mode and fixes related to the no_turbo sysfs knob and the hardware-managed P-states feature support (Rafael Wysocki, Srinivas Pandruvada) - New interface to export ultra-turbo frequencies for the powernv cpufreq driver (Shilpasri Bhat) - Assorted fixes for cpufreq drivers (Arnd Bergmann, Dan Carpenter, Wei Yongjun) - devfreq core fixes, mostly related to the sysfs interface exported by it (Chanwoo Choi, Chris Diamand) - Updates of the exynos-bus and exynos-ppmu devfreq drivers (Chanwoo Choi) - Device PM QoS extension to support CPUs and support for per-CPU wakeup (device resume) latency constraints in the cpuidle menu governor (Alex Shi) - Wakeup IRQs framework fixes (Grygorii Strashko) - Generic power domains framework update including a fix to make it handle asynchronous invocations of noirq suspend/resume callbacks correctly (Ulf Hansson, Geert Uytterhoeven) - Assorted fixes and cleanups in the core suspend/hibernate code, PM QoS framework and x86 ACPI idle support code (Corentin Labbe, Geert Uytterhoeven, Geliang Tang, John Keeping, Nick Desaulniers) - Update of the analyze_suspend.py script is updated to version 4.5 offering multiple improvements (Todd Brandt) - New tool for intel_pstate diagnostics using the pstate_sample tracepoint (Doug Smythies)" tag 'pm-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (85 commits) MAINTAINERS: cpufreq: add bmips-cpufreq.c PM / QoS: Fix memory leak on resume_latency.notifiers PM / Documentation: Spelling s/wrtie/write/ PM / sleep: Fix test_suspend after sleep state rework cpufreq: CPPC: add ACPI_PROCESSOR dependency cpufreq: make ti-cpufreq explicitly non-modular cpufreq: Do not clear real_cpus mask on policy init tools/power/x86: Debug utility for intel_pstate driver AnalyzeSuspend: fix drag and zoom bug in javascript PM / wakeirq: report a wakeup_event on dedicated wekup irq PM / wakeirq: Fix spurious wake-up events for dedicated wakeirqs PM / wakeirq: Enable dedicated wakeirq for suspend cpufreq: dt: Don't use generic platdev driver for ti-cpufreq platforms cpufreq: ti: Add cpufreq driver to determine available OPPs at runtime Documentation: dt: add bindings for ti-cpufreq PM / OPP: Expose _of_get_opp_desc_node as dev_pm_opp API cpufreq: qoriq: Don't look at clock implementation details cpufreq: qoriq: add ARM64 SoCs support PM / Domains: Provide dummy governors if CONFIG_PM_GENERIC_DOMAINS=n cpufreq: brcmstb-avs-cpufreq: remove unnecessary platform_set_drvdata() ...	2017-02-20 17:41:31 -08:00
Linus Torvalds	c945d0227d	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "Misc platform updates: SGI UV4 support additions, intel-mid Merrifield enhancements and purge of old code" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/platform/UV/NMI: Fix uneccessary kABI breakage x86/platform/UV: Clean up the NMI code to match current coding style x86/platform/UV: Ensure uv_system_init is called when necessary x86/platform/UV: Initialize PCH GPP_D_0 NMI Pin to be NMI source x86/platform/UV: Verify NMI action is valid, default is standard x86/platform/UV: Add basic CPU NMI health check x86/platform/UV: Add Support for UV4 Hubless NMIs x86/platform/UV: Add Support for UV4 Hubless systems x86/platform/UV: Clean up the UV APIC code x86/platform/intel-mid: Move watchdog registration to arch_initcall() x86/platform/intel-mid: Don't shadow error code of mp_map_gsi_to_irq() x86/platform/intel-mid: Allocate RTC interrupt for Merrifield x86/ioapic: Return suitable error code in mp_map_gsi_to_irq() x86/platform/UV: Fix 2 socket config problem x86/platform/UV: Fix panic with missing UVsystab support x86/platform/intel-mid: Enable RTC on Intel Merrifield x86/platform/intel: Remove PMIC GPIO block support x86/platform/intel-mid: Make intel_scu_device_register() static x86/platform/intel-mid: Enable GPIO keys on Merrifield x86/platform/intel-mid: Get rid of duplication of IPC handler ...	2017-02-20 16:26:57 -08:00
Linus Torvalds	8b5abde16b	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "A laundry list of changes: KASAN improvements/fixes for ptdump, a self-test fix, PAT cleanup and wbinvd() avoidance, removal of stale code and documentation updates" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/ptdump: Add address marker for KASAN shadow region x86/mm/ptdump: Optimize check for W+X mappings for CONFIG_KASAN=y x86/mm/pat: Use rb_entry() x86/mpx: Re-add MPX to selftests Makefile x86/mm: Remove CONFIG_DEBUG_NX_TEST x86/mm/cpa: Avoid wbinvd() for PREEMPT x86/mm: Improve documentation for low-level device I/O functions	2017-02-20 15:57:19 -08:00
Linus Torvalds	a25a1d6c24	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode updates from Ingo Molnar: "The main changes are further simplification and unification of the code between the AMD and Intel microcode loaders, plus other simplifications - by Borislav Petkov" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Remove struct cont_desc.eq_id x86/microcode/AMD: Remove AP scanning optimization x86/microcode/AMD: Simplify saving from initrd x86/microcode/AMD: Unify load_ucode_amd_ap() x86/microcode/AMD: Check patch level only on the BSP x86/microcode: Remove local vendor variable x86/microcode/AMD: Use find_microcode_in_initrd() x86/microcode/AMD: Get rid of global this_equiv_id x86/microcode: Decrease CPUID use x86/microcode/AMD: Rework container parsing x86/microcode/AMD: Extend the container struct x86/microcode/AMD: Shorten function parameter's name x86/microcode/AMD: Clean up find_equiv_id() x86/microcode: Convert to bare minimum MSR accessors x86/MSR: Carve out bare minimum accessors	2017-02-20 15:30:51 -08:00
Linus Torvalds	280d7a1ede	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Ingo Molnar: "The main changes relate to fixes between (lack of) CPUID and FPU detection that should only affect old or weird CPUs, by Andy Lutomirski" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Fix the "Giving up, no FPU found" test x86/fpu: Fix CPUID-less FPU detection x86/fpu: Fix "x86/fpu: Legacy x87 FPU detected" message x86/cpu: Re-apply forced caps every time CPU caps are re-read x86/cpu: Factor out application of forced CPU caps x86/cpu: Add X86_FEATURE_CPUID x86/fpu/xstate: Move XSAVES state init to a function	2017-02-20 15:03:51 -08:00
Linus Torvalds	8a9365a472	Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpufeature updates from Ingo Molnar: "The main changes in this cycle were related to enable ring-3 MONITOR/MWAIT instructions support on supported CPUs, by Grzegorz Andrejczuk and Piotr Luc" * 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpufeature: Move RING3MWAIT feature to avoid conflicts x86/cpufeature: Enable RING3MWAIT for Knights Mill x86/cpufeature: Enable RING3MWAIT for Knights Landing x86/cpufeature: Add RING3MWAIT to CPU features x86/elf: Add HWCAP2 to expose ring 3 MONITOR/MWAIT x86/msr: Add MSR_MISC_FEATURE_ENABLES and RING3MWAIT bit x86/cpufeature: Add AVX512_VPOPCNTDQ feature	2017-02-20 14:37:08 -08:00
Linus Torvalds	2891e8e667	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Two small cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/traps: Get rid of unnecessary preempt_disable/preempt_enable_no_resched x86/pci-calgary: Fix iommu_free() comparison of unsigned expression >= 0	2017-02-20 14:34:23 -08:00
Linus Torvalds	292d386743	Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "Misc updates: - fix e820 error handling - convert page table setup code from assembly to C - fix kexec environment bug - ... plus small cleanups" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kconfig: Remove misleading note regarding hibernation and KASLR x86/boot: Fix KASLR and memmap= collision x86/e820/32: Fix e820_search_gap() error handling on x86-32 x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C x86/e820: Make e820_search_gap() static and remove unused variables	2017-02-20 14:04:37 -08:00
Linus Torvalds	4cee9fe53e	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic changes from Ingo Molnar: "The main changes in this cycle were: - Re-activate the hw IRQ resend mechanism that was downgraded to a sw-resend unintentionally. (Ruslan Ruslichenko) - Avoid sporadic spurious hrtimer interrupts (Frederic Weisbecker)" [ Let's see if the io_apic retrigger ends up surviving this release, it got reverted last time because it found problems elsewhere - Linus ] * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Fix a typo in a comment line x86/ioapic: Restore IO-APIC irq_chip retrigger callback x86/apic: Implement set_state_oneshot_stopped() callback x86/apic: Fix typos in comments	2017-02-20 14:01:21 -08:00
Linus Torvalds	42e1b14b6e	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - Implement wraparound-safe refcount_t and kref_t types based on generic atomic primitives (Peter Zijlstra) - Improve and fix the ww_mutex code (Nicolai Hähnle) - Add self-tests to the ww_mutex code (Chris Wilson) - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr Bueso) - Micro-optimize the current-task logic all around the core kernel (Davidlohr Bueso) - Tidy up after recent optimizations: remove stale code and APIs, clean up the code (Waiman Long) - ... plus misc fixes, updates and cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) fork: Fix task_struct alignment locking/spinlock/debug: Remove spinlock lockup detection code lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS lkdtm: Convert to refcount_t testing kref: Implement 'struct kref' using refcount_t refcount_t: Introduce a special purpose refcount type sched/wake_q: Clarify queue reinit comment sched/wait, rcuwait: Fix typo in comment locking/mutex: Fix lockdep_assert_held() fail locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock() locking/rwsem: Reinit wake_q after use locking/rwsem: Remove unnecessary atomic_long_t casts jump_labels: Move header guard #endif down where it belongs locking/atomic, kref: Implement kref_put_lock() locking/ww_mutex: Turn off __must_check for now locking/atomic, kref: Avoid more abuse locking/atomic, kref: Use kref_get_unless_zero() more locking/atomic, kref: Kill kref_sub() locking/atomic, kref: Add kref_read() locking/atomic, kref: Add KREF_INIT() ...	2017-02-20 13:23:30 -08:00
Linus Torvalds	828cad8ea0	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this (fairly busy) cycle were: - There was a class of scheduler bugs related to forgetting to update the rq-clock timestamp which can cause weird and hard to debug problems, so there's a new debug facility for this: which uncovered a whole lot of bugs which convinced us that we want to keep the debug facility. (Peter Zijlstra, Matt Fleming) - Various cputime related updates: eliminate cputime and use u64 nanoseconds directly, simplify and improve the arch interfaces, implement delayed accounting more widely, etc. - (Frederic Weisbecker) - Move code around for better structure plus cleanups (Ingo Molnar) - Move IO schedule accounting deeper into the scheduler plus related changes to improve the situation (Tejun Heo) - ... plus a round of sched/rt and sched/deadline fixes, plus other fixes, updats and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (85 commits) sched/core: Remove unlikely() annotation from sched_move_task() sched/autogroup: Rename auto_group.[ch] to autogroup.[ch] sched/topology: Split out scheduler topology code from core.c into topology.c sched/core: Remove unnecessary #include headers sched/rq_clock: Consolidate the ordering of the rq_clock methods delayacct: Include <uapi/linux/taskstats.h> sched/core: Clean up comments sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds sched/clock: Add dummy clear_sched_clock_stable() stub function sched/cputime: Remove generic asm headers sched/cputime: Remove unused nsec_to_cputime() s390, sched/cputime: Remove unused cputime definitions powerpc, sched/cputime: Remove unused cputime definitions s390, sched/cputime: Make arch_cpu_idle_time() to return nsecs ia64, sched/cputime: Remove unused cputime definitions ia64: Convert vtime to use nsec units directly ia64, sched/cputime: Move the nsecs based cputime headers to the last arch using it sched/cputime: Remove jiffies based cputime sched/cputime, vtime: Return nsecs instead of cputime_t to account sched/cputime: Complete nsec conversion of tick based accounting ...	2017-02-20 12:52:55 -08:00
Linus Torvalds	60c906bab1	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "The main changes in this cycle were: - Assign notifier chain priorities for all RAS related handlers to make the ordering explicit (Borislav Petkov) - Improve the AMD MCA banks sysfs output (Yazen Ghannam) - Various cleanups and restructuring of the x86 RAS code (Borislav Petkov)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority x86/ras: Get rid of mce_process_work() EDAC/mce/amd: Dump TSC value EDAC/mce/amd: Unexport amd_decode_mce() x86/ras/amd/inj: Change dependency x86/ras: Flip the TSC-adding logic x86/ras/amd: Make sysfs names of banks more user-friendly x86/ras/therm_throt: Do not log a fake MCE for thermal events x86/ras/inject: Make it depend on X86_LOCAL_APIC=y	2017-02-20 12:47:44 -08:00
Linus Torvalds	7f4eb0a6d5	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "On the kernel side the main changes in this cycle were: - Add Intel Kaby Lake CPU support (Srinivas Pandruvada) - AMD uncore driver updates for fam17 (Janakarajan Natarajan) - Intel/PT updates and core events optimizations and cleanups (Alexander Shishkin) - cgroups events fixes (David Carrillo-Cisneros) - kprobes improvements (Masami Hiramatsu) - ... plus misc fixes and updates. On the tooling side the main changes were: - Support clang build in tools/{perf,lib/{bpf,traceevent,api}} with CC=clang, to, for instance, take advantage of better warnings (Arnaldo Carvalho de Melo): - Introduce the 'delta-abs' 'perf diff' compute method, that orders the histogram entries by the absolute value of the percentage delta for a function in two perf.data files, i.e. the functions that changed the most (increase or decrease in samples) comes first (Namhyung Kim) - Add support for parsing Intel uncore vendor event files and add uncore vendor events for the Intel server processors (Haswell, Broadwell, IvyBridge), Xeon Phi (Knights Landing) and Broadwell DE (Andi Kleen) - Introduce 'perf ftrace' a perf front end to the kernel's ftrace function and function_graph tracer, defaulting to the "function_graph" tracer, more work will be done in reviving this effort, forward porting it from its initial patch submission (Namhyung Kim) - Add 'e' and 'c' hotkeys to expand/collapse call chains for a single hist entry in the 'perf report' and 'perf top' TUI (Jiri Olsa) - Account thread wait time (off CPU time) separately: sleep, iowait and preempt, based on the prev_state of the last event, show the breakdown when using "perf sched timehist --state" (Namhyumg Kim) - Add more triggers to switch the output file (perf.data.TIMESTAMP). Now, in addition to switching to a different output file when receiving a SIGUSR2, one can also specify file size and time based triggers: perf record -a --switch-output=signal is equivalent to what we had before: perf record -a --switch-output While we can also ask for the file to be "sliced" by size, taking into account that that will happen only when we get woken up by the kernel, i.e. one has to take into account the --mmap-pages (the size of the perf mmap ring buffer): perf record -a --switch-output=2G will break the perf.data output into multiple files limited to 2GB of samples, right when generating the output. For time based samples, alert() will be used, so to have 1 minute limited perf.data output files: perf record -a --switch-output=1m (Jiri Olsa) - Improve 'perf trace' (Arnaldo Carvalho de Melo) - 'perf kallsyms' toy tool to look for extended symbol information on the running kernel and demonstrate the machine/thread/symbol APIs for use in other tools, such as 'perf probe' (Arnaldo Carvalho de Melo) - ... plus tons of other changes, see the shortlog and Git log for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (131 commits) perf tools: Add missing parse_events_error() prototype perf pmu: Fix check for unset alias->unit array perf tools: Be consistent on the type of map->symbols[] interator perf intel pt decoder: clang has no -Wno-override-init perf evsel: Do not put a variable sized type not at the end of a struct perf probe: Avoid accessing uninitialized 'map' variable perf tools: Do not put a variable sized type not at the end of a struct perf record: Do not put a variable sized type not at the end of a struct perf tests: Synthesize struct instead of using field after variable sized type perf bench numa: Make sure dprintf() is not defined Revert "perf bench futex: Sanitize numeric parameters" tools lib subcmd: Make it an error to pass a signed value to OPTION_UINTEGER tools: Set the maximum optimization level according to the compiler being used tools: Suppress request for warning options not existent in clang samples/bpf: Reset global variables samples/bpf: Ignore already processed ELF sections samples/bpf: Add missing header perf symbols: dso->name is an array, no need to check it against NULL perf tests record: No need to test an array against NULL perf symbols: No need to check if sym->name is NULL ...	2017-02-20 12:21:13 -08:00
Linus Torvalds	32e2d7c8af	Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes in this cycle were: - Changes to the EFI init code to establish whether secure boot authentication was performed at boot time. (Josh Boyer, David Howells) - Wire up the UEFI memory attributes table for x86. This eliminates any runtime memory regions that are both writable and executable, on recent firmware versions. (Sai Praneeth) - Move the BGRT init code to an earlier stage so that we can still use efi_mem_reserve(). (Dave Young) - Preserve debug symbols in the ARM/arm64 UEFI stub (Ard Biesheuvel) - Code deduplication work and various other cleanups (Lukas Wunner) - ... plus various other fixes and cleanups" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub: Make file I/O chunking x86-specific efi: Print the secure boot status in x86 setup_arch() efi: Disable secure boot if shim is in insecure mode efi: Get and store the secure boot status efi: Add SHIM and image security database GUID definitions arm/efi: Allow invocation of arbitrary runtime services x86/efi: Allow invocation of arbitrary runtime services efi/libstub: Preserve .debug sections after absolute relocation check efi/x86: Add debug code to print cooked memmap efi/x86: Move the EFI BGRT init code to early init code efi: Use typed function pointers for the runtime services table efi/esrt: Fix typo in pr_err() message x86/efi: Add support for EFI_MEMORY_ATTRIBUTES_TABLE efi: Introduce the EFI_MEM_ATTR bit and set it from the memory attributes table efi: Make EFI_MEMORY_ATTRIBUTES_TABLE initialization common across all architectures x86/efi: Deduplicate efi_char16_printk() efi: Deduplicate efi_file_size() / _read() / _close()	2017-02-20 11:47:11 -08:00
Rafael J. Wysocki	a74d1cafc2	Merge branches 'acpi-bus', 'acpi-sleep' and 'acpi-processor' * acpi-bus: spi: acpi: Initialize modalias from of_compatible i2c: acpi: Initialize info.type from of_compatible ACPI / bus: Introduce acpi_of_modalias() equiv of of_modalias_node() * acpi-sleep: ACPI: save NVS memory for Lenovo G50-45 * acpi-processor: x86/ACPI: keep x86_cpu_to_acpiid mapping valid on CPU hotplug	2017-02-20 14:28:03 +01:00
Rafael J. Wysocki	ad7eec4244	Merge branch 'pm-cpuidle' * pm-cpuidle: CPU / PM: expose pm_qos_resume_latency for CPUs cpuidle/menu: add per CPU PM QoS resume latency consideration cpuidle/menu: stop seeking deeper idle if current state is deep enough ACPI / idle: small formatting fixes	2017-02-20 14:23:21 +01:00
Ingo Molnar	8312593a55	Merge branches 'x86/cache', 'x86/debug' and 'x86/irq' into x86/urgent Pick up simple singular commits from their topic branches. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-20 14:16:58 +01:00
Thomas Gleixner	5b1ad68f9b	Merge branch 'linus' into x86/mm Make sure to get the latest fixes before applying the ptdump enhancements.	2017-02-16 19:51:27 +01:00
Ingo Molnar	210f400d68	Linux 4.10-rc8 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYoM2fAAoJEHm+PkMAQRiGr9MH/izEAMri7rJ0QMc3ejt+WmD0 8pkZw3+MVn71z6cIEgpzk4QkEWJd5rfhkETCeCp7qQ9V6cDW1FDE9+0OmPjiphDt nnzKs7t7skEBwH5Mq5xygmIfkv+Z0QGHZ20gfQWY3F56Uxo+ARF88OBHBLKhqx3v 98C7YbMFLKBslKClA78NUEIdx0UfBaRqerlERx0Lfl9aoOrbBS6WI3iuREiylpih 9o7HTrwaGKkU4Kd6NdgJP2EyWPsd1LGalxBBjeDSpm5uokX6ALTdNXDZqcQscHjE RmTqJTGRdhSThXOpNnvUJvk9L442yuNRrVme/IqLpxMdHPyjaXR3FGSIDb2SfjY= =VMy8 -----END PGP SIGNATURE----- Merge tag 'v4.10-rc8' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-14 07:29:14 +01:00
Kirill A. Shutemov	3ba5b5ea7d	x86/vm86: Fix unused variable warning if THP is disabled GCC complains about unused variable 'vma' in mark_screen_rdonly() if THP is disabled: arch/x86/kernel/vm86_32.c: In function ‘mark_screen_rdonly’: arch/x86/kernel/vm86_32.c:180:26: warning: unused variable ‘vma’ [-Wunused-variable] struct vm_area_struct *vma = find_vma(mm, 0xA0000); That's silly. pmd_trans_huge() resolves to 0 when THP is disabled, so the whole block should be eliminated. Moving the variable declaration outside the if() block shuts GCC up. Reported-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Borislav Petkov <bp@suse.de> Cc: Carlos O'Donell <carlos@redhat.com> Link: http://lkml.kernel.org/r/20170213125228.63645-1-kirill.shutemov@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-02-13 19:04:38 +01:00
Linus Torvalds	1ce42845f9	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Last minute x86 fixes: - Fix a softlockup detector warning and long delays if using ptdump with KASAN enabled. - Two more TSC-adjust fixes for interesting firmware interactions. - Two commits to fix an AMD CPU topology enumeration bug that caused a measurable gaming performance regression" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/ptdump: Fix soft lockup in page table walker x86/tsc: Make the TSC ADJUST sanitizing work for tsc_reliable x86/tsc: Avoid the large time jump when sanitizing TSC ADJUST x86/CPU/AMD: Fix Zen SMT topology x86/CPU/AMD: Bring back Compute Unit ID	2017-02-11 10:31:46 -08:00
Christoph Hellwig	699c4cec23	PCI/MSI: Remove pci_msi_domain_{alloc,free}_irqs() Just call the msi_* version directly instead of having trivial wrappers for one or two callsites. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2017-02-10 14:30:33 -06:00
Thomas Gleixner	5f2e71e714	x86/tsc: Make the TSC ADJUST sanitizing work for tsc_reliable When the TSC is marked reliable then the synchronization check is skipped, but that also skips the TSC ADJUST sanitizing code. So on a machine with a wreckaged BIOS the TSC deviation between CPUs might go unnoticed. Let the TSC adjust sanitizing code run unconditionally and just skip the expensive synchronization checks when TSC is marked reliable. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Olof Johansson <olof@lixom.net> Link: http://lkml.kernel.org/r/20170209151231.491189912@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-02-10 09:47:17 +01:00
Thomas Gleixner	f2e04214ef	x86/tsc: Avoid the large time jump when sanitizing TSC ADJUST Olof reported that on a machine which has a BIOS wreckaged TSC the timestamps in dmesg are making a large jump because the TSC value is jumping forward after resetting the TSC ADJUST register to a sane value. This can be avoided by calling the TSC ADJUST saniziting function before initializing the per cpu sched clock machinery. That takes the offset into account and avoid the time jump. What cannot be avoided is that the 'Firmware Bug' warnings on the secondary CPUs are printed with the large time offsets because it would be too much effort and ugly hackery to print those warnings into a buffer and emit them after the adjustemt on the starting CPUs. It's a firmware bug and should be fixed in firmware. The weird timestamps are collateral damage and just illustrate the sillyness of the BIOS folks: [ 0.397445] smp: Bringing up secondary CPUs ... [ 0.402100] x86: Booting SMP configuration: [ 0.406343] .... node #0, CPUs: #1 [1265776479.930667] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU1: -2978888639183101 [1265776479.944664] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -2978888639183101 [ 0.508119] #2 [1265776480.032346] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU2: -2978888639183677 [1265776480.044192] TSC ADJUST synchronize: Reference CPU0: 0 CPU2: -2978888639183677 [ 0.607643] #3 [1265776480.131874] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU3: -2978888639184530 [1265776480.143720] TSC ADJUST synchronize: Reference CPU0: 0 CPU3: -2978888639184530 [ 0.707108] smp: Brought up 1 node, 4 CPUs [ 0.711271] smpboot: Total of 4 processors activated (21698.88 BogoMIPS) Reported-by: Olof Johansson <olof@lixom.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170209151231.411460506@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-02-10 09:47:16 +01:00
Paolo Bonzini	2e751dfb5f	kvmarm updates for 4.11 - GICv3 save restore - Cache flushing fixes - MSI injection fix for GICv3 ITS - Physical timer emulation support -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYnICmAAoJECPQ0LrRPXpDC04P/A73ZEL6m0vUzGpuvclxwWc6 OCJ2C9kYloK+twyGLFbPprI4eN/70dpThgFE1Zr+ol/vAOhQlGQJoarc4n4+eYyb 8e8IxM5Cmi44HUB64xOInidLacqeyRy5+TvKXIH0aHLgpdynSEQJu88RVXUvVgvs IZizhTpYueDYdexNNEkL5r2yJhVZCaczyjB1vU8k5MdODLDM63ABnPOSNJNXio2x itoO0EU1Lb9GhuzQj0hiMvKJPyviuPHwau7AhokUSjDPaHzaQT7TgSVioKov/rl6 bRzhPmXqesex97ZWA5Fxr8jgSNR7JyRz+bzCLEry7XFaI3chbe0YvXeRv32PNH7I meuycQw64gsKmfJGRNlq30qhQQfv4fTbzpZP/j1UbvKNwhK5J6e7037c1CUH4i9C p9UO9HF/zAMqzD3iMcDZSpaFcbhJYrfQufbhTnbHfGC5AMVJEOWheHSEmzlDWnwr K5fPBxnsPv58hDmp/UZUTqCEPusY+HyuOq4ZumFSsnBwjdW+z9mLuaaTJbxaqR/G B6dfSQNwSnw6b2lbiXPUCm6c+Z9b190pUEWdwJ4kOTxwiPUWBppVU7gE2TrjnQ8m aIvEBPGIf58okjEewA5Dni6qjv7CjDN5z1V0vZUTTdVw8xuhX9eJ1Cx853SM7n0U sJgW5nSvSLDUpizSKdRI =H4vX -----END PGP SIGNATURE----- Merge tag 'kvmarm-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD kvmarm updates for 4.11 - GICv3 save restore - Cache flushing fixes - MSI injection fix for GICv3 ITS - Physical timer emulation support	2017-02-09 16:01:23 +01:00
Linus Torvalds	d966564fcd	Revert "x86/ioapic: Restore IO-APIC irq_chip retrigger callback" This reverts commit `020eb3daab`. Gabriel C reports that it causes his machine to not boot, and we haven't tracked down the reason for it yet. Since the bug it fixes has been around for a longish time, we're better off reverting the fix for now. Gabriel says: "It hangs early and freezes with a lot RCU warnings. I bisected it down to : > Ruslan Ruslichenko (1): > x86/ioapic: Restore IO-APIC irq_chip retrigger callback Reverting this one fixes the problem for me.. The box is a PRIMERGY TX200 S5 , 2 socket , 2 x E5520 CPU(s) installed" and Ruslan and Thomas are currently stumped. Reported-and-bisected-by: Gabriel C <nix.or.die@gmail.com> Cc: Ruslan Ruslichenko <rruslich@cisco.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org # for the backport of the original commit Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-08 18:08:29 -08:00
Marcelo Tosatti	f4066c2bc4	kvmclock: export kvmclock clocksource and data pointers To be used by KVM PTP driver. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-02-08 17:16:19 +01:00
Vitaly Kuznetsov	febf240741	x86/ACPI: keep x86_cpu_to_acpiid mapping valid on CPU hotplug We may or may not have all possible CPUs in MADT on boot but in any case we're overwriting x86_cpu_to_acpiid mapping with U32_MAX when acpi_register_lapic() is called again on the CPU hotplug path: acpi_processor_hotadd_init() -> acpi_map_cpu() -> acpi_register_lapic() As we have the required acpi_id information in acpi_processor_hotadd_init() propagate it to acpi_map_cpu() to always keep x86_cpu_to_acpiid mapping valid. Reported-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-02-07 13:34:56 +01:00
David Howells	9661b33204	efi: Print the secure boot status in x86 setup_arch() Print the secure boot status in the x86 setup_arch() function, but otherwise do nothing more for now. More functionality will be added later, but this at least allows for testing. Signed-off-by: David Howells <dhowells@redhat.com> [ Use efi_enabled() instead of IS_ENABLED(CONFIG_EFI). ] Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1486380166-31868-7-git-send-email-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-07 10:42:10 +01:00
David Howells	de8cb45862	efi: Get and store the secure boot status Get the firmware's secure-boot status in the kernel boot wrapper and stash it somewhere that the main kernel image can find. The efi_get_secureboot() function is extracted from the ARM stub and (a) generalised so that it can be called from x86 and (b) made to use efi_call_runtime() so that it can be run in mixed-mode. For x86, it is stored in boot_params and can be overridden by the boot loader or kexec. This allows secure-boot mode to be passed on to a new kernel. Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1486380166-31868-5-git-send-email-ard.biesheuvel@linaro.org [ Small readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-07 10:42:10 +01:00
Dou Liyang	543113d2f4	x86/apic: Fix a typo in a comment line s/bringin /bringing Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: trivial@kernel.org Link: http://lkml.kernel.org/r/1486442688-24690-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-07 10:40:27 +01:00
Ingo Molnar	87a8d03266	Linux 4.10-rc7 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYl7EJAAoJEHm+PkMAQRiGZ7kIAIQJNCOInU/14K5tjEz9jrKM VPWebPm8a1E36C/Nk7mXOW1onOLSHJa7YacmmazbncmtfOANyaUqKVME88+lTWT1 Pj2ZJyeQE7XpxY0N1eXy0PWZPgPI4ENcYXXERueHTFClXdlK6550obyenw/Tqxhv na5Yw66GSXMdNy/kTsK1pp3aJaENYWb2ueYiwr4YNQPUjhs9Y2zKAlMBwHOjuzol aGz0482M4cKY6UmMmi8DVVEET4Sp6cQ9YCjtOP+NUOyEjAJ6XG16SejYTSQyjdK7 w0AAc9F2uCjlNPNy6QvJRO0FFiNhSdaYspPt0IgWa6bpY4m26n1DEBdqKT1uzO4= =e20h -----END PGP SIGNATURE----- Merge tag 'v4.10-rc7' into efi/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-07 08:49:17 +01:00
Masami Hiramatsu	b6263178b8	kprobes/x86: Use hlist_for_each_entry() instead of hlist_for_each_entry_safe() Use hlist_for_each_entry() in the first loop in the kretprobe trampoline_handler() function, because it doesn't change the hlist. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/148637493309.19245.12546866092052500584.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-06 11:07:07 +01:00
Greg Kroah-Hartman	17fa87fe5a	Merge 4.10-rc7 into char-misc-next We want the hv and other fixes in here as well to handle merge and testing issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-02-06 09:39:13 +01:00
Yazen Ghannam	08b259631b	x86/CPU/AMD: Fix Zen SMT topology After: `a33d331761` ("x86/CPU/AMD: Fix Bulldozer topology") our SMT scheduling topology for Fam17h systems is broken, because the ThreadId is included in the ApicId when SMT is enabled. So, without further decoding cpu_core_id is unique for each thread rather than the same for threads on the same core. This didn't affect systems with SMT disabled. Make cpu_core_id be what it is defined to be. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.9 Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170205105022.8705-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-05 12:18:45 +01:00
Borislav Petkov	79a8b9aa38	x86/CPU/AMD: Bring back Compute Unit ID Commit: `a33d331761` ("x86/CPU/AMD: Fix Bulldozer topology") restored the initial approach we had with the Fam15h topology of enumerating CU (Compute Unit) threads as cores. And this is still correct - they're beefier than HT threads but still have some shared functionality. Our current approach has a problem with the Mad Max Steam game, for example. Yves Dionne reported a certain "choppiness" while playing on v4.9.5. That problem stems most likely from the fact that the CU threads share resources within one CU and when we schedule to a thread of a different compute unit, this incurs latency due to migrating the working set to a different CU through the caches. When the thread siblings mask mirrors that aspect of the CUs and threads, the scheduler pays attention to it and tries to schedule within one CU first. Which takes care of the latency, of course. Reported-by: Yves Dionne <yves.dionne@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.9 Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20170205105022.8705-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-05 12:18:45 +01:00
Piotr Luc	4d8bb00604	x86/cpufeature: Enable RING3MWAIT for Knights Mill Enable ring 3 MONITOR/MWAIT for Intel Xeon Phi codenamed Knights Mill. We can't guarantee that this (KNM) will be the last CPU model that needs this hack. But, we do recognize that this is far from optimal, and there is an effort to ensure we don't keep doing extending this hack forever. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Cc: Piotr.Luc@intel.com Cc: dave.hansen@linux.intel.com Link: http://lkml.kernel.org/r/1484918557-15481-6-git-send-email-grzegorz.andrejczuk@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-02-05 00:19:52 +01:00
Linus Torvalds	a572a1b999	Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: - Prevent double activation of interrupt lines, which causes problems on certain interrupt controllers - Handle the fallout of the above because x86 (ab)uses the activation function to reconfigure interrupts under the hood. * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Make irq activate operations symmetric irqdomain: Avoid activating interrupts more than once	2017-02-04 12:18:01 -08:00
Alexander Kuleshov	07d495dae2	x86/traps: Get rid of unnecessary preempt_disable/preempt_enable_no_resched Exception handlers which may run on IST stack call ist_enter() at the start of execution and ist_exit() in the end. ist_enter() disables preemption unconditionally and ist_exit() enables it. So the extra preempt_disable/enable() pairs nested inside the ist_enter/exit() regions are pointless and can be removed. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jianyu Zhan <nasa4836@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20161128075057.7724-1-kuleshovmail@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-02-04 09:36:59 +01:00
Nikola Pajkovsky	68dee8e2f2	x86/pci-calgary: Fix iommu_free() comparison of unsigned expression >= 0 commit `8fd524b355` ("x86: Kill bad_dma_address variable") has killed bad_dma_address variable and used instead of macro DMA_ERROR_CODE which is always zero. Since dma_addr is unsigned, the statement dma_addr >= DMA_ERROR_CODE is always true, and not needed. arch/x86/kernel/pci-calgary_64.c: In function ‘iommu_free’: arch/x86/kernel/pci-calgary_64.c:299:2: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] if (unlikely((dma_addr >= DMA_ERROR_CODE) && (dma_addr < badend))) { Fixes: `8fd524b355` ("x86: Kill bad_dma_address variable") Signed-off-by: Nikola Pajkovsky <npajkovsky@suse.cz> Cc: iommu@lists.linux-foundation.org Cc: Jon Mason <jdmason@kudzu.us> Cc: Muli Ben-Yehuda <mulix@mulix.org> Link: http://lkml.kernel.org/r/7612c0f9dd7c1290407dbf8e809def922006920b.1479161177.git.npajkovsky@suse.cz Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-02-04 09:27:06 +01:00
Grzegorz Andrejczuk	e16fd002af	x86/cpufeature: Enable RING3MWAIT for Knights Landing Enable ring 3 MONITOR/MWAIT for Intel Xeon Phi x200 codenamed Knights Landing. Presence of this feature cannot be detected automatically (by reading any other MSR) therefore it is required to explicitly check for the family and model of the CPU before attempting to enable it. Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: Piotr.Luc@intel.com Cc: dave.hansen@linux.intel.com Link: http://lkml.kernel.org/r/1484918557-15481-5-git-send-email-grzegorz.andrejczuk@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-02-04 08:51:09 +01:00
Grzegorz Andrejczuk	0274f9551e	x86/elf: Add HWCAP2 to expose ring 3 MONITOR/MWAIT Introduce ELF_HWCAP2 variable for x86 and reserve its bit 0 to expose the ring 3 MONITOR/MWAIT. HWCAP variables contain bitmasks which can be used by userspace applications to detect which instruction sets are supported by CPU. On x86 architecture information about CPU capabilities can be checked via CPUID instructions, unfortunately presence of ring 3 MONITOR/MWAIT feature cannot be checked this way. ELF_HWCAP cannot be used as well, because on x86 it is set to CPUID[1].EDX which means that all bits are reserved there. HWCAP2 approach was chosen because it reuses existing solution present in other architectures, so only minor modifications are required to the kernel and userspace applications. When ELF_HWCAP2 is defined kernel maps it to AT_HWCAP2 during the start of the application. This way the ring 3 MONITOR/MWAIT feature can be detected using getauxval() API in a simple and fast manner. ELF_HWCAP2 type is u32 to be consistent with x86 ELF_HWCAP type. Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: Piotr.Luc@intel.com Cc: dave.hansen@linux.intel.com Link: http://lkml.kernel.org/r/1484918557-15481-3-git-send-email-grzegorz.andrejczuk@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-02-04 08:51:09 +01:00
Borislav Petkov	e22af0be2c	x86/boot: Fix pr_debug() API braindamage What looked like a straightforward conversion from printk(KERN_DEBUG, ...) to pr_debug() broke the boot log output: DMI: /M57SLI-S4, BIOS FF 01/24/2008 -e820: update [mem 0x00000000-0x00000fff] usable ==> reserved -e820: remove [mem 0x000a0000-0x000fffff] usable +usable ==> reserved +usable e820: last_pfn = 0x230000 max_arch_pfn = 0x400000000 ... x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- WT -e820: update [mem 0xd0000000-0xffffffff] usable ==> reserved +usable ==> reserved i.e. spurious (and nonsensical) kernel log entries were created... We need a pr_debug_and_I_mean_it() function which does nothing but printk(KERN_DEBUG... Signed-off-by: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org [ Wrote changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-01 11:07:23 +01:00
Daniel Vetter	e2b06d71bd	Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued Chris Wilson wants the new fence tracepoint added in commit `8c96c67801` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Jan 24 11:57:58 2017 +0000 dma/fence: Export enable-signaling tracepoint for emission by drivers Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2017-02-01 10:58:11 +01:00
travis@sgi.com	9ec808a022	x86/platform/UV: Ensure uv_system_init is called when necessary Move the check to whether this is a UV system that needs initialization from is_uv_system() to the internal uv_system_init() function. This is because on a UV system without a HUB the is_uv_system() returns false. But we still need some specific UV system initialization. See the uv_system_init() for change to a quick check if UV is applicable. This change should not increase overhead since is_uv_system() also called into this same area. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Russ Anderson <rja@hpe.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170125163518.256403963@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-01 10:21:00 +01:00
travis@sgi.com	abdf1df6bc	x86/platform/UV: Add Support for UV4 Hubless NMIs Merge new UV Hubless NMI support into existing UV NMI handler. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Russ Anderson <rja@hpe.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170125163517.585269837@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-01 10:20:59 +01:00
travis@sgi.com	74862b03b4	x86/platform/UV: Add Support for UV4 Hubless systems Add recognition and support for UV4 hubless systems. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Russ Anderson <rja@hpe.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170125163517.398537358@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-01 10:20:59 +01:00
Ingo Molnar	7243e10689	x86/platform/UV: Clean up the UV APIC code Make it more readable. Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dimitri Sivanich <sivanich@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Travis <travis@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170114082612.GA27842@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-01 10:20:59 +01:00
Ingo Molnar	1055e0ba56	Merge branch 'x86/urgent' into x86/platform, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-01 10:19:35 +01:00
Frederic Weisbecker	f7dcd63de4	x86: Convert obsolete cputime type to nsecs Use the new nsec based cputime accessors as part of the whole cputime conversion from cputime_t to nsecs. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-10-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-01 09:13:50 +01:00
Frederic Weisbecker	a1cecf2ba7	sched/cputime: Introduce special task_cputime_t() API to return old-typed cputime This API returns a task's cputime in cputime_t in order to ease the conversion of cputime internals to use nsecs units instead. Blindly converting all cputime readers to use this API now will later let us convert more smoothly and step by step all these places to use the new nsec based cputime. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-7-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-01 09:13:48 +01:00
Ingo Molnar	ed5c8c854f	Merge branch 'linus' into sched/core, to pick up fixes and refresh the branch Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-01 09:12:25 +01:00
Dave Young	7b0a911478	efi/x86: Move the EFI BGRT init code to early init code Before invoking the arch specific handler, efi_mem_reserve() reserves the given memory region through memblock. efi_bgrt_init() will call efi_mem_reserve() after mm_init(), at which time memblock is dead and should not be used anymore. The EFI BGRT code depends on ACPI initialization to get the BGRT ACPI table, so move parsing of the BGRT table to ACPI early boot code to ensure that efi_mem_reserve() in EFI BGRT code still use memblock safely. Tested-by: Bhupesh Sharma <bhsharma@redhat.com> Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-acpi@vger.kernel.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1485868902-20401-9-git-send-email-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-01 08:45:46 +01:00
Thomas Gleixner	0becc0ae5b	x86/mce: Make timer handling more robust Erik reported that on a preproduction hardware a CMCI storm triggers the BUG_ON in add_timer_on(). The reason is that the per CPU MCE timer is started by the CMCI logic before the MCE CPU hotplug callback starts the timer with add_timer_on(). So the timer is already queued which triggers the BUG. Using add_timer_on() is pretty pointless in this code because the timer is strictlty per CPU, initialized as pinned and all operations which arm the timer happen on the CPU to which the timer belongs. Simplify the whole machinery by using mod_timer() instead of add_timer_on() which avoids the problem because mod_timer() can handle already queued timers. Use __start_timer() everywhere so the earliest armed expiry time is preserved. Reported-by: Erik Veijola <erik.veijola@intel.com> Tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701310936080.3457@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-31 21:47:58 +01:00
Thomas Gleixner	aaaec6fc75	x86/irq: Make irq activate operations symmetric The recent commit which prevents double activation of interrupts unearthed interesting code in x86. The code (ab)uses irq_domain_activate_irq() to reconfigure an already activated interrupt. That trips over the prevention code now. Fix it by deactivating the interrupt before activating the new configuration. Fixes: `08d85f3ea9` "irqdomain: Avoid activating interrupts more than once" Reported-and-tested-by: Mike Galbraith <efault@gmx.de> Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701311901580.3457@nanos	2017-01-31 20:22:18 +01:00
Ingo Molnar	f26483eaed	Merge branch 'x86/urgent' into x86/microcode, to resolve conflicts Conflicts: arch/x86/kernel/cpu/microcode/amd.c arch/x86/kernel/cpu/microcode/core.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-31 08:38:17 +01:00
Kees Cook	3ad38ceb27	x86/mm: Remove CONFIG_DEBUG_NX_TEST CONFIG_DEBUG_NX_TEST has been broken since CONFIG_DEBUG_SET_MODULE_RONX=y was added in v2.6.37 via: `84e1c6bb38` ("x86: Add RO/NX protection for loadable kernel modules") since the exception table was then made read-only. Additionally, the manually constructed extables were never fixed when relative extables were introduced in v3.5 via: `706276543b` ("x86, extable: Switch to relative exception table entries") However, relative extables won't work for test_nx.c, since test instruction memory areas may be more than INT_MAX away from an executable fixup (e.g. stack and heap too far away from executable memory with the fixup). Since clearly no one has been using this code for a while now, and similar tests exist in LKDTM, this should just be removed entirely. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jinbum Park <jinb.park7@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170131003711.GA74048@beast Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-31 08:31:58 +01:00
Ingo Molnar	441ac2f33d	x86/boot/e820: Simplify e820__update_table() - Remove the now unnecessary __e820__update_table() wrappery - Move statics out from function scope, to make the logic clearer - Rename local variables to be more in line with the rest of 820.c - Remove unnecessary local variables: old_nr, *nr_entries No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-30 09:49:28 +01:00
Borislav Petkov	24c2503255	x86/microcode: Do not access the initrd after it has been freed When we look for microcode blobs, we first try builtin and if that doesn't succeed, we fallback to the initrd supplied to the kernel. However, at some point doing boot, that initrd gets jettisoned and we shouldn't access it anymore. But we do, as the below KASAN report shows. That's because find_microcode_in_initrd() doesn't check whether the initrd is still valid or not. So do that. ================================================================== BUG: KASAN: use-after-free in find_cpio_data Read of size 1 by task swapper/1/0 page:ffffea0000db9d40 count:0 mapcount:0 mapping: (null) index:0x1 flags: 0x100000000000000() raw: 0100000000000000 0000000000000000 0000000000000001 00000000ffffffff raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.10.0-rc5-debug-00075-g2dbde22 #3 Hardware name: Dell Inc. XPS 13 9360/0839Y6, BIOS 1.2.3 12/01/2016 Call Trace: dump_stack ? _atomic_dec_and_lock ? __dump_page kasan_report_error ? pointer ? find_cpio_data __asan_report_load1_noabort ? find_cpio_data find_cpio_data ? vsprintf ? dump_stack ? get_ucode_user ? print_usage_bug find_microcode_in_initrd __load_ucode_intel ? collect_cpu_info_early ? debug_check_no_locks_freed load_ucode_intel_ap ? collect_cpu_info ? trace_hardirqs_on ? flat_send_IPI_mask_allbutself load_ucode_ap ? get_builtin_firmware ? flush_tlb_func ? do_raw_spin_trylock ? cpumask_weight cpu_init ? trace_hardirqs_off ? play_dead_common ? native_play_dead ? hlt_play_dead ? syscall_init ? arch_cpu_idle_dead ? do_idle start_secondary start_cpu Memory state around the buggy address: ffff880036e74f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff880036e74f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff880036e75000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff880036e75080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff880036e75100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Tested-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170126165833.evjemhbqzaepirxo@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-30 09:32:42 +01:00
Ingo Molnar	7410aa1ca3	x86/boot/e820: Separate the E820 ABI structures from the in-kernel structures Linus pointed out that relying on the compiler to pack structures with enums is fragile not just for the kernel, but for external tooling as well which might rely on our UAPI headers. So separate the two from each other: introduce 'struct boot_e820_entry', which is the boot protocol entry format. This actually simplifies the code, as e820__update_table() is now never called directly with boot protocol table entries - we can rely on append_e820_table() and do a e820__update_table() call afterwards. ( This will allow further simplifications of __e820__update_table(), but that will be done in a separate patch. ) This change also has the side effect of not modifying the bootparams structure anymore - which might be useful for debugging. In theory we could even constify the boot_params structure - at least from the E820 code's point of view. Remove the uapi/asm/e820/types.h file, as it's not used anymore - all kernel side E820 types are defined in asm/e820/types.h. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-29 13:39:32 +01:00
Ingo Molnar	c5231a57eb	x86/boot/e820: Fix and clean up e820_type switch() statements A test-build of e820.o with -Wswitch-enum shows the following warnings: arch/x86/kernel/e820.c: In function ‘e820_type_to_string’: arch/x86/kernel/e820.c:965:2: warning: enumeration value ‘E820_TYPE_RESERVED’ not handled in switch [-Wswitch-enum] switch (entry->type) { ^ arch/x86/kernel/e820.c: In function ‘e820_type_to_iomem_type’: arch/x86/kernel/e820.c:979:2: warning: enumeration value ‘E820_TYPE_RESERVED’ not handled in switch [-Wswitch-enum] switch (entry->type) { ^ arch/x86/kernel/e820.c: In function ‘e820_type_to_iores_desc’: arch/x86/kernel/e820.c:993:2: warning: enumeration value ‘E820_TYPE_RESERVED’ not handled in switch [-Wswitch-enum] switch (entry->type) { ^ arch/x86/kernel/e820.c: In function ‘do_mark_busy’: arch/x86/kernel/e820.c:1015:2: warning: enumeration value ‘E820_TYPE_RAM’ not handled in switch [-Wswitch-enum] switch (type) { ^ Here's the four warnings: - The one in e820_type_to_string() is a borderline bug, we should differentiate known-reserved E820 types from unknown types. Fix it by printing a separate message for unknown E820 types. - The ones in e820_type_to_iomem_type(), e820_type_to_iores_desc() and do_mark_busy() are worth documenting, at least to the extent of enumerating them explicitly. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-29 13:39:31 +01:00
Ingo Molnar	0c6fc11ac3	x86/boot/e820: Rename the remaining E820 APIs to the e820__*() prefix Three more renames left: e820_end_of_ram_pfn() => e820__end_of_ram_pfn() e820_end_of_low_ram_pfn() => e820__end_of_low_ram_pfn() e820_reallocate_tables() => e820__reallocate_tables() After this all E820 API calls are prefixed with "e820__", making it much easier to grep for E820 functionality in the kernel. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 22:55:26 +01:00
Ingo Molnar	dd618c7256	x86/boot/e820: Remove unnecessary #include's A number of headers were included into e820.c unnecessarily - remove them. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 22:55:26 +01:00
Ingo Molnar	090d717164	x86/boot/e820: Rename e820_mark_nosave_regions() to e820__register_nosave_regions() This function is a minor misnomer: it is talking about 'marking' regions as nosave - while the hibernation API is called register_nosave_region() and the e820_mark_nosave_regions() is a wrapper around that functionality. So name it to be in line with the API it is derived from. ( Rename e820_mark_nvs_memory() to e820__register_nvs_regions(), for similar reasons. ) No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 22:55:26 +01:00
Ingo Molnar	1506c8dc94	x86/boot/e820: Rename e820_reserve_resources() to e820__reserve_resources() Also do some minor cleanups. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 22:55:25 +01:00
Ingo Molnar	81b3e090fa	x86/boot/e820: Use bool in query APIs Change e820__mapped_any() and e820__mapped_all()'s return type and e820__range_remove()'s check_type parameter to bool. Propagate it into arch/x86/pci/mmconfig-shared.c as this change affects a function signature there too. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 22:55:25 +01:00
Ingo Molnar	1a1270349a	x86/boot/e820: Document e820__reserve_setup_data() Also clean it up a bit. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 22:55:25 +01:00
Ingo Molnar	9a02fd0f1e	x86/boot/e820: Clean up __e820__update_table() et al The __e820__update_table() function has various weirdly named variables, such as 'pbios', 'biosmap' and 'pnr_map' which are pretty confusing and actively misleading at times. This weird naming found its way into other functions as well, such as __append_e820_table() and append_e820_table(). Standardize the naming to make it all much easier to read: biosmap -> entries pbios -> entry nr_map -> nr_entries pnr_map -> nr_entries ... Also clean up the types used: entry indices routinely mixed u32 and int, standardize on u32 thoughout. Update the comments as well, while at it. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 22:55:24 +01:00
Ingo Molnar	f9748fa045	x86/boot/e820: Simplify the e820__update_table() interface The e820__update_table() parameters are pretty complex: arch/x86/include/asm/e820/api.h:extern int e820__update_table(struct e820_entry biosmap, int max_nr_map, u32 pnr_map); But 90% of the usage is trivial: arch/x86/kernel/e820.c: if (e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries)) arch/x86/kernel/e820.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries); arch/x86/kernel/e820.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries); arch/x86/kernel/e820.c: if (e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries) < 0) arch/x86/kernel/e820.c: e820__update_table(boot_params.e820_table, ARRAY_SIZE(boot_params.e820_table), &new_nr); arch/x86/kernel/early-quirks.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries); arch/x86/kernel/setup.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries); arch/x86/kernel/setup.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries); arch/x86/platform/efi/efi.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries); arch/x86/xen/setup.c: e820__update_table(xen_e820_table.entries, ARRAY_SIZE(xen_e820_table.entries), arch/x86/xen/setup.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries); arch/x86/xen/setup.c: e820__update_table(xen_e820_table.entries, ARRAY_SIZE(xen_e820_table.entries), as it only uses an exiting struct e820_table's entries array, its size and its current number of entries as input and output arguments. Only one use is non-trivial: arch/x86/kernel/e820.c: e820__update_table(boot_params.e820_table, ARRAY_SIZE(boot_params.e820_table), &new_nr); ... which call updates the E820 table in the zeropage in-situ, and the layout there does not match that of 'struct e820_table' (in particular nr_entries is at a different offset, hardcoded by the boot protocol). Simplify all this by introducing a low level __e820__update_table() API that the zeropage update call can use, and simplifying the main e820__update_table() call signature down to: int e820__update_table(struct e820_table table); This visibly simplifies all the call sites: arch/x86/include/asm/e820/api.h:extern int e820__update_table(struct e820_table table); arch/x86/include/asm/e820/types.h: * call to e820__update_table() to remove duplicates. The allowance arch/x86/kernel/e820.c: * The return value from e820__update_table() is zero if it arch/x86/kernel/e820.c:int __init e820__update_table(struct e820_table *table) arch/x86/kernel/e820.c: if (e820__update_table(e820_table)) arch/x86/kernel/e820.c: e820__update_table(e820_table_firmware); arch/x86/kernel/e820.c: e820__update_table(e820_table); arch/x86/kernel/e820.c: e820__update_table(e820_table); arch/x86/kernel/e820.c: if (e820__update_table(e820_table) < 0) arch/x86/kernel/early-quirks.c: e820__update_table(e820_table); arch/x86/kernel/setup.c: e820__update_table(e820_table); arch/x86/kernel/setup.c: e820__update_table(e820_table); arch/x86/platform/efi/efi.c: e820__update_table(e820_table); arch/x86/xen/setup.c: e820__update_table(&xen_e820_table); arch/x86/xen/setup.c: e820__update_table(e820_table); arch/x86/xen/setup.c: e820__update_table(&xen_e820_table); No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 22:55:24 +01:00
Ingo Molnar	d88961b5d4	x86/boot/e820: Clean up and standardize sizeof() uses There's various sizeof() uses in e820.c - standardize on the shortest and least error prone one, along the pattern of: - memset(entry, 0, sizeof(struct e820_entry)); + memset(entry, 0, sizeof(*entry)); ... because with this pattern in most cases it's immediately clear that we have used the right type - and the pattern is robust against changing the type as well. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 22:55:23 +01:00
Ingo Molnar	08b46d5dd8	x86/boot/e820: Clean up the E820 table size define names We've got a number of defines related to the E820 table and its size: E820MAP E820NR E820_X_MAX E820MAX The first two denote byte offsets into the zeropage (struct boot_params), and can are not used in the kernel and can be removed. The E820_*_MAX values have an inconsistent structure and it's unclear in any case what they mean. 'X' presuably goes for extended - but it's not very expressive altogether. Change these over to: E820_MAX_ENTRIES_ZEROPAGE E820_MAX_ENTRIES ... which are self-explanatory names. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 22:55:23 +01:00
Ingo Molnar	09821ff1d5	x86/boot/e820: Prefix the E820_* type names with "E820_TYPE_" So there's a number of constants that start with "E820" but which are not types - these create a confusing mixture when seen together with 'enum e820_type' values: E820MAP E820NR E820_X_MAX E820MAX To better differentiate the 'enum e820_type' values prefix them with E820_TYPE_. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 22:55:22 +01:00
Ingo Molnar	6afc03b864	x86/boot/e820: Use 'enum e820_type' when handling the e820 region type The E820 region type is put into four different types (!) when used in function parameters or local variables: unsigned type; int type; unsigned long current_type; u32 type; Use 'enum e820_type' in all these cases instead. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 17:02:57 +01:00
Ingo Molnar	09c5151339	x86/boot/e820: Use 'enum e820_type' in 'struct e820_entry' Use a stricter type for struct e820_entry. Add a build-time check to make sure the compiler won't ever pack the enum into a field smaller than 'int'. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 17:02:56 +01:00
Ingo Molnar	c594761d1d	x86/boot/e820: Simplify e820_reserve_resources() Remove unnecessary duplications of "e820_table->entries[i]." via a local variable, plus pass in 'entry' to the type_to_*() functions which further improves the readability of the code - and other small tweaks. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:33 +01:00
Ingo Molnar	be0c3f0fca	x86/boot/e820: Rename e820_print_map() to e820__print_table() All other table-level methods are already named 'table' in some way, to change this one over to the (now consistent) nomenclature. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:32 +01:00
Ingo Molnar	ab6bc04cfd	x86/boot/e820: Create coherent API function names for E820 range operations We have these three related functions: extern void e820_add_region(u64 start, u64 size, int type); extern u64 e820_update_range(u64 start, u64 size, unsigned old_type, unsigned new_type); extern u64 e820_remove_range(u64 start, u64 size, unsigned old_type, int checktype); But it's not clear from the naming that they are 3 operations based around the same 'memory range' concept. Rename them to better signal this, and move the prototypes next to each other: extern void e820__range_add (u64 start, u64 size, int type); extern u64 e820__range_update(u64 start, u64 size, unsigned old_type, unsigned new_type); extern u64 e820__range_remove(u64 start, u64 size, unsigned old_type, int checktype); Note that this improved organization of the functions shows another problem that was easy to miss before: sometimes the E820 entry type is 'int', sometimes 'unsigned int' - but this will be fixed in a separate patch. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:32 +01:00
Ingo Molnar	2df908baf5	x86/boot/e820: Rename e820_setup_gap() to e820__setup_pci_gap() The e820_setup_gap() function name is unnecessarily silent about what kind of gap it sets up. Make it clear that it's about the PCI gap. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:31 +01:00
Ingo Molnar	3bce64f019	x86/boot/e820: Rename e820_any_mapped()/e820_all_mapped() to e820__mapped_any()/e820__mapped_all() The 'any' and 'all' are modified to the 'mapped' concept, so move them last in the name. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:31 +01:00
Ingo Molnar	f52355a99f	x86/boot/e820: Rename sanitize_e820_table() to e820__update_table() sanitize_e820_table() is a minor misnomer in that it suggests that the E820 table requires sanitizing - which implies that it will only do anything if the E820 table is irregular (not sane). That is wrong, because sanitize_e820_table() also does a very regular sorting of the E820 table, which is a necessity in the basic append-only flow of E820 updates the kernel is allowed to perform to it. So rename it to e820__update_table() to include that purpose as well. This also lines up all the table-update functions into a coherent naming family: int e820__update_table(struct e820_entry biosmap, int max_nr_map, u32 pnr_map); void e820__update_table_print(void); void e820__update_table_firmware(void); No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:31 +01:00
Ingo Molnar	6464d294d2	x86/boot/e820: Rename update_e820() to e820__update_table() update_e820() should have 'e820' as a prefix as most of the other E820 functions have - but it's also a bit unclear about its purpose, as it's unclear what is updated - the whole table, or an entry? Also, the name does not express that it's a trivial wrapper around sanitize_e820_table() that also prints out the resulting table. So rename it to e820__update_table_print(). This also makes it harmonize with the e820__update_table_firmware() function which has a very similar purpose. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:30 +01:00
Ingo Molnar	5da217ca96	x86/boot/e820: Rename early_reserve_e820() to e820__memblock_alloc() and document it early_reserve_e820() is an early hack for kexec that does a limited fixup of the mptable and passes it to the kexec kernel as if it was the real thing. For this it needs to allocate memory - but no memory allocator is available yet beyond the memblock allocator, so early_reserve_e820() is really a wrapper around memblock_alloc() plus a hack to update the e820_table_firmware entries. The name 'reserve' is really a bit of a misnomer, as 'reserved' memory typically means memory completely inaccessible to the kernel - while here what we want to do is a special RAM allocation for our own purposes and insert that as RAM_RESERVED. Rename the function to e820__memblock_alloc_reserved() to better signal this dual purpose, plus document it better, which was omitted when it was merged. The barely comprehensible and cryptic comment: /* * pre allocated 4k and reserved it in memblock and e820_table_firmware / u64 __init e820__memblock_alloc_reserved(u64 size, u64 align) ... does not count as documentation, replace it with: / * Allocate the requested number of bytes with the requsted alignment * and return (the physical address) to the caller. Also register this * range in the 'firmware' E820 table. * * This allows kexec to fake a new mptable, as if it came from the real * system. */ u64 __init e820__memblock_alloc_reserved(u64 size, u64 align) No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:30 +01:00
Ingo Molnar	9641bdafd8	x86/boot/e820: Clarify the role of finish_e820_parsing() and rename it to e820__finish_early_params() finish_e820_parsing() is closely related to parse_early_params(), but the name does not tell us this clearly, so rename it to e820__finish_early_params(). Also add a few comments to explain what the function does. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:29 +01:00
Ingo Molnar	da92139bff	x86/boot/e820: Move e820_reserve_setup_data() to e820.c The e820_reserve_setup_data() is local to arch/x86/kernel/setup.c, but it is E820 functionality - so move it to e820.c to better isolate E820 functionality. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:29 +01:00
Ingo Molnar	914053c08e	x86/boot/e820: Rename parse_e820_ext() to e820__memory_setup_extended() parse_e820_ext() is very similar to e820__memory_setup_default(), both are taking bootloader provided data, add it to the E820 table and then pass it sanitize_e820_table(). Rename it to e820__memory_setup_extended() to better signal their similar role. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:29 +01:00
Ingo Molnar	4270fd8b4c	x86/boot/e820: Move the memblock_find_dma_reserve() function and rename it to memblock_set_dma_reserve() We introduced memblock_find_dma_reserve() in this commit: `6f2a75369e` x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve But there's several problems with it: - The changelog is full of typos and is incomprehensible in general, and the comments in the code are not much better either. - The function was inexplicably placed into e820.c, while it has very little connection to the E820 table: when we call memblock_find_dma_reserve() then memblock is already set up and we are not using the E820 table anymore. - The function is a wrapper around set_dma_reserve(), but changed the 'set' name to 'find' - actively misleading about its primary purpose, which is still to set the DMA-reserve value. - The function is limited to 64-bit systems, but neither the changelog nor the comments explain why. The change would appear to be relevant to 32-bit systems as well, as the ISA DMA zone is the first 16 MB of RAM. So address some of these problems: - Move it into arch/x86/mm/init.c, next to the other zone setup related functions. - Clean up the code flow and names of local variables a bit. - Rename it to memblock_set_dma_reserve() - Improve the comments. No change in functionality. Enabling it for 32-bit systems is left for a separate patch. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:28 +01:00
Ingo Molnar	01259ef1e0	x86/boot/e820: Convert printk(KERN_* ...) to pr_*() No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:28 +01:00
Ingo Molnar	e5540f8754	x86/boot/e820: Consolidate 'struct e820_entry entry' local variable names So the E820 code has a lot of cases of: struct e820_entry ei; ... but the 'ei' name makes very little sense if you think about it, it's not an abbreviation of anything obviously related to E820 table entries. This results in weird looking lines such as: if (type && ei->type != type) where you might have to double check what 'ei' really means, plus weird looking secondary variable names, such as: u64 ei_end; The 'ei' name was introduced in a single function over a decade ago, and then mindlessly cargo-copied over into other functions - with usage growing to over 60 uses altogether (!). ( My best guess is that it might have been originally meant as abbreviation of 'entry interval'. ) Anyway, rename these to the much more obvious: struct e820_entry *entry; No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:28 +01:00
Ingo Molnar	4918e2286d	x86/boot/e820: Rename memblock_x86_fill() to e820__memblock_setup() and improve the explanations So memblock_x86_fill() is another E820 code misnomer: - nothing in its name tells us that it's part of the E820 subsystem ... - The 'fill' wording is ambiguous and doesn't tell us whether it's a single entry or some process - while the _real_ purpose of the function is hidden, which is to do a complete setup of the (platform independent) memblock regions. So rename it accordingly, to e820__memblock_setup(). Also translate this incomprehensible and misleading comment: /* * EFI may have more than 128 entries * We are safe to enable resizing, beause memblock_x86_fill() * is rather later for x86 / memblock_allow_resize(); The worst aspect of this comment isn't even the sloppy typos, but that it casually mentions a '128' number with no explanation, which makes one lead to the assumption that this is related to the well-known limit of a maximum of 128 E820 entries passed via legacy bootloaders. But no, the _real_ meaning of 128 here is that of the memblock subsystem, which too happens to have a 128 entries limit for very early memblock regions (which is unrelated to E820), via INIT_MEMBLOCK_REGIONS ... So change the comment to a more comprehensible version: / * The bootstrap memblock region count maximum is 128 entries * (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries * than that - so allow memblock resizing. * * This is safe, because this call happens pretty late during x86 setup, * so we know about reserved memory regions already. (This is important * so that memblock resizing does no stomp over reserved areas.) */ memblock_allow_resize(); No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:27 +01:00
Ingo Molnar	640e1b38b0	x86/boot/e820: Basic cleanup of e820.c Over the last decade or so e820.c has become an ureadable mess of tinkerware. Perform some very basic cleanups before doing more intricate cleanups, so that my eyes don't start bleeding when I look at it. Here's some of the excesses: - Total disregard of countless aspects of Documentation/CodingStyle. - Totally inconsistent hodge-podge of various coding styles and practices. - Gems like: (unsigned long long) e820_table->entries[i].addr ... which is a completely unnecessary type conversion of an u64 value. - Incomprehensible comments while there are major functions with absolutely no explanation - plus an armada of typos and grammar mistakes. - Mindless checkpatch artifacts such as: if (append_e820_table(boot_params.e820_table, boot_params.e820_entries) < 0) { for_each_free_mem_range(u, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) { - Actively misleading comments: /* In case someone cares... */ return who; ( The usage site of the return value just a few lines further down makes it clear that we very much care about the return value, we use it to print out the e820 map... ) - Colorfully inconsistent capitalization and punctuation throughout. - etc. This patch fixes only the worst excesses - there's more to fix. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:27 +01:00
Ingo Molnar	544a0f47e7	x86/boot/e820: Rename e820_table_saved to e820_table_firmware and improve the description So the 'e820_table_saved' is a bit of a misnomer that hides its real purpose. At first sight the name suggests that it's some sort save/restore mechanism, as this is how we typically name such facilities in the kernel. But that is not so, e820_table_saved is the original firmware version of the e820 table, not modified by the kernel. This table is displayed in the /sys/firmware/memmap file, and it's also used by the hibernation code to calculate a physical memory layout MD5 fingerprint checksum which is invariant of the kernel. So rename it to 'e820_table_firmware' and update all the comments to better describe the main e820 data strutures. Also rename: 'initial_e820_table_saved' => 'e820_table_firmware_init' 'e820_update_range_saved' => 'e820_update_range_firmware' ... to better match the new nomenclature. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:27 +01:00
Ingo Molnar	103e206309	x86/boot/e820: Rename default_machine_specific_memory_setup() to e820__memory_setup_default() The default_machine_specific_memory_setup() is a mouthful and despite the many words it doesn't actually tell us clearly what it does. The function is the x86 legacy memory layout setup code, based on E820-formatted memory layout information passed by the bootloader via the boot_params. Rename it to e820__memory_setup_default() to better signal its purpose. Also rename the related higher level function to be consistent with this new naming: setup_memory_map() => e820__memory_setup() No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 14:42:26 +01:00
Ingo Molnar	bf495573fa	x86/boot/e820: Harmonize the 'struct e820_table' fields So the e820_table->map and e820_table->nr_map names are a bit confusing, because it's not clear what a 'map' really means (it could be a bitmap, or some other data structure), nor is it clear what nr_map means (is it a current index, or some other count). Rename the fields from: e820_table->map => e820_table->entries e820_table->nr_map => e820_table->nr_entries which makes it abundantly clear that these are entries of the table, and that the size of the table is ->nr_entries. Propagate the changes to all affected files. Where necessary, adjust local variable names to better reflect the new field names. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 09:33:16 +01:00
Ingo Molnar	61a5010163	x86/boot/e820: Rename everything to e820_table No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 09:33:16 +01:00
Ingo Molnar	acd4c04872	x86/boot/e820: Rename 'e820_map' variables to 'e820_array' In line with the rename to 'struct e820_array', harmonize the naming of common e820 table variable names as well: e820 => e820_array e820_saved => e820_array_saved e820_map => e820_array initial_e820 => e820_array_init This makes the variable names more consistent and easier to grep for. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 09:33:15 +01:00
Ingo Molnar	e79d74d085	x86/boot/e820: Remove e820_mark_nosave_regions() definition uglies The e820_mark_nosave_regions definition has a number of ugly #ifdef conditions that unnecessarily uglify both the header and the e820.c file. Make this function unconditional: most distro kernels have hibernation enabled. If LTO functionality is added in the future it will be able to eliminate unused functions without uglifying the source code. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 09:33:15 +01:00
Ingo Molnar	8ec67d97bf	x86/boot/e820: Rename the basic e820 data types to 'struct e820_entry' and 'struct e820_array' The 'e820entry' and 'e820map' names have various annoyances: - the missing underscore departs from the usual kernel style and makes the code look weird, - in the past I kept confusing the 'map' with the 'entry', because a 'map' is ambiguous in that regard, - it's not really clear from the 'e820map' that this is a regular C array. Rename them to 'struct e820_entry' and 'struct e820_array' accordingly. ( Leave the legacy UAPI header alone but do the rename in the bootparam.h and e820/types.h file - outside tools relying on these defines should either adjust their code, or should use the legacy header, or should create their private copies for the definitions. ) No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 09:33:14 +01:00
Ingo Molnar	5520b7e7d2	x86/boot/e820: Remove spurious asm/e820/api.h inclusions A commonly used lowlevel x86 header, asm/pgtable.h, includes asm/e820/api.h spuriously, without making direct use of it. Removing it is not simple: over the years various .c code learned to rely on this indirect inclusion. Remove the unnecessary include - this should speed up the kernel build a bit, as a large header is not included anymore in totally unrelated code. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 09:31:14 +01:00
Ingo Molnar	66441bd3cf	x86/boot/e820: Move asm/e820.h to asm/e820/api.h In line with asm/e820/types.h, move the e820 API declarations to asm/e820/api.h and update all usage sites. This is just a mechanical, obviously correct move & replace patch, there will be subsequent changes to clean up the code and to make better use of the new header organization. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 09:31:13 +01:00
Ingo Molnar	9a1f4150fe	Merge branch 'linus' into x86/boot, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-28 09:30:11 +01:00
Nick Desaulniers	2dc8ffad8c	ACPI / idle: small formatting fixes A quick cleanup with scripts/checkpatch.pl -f <file>. Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-01-27 11:21:58 +01:00
Paulo Zanoni	bc384c77e3	x86/gpu: GLK uses the same GMS values as SKL So don't forget to reserve its stolen memory bits. Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Cc: x86@kernel.org Reviewed-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1485283642-14401-1-git-send-email-paulo.r.zanoni@intel.com	2017-01-27 10:45:00 +02:00
Andy Lutomirski	9729017f84	x86/fpu: Fix the "Giving up, no FPU found" test We would never print "Giving up, no FPU found" because X86_FEATURE_FPU was in REQUIRED_MASK on non-FPU-emulating builds, so the boot_cpu_has() test didn't do anything. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1499077fa76f0f84b8ea28e37d3fa70beca4e310.1484705016.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-25 10:12:44 +01:00
Andy Lutomirski	37ac78b67b	x86/fpu: Fix CPUID-less FPU detection The old code didn't work at all because it adjusted the current caps instead of the forced caps. Anything it did would be undone later during CPU identification. Fix that and, while we're at it, improve the logging and don't bother running it if CPUID is available. Reported-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/f1134e30cafa73c4e2e68119e9741793622cfd15.1484705016.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-25 10:12:43 +01:00
Andy Lutomirski	9170fb4094	x86/fpu: Fix "x86/fpu: Legacy x87 FPU detected" message That message isn't at all clear -- what does "Legacy x87" even mean? Clarify it. If there's no FPU, say: x86/fpu: No FPU detected If there's an FPU that doesn't have XSAVE, say: x86/fpu: x87 FPU will use FSAVE\|FXSAVE Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/bb839385e18e27bca23fe8666dfdad8170473045.1484705016.git.luto@kernel.org [ Small tweaks to the messages. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-25 10:12:42 +01:00
Andy Lutomirski	60d3450167	x86/cpu: Re-apply forced caps every time CPU caps are re-read Calling get_cpu_cap() will reset a bunch of CPU features. This will cause the system to lose track of force-set and force-cleared features in the words that are reset until the end of CPU initialization. This can cause X86_FEATURE_FPU, for example, to change back and forth during boot and potentially confuse CPU setup. To minimize the chance of confusion, re-apply forced caps every time get_cpu_cap() is called. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/c817eb373d2c67c2c81413a70fc9b845fa34a37e.1484705016.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-25 10:12:41 +01:00
Andy Lutomirski	8bf1ebca21	x86/cpu: Factor out application of forced CPU caps There are multiple call sites that apply forced CPU caps. Factor them into a helper. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/623ff7555488122143e4417de09b18be2085ad06.1484705016.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-25 10:12:40 +01:00
Borislav Petkov	78d1b29684	x86/cpu: Add X86_FEATURE_CPUID Add a synthetic CPUID flag denoting whether the CPU sports the CPUID instruction or not. This will come useful later when accomodating CPUID-less CPUs. Signed-off-by: Borislav Petkov <bp@suse.de> [ Slightly prettified. ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/dcb355adae3ab812c79397056a61c212f1a0c7cc.1484705016.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-25 10:12:39 +01:00
Yu-cheng Yu	a5828ed3d0	x86/fpu/xstate: Move XSAVES state init to a function Make XSTATE init similar to existing code; move it to a separate function. There is no functionality change. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1485282346-15437-1-git-send-email-yu-cheng.yu@intel.com [ Minor cleanliness edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-25 08:25:12 +01:00
Bart Van Assche	5657933dbb	treewide: Move dma_ops from struct dev_archdata into struct device Some but not all architectures provide set_dma_ops(). Move dma_ops from struct dev_archdata into struct device such that it becomes possible on all architectures to configure dma_ops per device. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Russell King <linux@armlinux.org.uk> Cc: x86@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 12:23:35 -05:00
Bart Van Assche	5299709d0a	treewide: Constify most dma_map_ops structures Most dma_map_ops structures are never modified. Constify these structures such that these can be write-protected. This patch has been generated as follows: git grep -l 'struct dma_map_ops' \| xargs -d\\n sed -i \ -e 's/struct dma_map_ops/const struct dma_map_ops/g' \ -e 's/const struct dma_map_ops {/struct dma_map_ops {/g' \ -e 's/^const struct dma_map_ops;$/struct dma_map_ops;/' \ -e 's/const const struct dma_map_ops /const struct dma_map_ops /g'; sed -i -e 's/const $struct dma_map_ops intel_dma_ops$/\1/' \ $(git grep -l 'struct dma_map_ops intel_dma_ops'); sed -i -e 's/const $struct dma_map_ops dma_iommu_ops$/\1/' \ $(git grep -l 'struct dma_map_ops' \| grep ^arch/powerpc); sed -i -e '/^struct vmd_dev {$/,/^};$/ s/const $struct dma_map_ops[[:blank:]]dma_ops;$/\1/' \ -e '/^static void vmd_setup_dma_ops/,/^}$/ s/const $struct dma_map_ops \dest$/\1/' \ -e 's/const $struct dma_map_ops \dest = \&vmd->dma_ops$/\1/' \ drivers/pci/host/.c sed -i -e '/^void __init pci_iommu_alloc(void)$/,/^}$/ s/dma_ops->/intel_dma_ops./' arch/ia64/kernel/pci-dma.c sed -i -e 's/static const struct dma_map_ops sn_dma_ops/static struct dma_map_ops sn_dma_ops/' arch/ia64/sn/pci/pci_dma.c sed -i -e 's/(const struct dma_map_ops \)//' drivers/misc/mic/bus/vop_bus.c Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Russell King <linux@armlinux.org.uk> Cc: x86@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-24 12:23:35 -05:00
Borislav Petkov	9026cc82b6	x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority Assign all notifiers on the MCE decode chain a priority so that they get called in the correct order. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-24 09:14:57 +01:00
Borislav Petkov	cff4c0391a	x86/ras: Get rid of mce_process_work() Make mce_gen_pool_process() the workqueue function directly and save us an indirection. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-9-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-24 09:14:56 +01:00
Borislav Petkov	669c00f099	x86/ras: Flip the TSC-adding logic Add the TSC value to the MCE record only when the MCE being logged is precise, i.e., it is logged as an exception or an MCE-related interrupt. So it doesn't look particularly easy to do without touching/changing a bunch of places. That's why I'm trying tricks first. For example, the mce-apei.c case I'm addressing by setting ->tsc only for errors of panic severity. The idea there is, that, panic errors will have raised an #MC and not polled. And then instead of propagating a flag to mce_setup(), it seems easier/less code to set ->tsc depending on the call sites, i.e., are we polling or are we preparing an MCE record in an exception handler/thresholding interrupt. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-5-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-24 09:14:54 +01:00
Yazen Ghannam	0b737a9c2a	x86/ras/amd: Make sysfs names of banks more user-friendly Currently, we append the MCA_IPID[InstanceId] to the bank name to create the sysfs filename. The InstanceId field uniquely identifies a bank instance but it doesn't look very nice for most banks. Replace the InstanceId with a simpler, ascending (0, 1, ..) value. Only use this in the sysfs name when there is more than 1 instance. Otherwise, just use the bank's name as the sysfs name. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1484322741-41884-3-git-send-email-Yazen.Ghannam@amd.com Link: http://lkml.kernel.org/r/20170123183514.13356-4-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-24 09:14:53 +01:00
Borislav Petkov	9b052ea4ce	x86/ras/therm_throt: Do not log a fake MCE for thermal events We log a fake bank 128 MCE to note that we're handling a CPU thermal event. However, this confuses people into thinking that their hardware generates MCEs. Hijacking MCA for logging thermal events is a gross misuse anyway and it shouldn't have been done in the first place. And besides we have other means for dealing with thermal events which are much more suitable. So let's kill the MCE logging part. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Ashok Raj <ashok.raj@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170105213846.GA12024@gmail.com Link: http://lkml.kernel.org/r/20170123183514.13356-3-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-24 09:14:53 +01:00
Borislav Petkov	d4b2ac63b0	x86/ras/inject: Make it depend on X86_LOCAL_APIC=y ... and get rid of the annoying: arch/x86/kernel/cpu/mcheck/mce-inject.c:97:13: warning: ‘mce_irq_ipi’ defined but not used [-Wunused-function] when doing randconfig builds. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-24 09:14:52 +01:00
Yu-cheng Yu	dffba9a31c	x86/fpu/xstate: Fix xcomp_bv in XSAVES header The compacted-format XSAVES area is determined at boot time and never changed after. The field xsave.header.xcomp_bv indicates which components are in the fixed XSAVES format. In fpstate_init() we did not set xcomp_bv to reflect the XSAVES format since at the time there is no valid data. However, after we do copy_init_fpstate_to_fpregs() in fpu__clear(), as in commit: `b22cbe404a` x86/fpu: Fix invalid FPU ptrace state after execve() and when __fpu_restore_sig() does fpu__restore() for a COMPAT-mode app, a #GP occurs. This can be easily triggered by doing valgrind on a COMPAT-mode "Hello World," as reported by Joakim Tjernlund and others: https://bugzilla.kernel.org/show_bug.cgi?id=190061 Fix it by setting xcomp_bv correctly. This patch also moves the xcomp_bv initialization to the proper place, which was in copyin_to_xsaves() as of: `4c833368f0` x86/fpu: Set the xcomp_bv when we fake up a XSAVES area which fixed the bug too, but it's more efficient and cleaner to initialize things once per boot, not for every signal handling operation. Reported-by: Kevin Hao <haokexin@gmail.com> Reported-by: Joakim Tjernlund <Joakim.Tjernlund@infinera.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: haokexin@gmail.com Link: http://lkml.kernel.org/r/1485212084-4418-1-git-send-email-yu-cheng.yu@intel.com [ Combined it with `4c833368f0`. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-24 09:04:48 +01:00
Kevin Hao	4c833368f0	x86/fpu: Set the xcomp_bv when we fake up a XSAVES area I got the following calltrace on a Apollo Lake SoC with 32-bit kernel: WARNING: CPU: 2 PID: 261 at arch/x86/include/asm/fpu/internal.h:363 fpu__restore+0x1f5/0x260 [...] Hardware name: Intel Corp. Broxton P/NOTEBOOK, BIOS APLIRVPA.X64.0138.B35.1608091058 08/09/2016 Call Trace: dump_stack() __warn() ? fpu__restore() warn_slowpath_null() fpu__restore() __fpu__restore_sig() fpu__restore_sig() restore_sigcontext.isra.9() sys_sigreturn() do_int80_syscall_32() entry_INT80_32() The reason is that a #GP occurs when executing XRSTORS. The root cause is that we forget to set the xcomp_bv when we fake up the XSAVES area in the copyin_to_xsaves() function. Signed-off-by: Kevin Hao <haokexin@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1485075023-30161-1-git-send-email-haokexin@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 10:40:18 +01:00
Borislav Petkov	da0aa3dde0	x86/microcode/AMD: Remove struct cont_desc.eq_id The equivalence ID was needed outside of the container scanning logic but now, after this has been cleaned up, not anymore. Now, cont_desc.mc is used to denote whether the container we're looking at has the proper microcode patch for this CPU or not. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170120202955.4091-17-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 10:02:51 +01:00
Borislav Petkov	69f5f98300	x86/microcode/AMD: Remove AP scanning optimization The idea was to not scan the microcode blob on each AP (Application Processor) during boot and thus save us some milliseconds. However, on architectures where the microcode engine is shared between threads, this doesn't work. Here's why: The microcode on CPU0, i.e., the first thread, gets updated. The second thread, i.e., CPU1, i.e., the first AP walks into load_ucode_amd_ap(), sees that there's no container cached and goes and scans for the proper blob. It finds it and as a last step of apply_microcode_early_amd(), it tries to apply the patch but that core has already the updated microcode revision which it has received through CPU0's update. So it returns false and we do desc->size = -1 to prevent other APs from scanning. However, the next AP, CPU2, has a different microcode engine which hasn't been updated yet. The desc->size == -1 test prevents it from scanning the blob anew and we fail to update it. The fix is much more straight-forward than it looks: the BSP (BootStrapping Processor), i.e., CPU0, caches the microcode patch in amd_ucode_patch. We use that on the AP and try to apply it. In the 99.9999% of cases where we have homogeneous cores - not mixed-steppings - the application will be successful and we're good to go. In the remaining small set of systems, we will simply rescan the blob and find (or not, if none present) the proper patch and apply it then. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170120202955.4091-16-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 10:02:51 +01:00
Borislav Petkov	72edfe950b	x86/microcode/AMD: Simplify saving from initrd No need to use the previously stashed info in the container - simply go ahead and parse the initrd once more. It simplifies and streamlines the code a whole lot. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170120202955.4091-15-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 10:02:50 +01:00
Borislav Petkov	e71bb4ec07	x86/microcode/AMD: Unify load_ucode_amd_ap() Use a version for both bitness by adding a helper which does the actual container finding and parsing which can be used on any CPU - BSP or AP. Streamlines the paths more. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170120202955.4091-14-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 10:02:50 +01:00
Borislav Petkov	f3ad136d6e	x86/microcode/AMD: Check patch level only on the BSP Check final patch levels for AMD only on the BSP. This way, we decide early and only once whether to continue loading or to leave the loader disabled on such systems. Simplify a lot. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170120202955.4091-13-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 10:02:50 +01:00
Borislav Petkov	7a93a40be2	x86/microcode: Remove local vendor variable Use x86_cpuid_vendor() directly. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170120202955.4091-12-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 10:02:49 +01:00
Borislav Petkov	8cc26e0b4c	x86/microcode/AMD: Use find_microcode_in_initrd() Use the generic helper instead of semi-open-coding the procedure. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170120202955.4091-11-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 10:02:48 +01:00
Borislav Petkov	3da9b41794	x86/microcode/AMD: Get rid of global this_equiv_id We have a container which we update/prepare each time before applying a microcode patch instead of using a global. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170120202955.4091-10-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 10:02:48 +01:00
Borislav Petkov	309aac7776	x86/microcode: Decrease CPUID use Get CPUID(1).EAX value once per CPU and propagate value into the callers instead of conveniently calling it every time. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170120202955.4091-9-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 10:02:47 +01:00
Borislav Petkov	8801b3fcb5	x86/microcode/AMD: Rework container parsing It was pretty clumsy before and the whole work of parsing the microcode containers was spread around the functions wrongly. Clean it up so that there's a main scan_containers() function which iterates over the microcode blob and picks apart the containers glued together. For each container, it calls a parse_container() helper which concentrates on one container only: sanity-checking, parsing, counting microcode patches in there, etc. It makes much more sense now and it is actually very readable. Oh, and we luvz a diffstat removing more crap than adding. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170120202955.4091-8-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 10:02:47 +01:00
Borislav Petkov	f454177f73	x86/microcode/AMD: Extend the container struct Make it into a container descriptor which is being passed around and stores important info like the matching container and the patch for the current CPU. Make it static too. Later patches will use this and thus get rid of a double container parsing. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170120202955.4091-7-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 10:02:47 +01:00
Borislav Petkov	ef901dc33d	x86/microcode/AMD: Shorten function parameter's name The whole driver calls this "mc", do that here too. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170120202955.4091-6-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 10:02:46 +01:00
Borislav Petkov	1f02ac0682	x86/microcode/AMD: Clean up find_equiv_id() No need to have it marked "inline" - let gcc decide. Also, shorten the argument name and simplify while-test. While at it, make it into a proper for-loop and simplify it even more, as tglx suggests. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170120202955.4091-5-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 10:02:46 +01:00
Borislav Petkov	c26665ab5c	x86/microcode/intel: Drop stashed AP patch pointer optimization This was meant to save us the scanning of the microcode containter in the initrd since the first AP had already done that but it can also hurt us: Imagine a single hyperthreaded CPU (Intel(R) Atom(TM) CPU N270, for example) which updates the microcode on the BSP but since the microcode engine is shared between the two threads, the update on CPU1 doesn't happen because it has already happened on CPU0 and we don't find a newer microcode revision on CPU1. Which doesn't set the intel_ucode_patch pointer and at initrd jettisoning time we don't save the microcode patch for later application. Now, when we suspend to RAM, the loaded microcode gets cleared so we need to reload but there's no patch saved in the cache. Removing the optimization fixes this issue and all is fine and dandy. Fixes: `06b8534cb7` ("x86/microcode: Rework microcode loading") Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170120202955.4091-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-23 09:39:55 +01:00
Xunlei Pang	a8d4c8246b	x86/crash: Update the stale comment in reserve_crashkernel() CRASH_KERNEL_ADDR_MAX has been missing for a long time, update it with a more detailed explanation. Signed-off-by: Xunlei Pang <xlpang@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Young <dyoung@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert LeBlanc <robert@leblancnet.us> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1485154103-18426-1-git-send-email-xlpang@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-23 08:57:55 +01:00
K. Y. Srinivasan	8de8af7e08	Drivers: hv: vmbus: Move the extracting of Hypervisor version information As part of the effort to separate out architecture specific code, extract hypervisor version information in an architecture specific file. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-01-20 14:48:03 +01:00
K. Y. Srinivasan	63ed4e0c67	Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource code As part of the effort to separate out architecture specific code, consolidate all Hyper-V specific clocksource code to an architecture specific code. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-01-20 14:48:03 +01:00
Andy Shevchenko	358e96deae	x86/ioapic: Return suitable error code in mp_map_gsi_to_irq() mp_map_gsi_to_irq() in some cases might return legacy -1, which would be wrongly interpreted as -EPERM. Correct those cases to return proper error code. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: http://lkml.kernel.org/r/20170119192425.189899-2-andriy.shevchenko@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-20 10:07:41 +01:00
Peter Zijlstra	acb04058de	sched/clock: Fix hotplug crash Mike reported that he could trigger the WARN_ON_ONCE() in set_sched_clock_stable() using hotplug. This exposed a fundamental problem with the interface, we should never mark the TSC stable if we ever find it to be unstable. Therefore set_sched_clock_stable() is a broken interface. The reason it existed is that not having it is a pain, it means all relevant architecture code needs to call clear_sched_clock_stable() where appropriate. Of the three architectures that select HAVE_UNSTABLE_SCHED_CLOCK ia64 and parisc are trivial in that they never called set_sched_clock_stable(), so add an unconditional call to clear_sched_clock_stable() to them. For x86 the story is a lot more involved, and what this patch tries to do is ensure we preserve the status quo. So even is Cyrix or Transmeta have usable TSC they never called set_sched_clock_stable() so they now get an explicit mark unstable. Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `9881b024b7` ("sched/clock: Delay switching sched_clock to stable") Link: http://lkml.kernel.org/r/20170119133633.GB6536@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-20 02:38:46 +01:00
K. Y. Srinivasan	8730046c14	Drivers: hv vmbus: Move Hypercall page setup out of common code As part of the effort to separate out architecture specific code, move the hypercall page setup to an architecture specific file. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-01-19 11:42:07 +01:00
Tim Chen	02cfdc95a0	sched/x86: Remove unnecessary TBM3 check to update topology Scheduling to the max performance core is enabled by default for Turbo Boost Maxt Technology 3.0 capable platforms. Remove the useless sysctl_sched_itmt_enabled check to update sched topology for adding the prioritized core scheduling flag. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@suse.de Cc: jolsa@redhat.com Cc: linux-acpi@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/1484778629-4404-1-git-send-email-tim.c.chen@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-19 08:42:37 +01:00
Ruslan Ruslichenko	020eb3daab	x86/ioapic: Restore IO-APIC irq_chip retrigger callback commit `d32932d02e` removed the irq_retrigger callback from the IO-APIC chip and did not add it to the new IO-APIC-IR irq chip. Unfortunately the software resend fallback is not enabled on X86, so edge interrupts which are received during the lazy disabled state of the interrupt line are not retriggered and therefor lost. Restore the callbacks. [ tglx: Massaged changelog ] Fixes: `d32932d02e` ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces") Signed-off-by: Ruslan Ruslichenko <rruslich@cisco.com> Cc: xe-linux-external@cisco.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1484662432-13580-1-git-send-email-rruslich@cisco.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-18 15:37:28 +01:00
Ruslan Ruslichenko	a9b4f08770	x86/ioapic: Restore IO-APIC irq_chip retrigger callback commit `d32932d02e` removed the irq_retrigger callback from the IO-APIC chip and did not add it to the new IO-APIC-IR irq chip. There is no harm because the interrupts are resent in software when the retrigger callback is NULL, but it's less efficient. So restore them. [ tglx: Massaged changelog ] Fixes: `d32932d02e` ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces") Signed-off-by: Ruslan Ruslichenko <rruslich@cisco.com> Cc: xe-linux-external@cisco.com Link: http://lkml.kernel.org/r/1484662432-13580-1-git-send-email-rruslich@cisco.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-18 11:51:02 +01:00
Piotr Luc	06b35d93af	x86/cpufeature: Add AVX512_VPOPCNTDQ feature Vector population count instructions for dwords and qwords are going to be available in future Intel Xeon & Xeon Phi processors. Bit 14 of CPUID[level:0x07, ECX] indicates that the instructions are supported by a processor. The specification can be found in the Intel Software Developer Manual (SDM) and in the Instruction Set Extensions Programming Reference (ISE). Populate the feature bit and clear it when xsave is disabled. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Link: http://lkml.kernel.org/r/20170110173403.6010-2-piotr.luc@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-16 20:40:53 +01:00
Linus Torvalds	83346fbc07	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - unwinder fixes - AMD CPU topology enumeration fixes - microcode loader fixes - x86 embedded platform fixes - fix for a bootup crash that may trigger when clearcpuid= is used with invalid values" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mpx: Use compatible types in comparison to fix sparse error x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc() x86/entry: Fix the end of the stack for newly forked tasks x86/unwind: Include __schedule() in stack traces x86/unwind: Disable KASAN checks for non-current tasks x86/unwind: Silence warnings for non-current tasks x86/microcode/intel: Use correct buffer size for saving microcode data x86/microcode/intel: Fix allocation size of struct ucode_patch x86/microcode/intel: Add a helper which gives the microcode revision x86/microcode: Use native CPUID to tickle out microcode revision x86/CPU: Add native CPUID variants returning a single datum x86/boot: Add missing declaration of string functions x86/CPU/AMD: Fix Bulldozer topology x86/platform/intel-mid: Rename 'spidev' to 'mrfld_spidev' x86/cpu: Fix typo in the comment for Anniedale x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option	2017-01-15 12:03:11 -08:00
Thomas Gleixner	12907fbb1a	sched/clock, clocksource: Add optional cs::mark_unstable() method PeterZ reported that we'd fail to mark the TSC unstable when the clocksource watchdog finds it unsuitable. Allow a clocksource to run a custom action when its being marked unstable and hook up the TSC unstable code. Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-14 11:29:43 +01:00
Waiman Long	aef591cd3d	locking/spinlocks/x86, paravirt: Remove paravirt_ticketlocks_enabled This is a follow-up of commit: `cfd8983f03` ("x86, locking/spinlocks: Remove ticket (spin)lock implementation") The static_key structure 'paravirt_ticketlocks_enabled' is now removed as it is no longer used. As a result, the init functions kvm_spinlock_init_jump() and xen_init_spinlocks_jump() are also removed. A simple build and boot test was done to verify it. Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1484252878-1962-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-14 09:33:46 +01:00
Len Brown	695085b4bc	x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc() The Intel Denverton microserver uses a 25 MHz TSC crystal, so we can derive its exact [] TSC frequency using CPUID and some arithmetic, eg.: TSC: 1800 MHz (25000000 Hz 216 / 3 / 1000000) [*] 'exact' is only as good as the crystal, which should be +/- 20ppm Signed-off-by: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/306899f94804aece6d8fa8b4223ede3b48dbb59c.1484287748.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-14 09:30:37 +01:00
Mike Travis	81a7117674	x86/platform/UV: Fix 2 socket config problem A UV4 chassis with only 2 sockets configured can unexpectedly target the wrong UV hub. Fix the problem by limiting the minimum size of a partition to 4 sockets even if only 2 are configured. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Russ Anderson <rja@hpe.com> Acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170113152111.313888353@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-14 09:26:35 +01:00
Mike Travis	eee5715efd	x86/platform/UV: Fix panic with missing UVsystab support Fix the panic where KEXEC'd kernel does not have access to EFI runtime mappings. This may cause the extended UVsystab to not be available. The solution is to revert to non-UV mode and continue with limited capabilities. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Russ Anderson <rja@hpe.com> Reviewed-by: Alex Thorlton <athorlton@sgi.com> Acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170113152111.118886202@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-14 09:26:35 +01:00
Andy Shevchenko	6e03f66c00	locking/jump_labels: Update bug_at() boot message First of all, %*ph specifier allows to dump data in hex format using the pointer to a buffer. This is suitable to use here. Besides that Thomas suggested to move it to critical level and replace __FILE__ by explicit mention of "jumplabel". Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170110164354.47372-1-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-12 09:43:07 +01:00
Arnd Bergmann	c19a5f35e3	x86/e820/32: Fix e820_search_gap() error handling on x86-32 GCC correctly points out that on 32-bit kernels, e820_search_gap() not finding a start now leads to pci_mem_start ('gapstart') being set to an uninitialized value: arch/x86/kernel/e820.c: In function 'e820_setup_gap': arch/x86/kernel/e820.c:641:16: error: 'gapstart' may be used uninitialized in this function [-Werror=maybe-uninitialized] This restores the behavior from before this cleanup: `b4ed1d15b4` ("x86/e820: Make e820_search_gap() static and remove unused variables") ... defaulting to address 0x10000000 if nothing was found. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Wei Yang <richard.weiyang@gmail.com> Fixes: `b4ed1d15b4` ("x86/e820: Make e820_search_gap() static and remove unused variables") Link: http://lkml.kernel.org/r/20170111144926.695369-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-12 09:40:06 +01:00
Josh Poimboeuf	84936118bd	x86/unwind: Disable KASAN checks for non-current tasks There are a handful of callers to save_stack_trace_tsk() and show_stack() which try to unwind the stack of a task other than current. In such cases, it's remotely possible that the task is running on one CPU while the unwinder is reading its stack from another CPU, causing the unwinder to see stack corruption. These cases seem to be mostly harmless. The unwinder has checks which prevent it from following bad pointers beyond the bounds of the stack. So it's not really a bug as long as the caller understands that unwinding another task will not always succeed. In such cases, it's possible that the unwinder may read a KASAN-poisoned region of the stack. Account for that by using READ_ONCE_NOCHECK() when reading the stack of another task. Use READ_ONCE() when reading the stack of the current task, since KASAN warnings can still be useful for finding bugs in that case. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4c575eb288ba9f73d498dfe0acde2f58674598f1.1483978430.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-12 09:28:27 +01:00
Josh Poimboeuf	900742d89c	x86/unwind: Silence warnings for non-current tasks There are a handful of callers to save_stack_trace_tsk() and show_stack() which try to unwind the stack of a task other than current. In such cases, it's remotely possible that the task is running on one CPU while the unwinder is reading its stack from another CPU, causing the unwinder to see stack corruption. These cases seem to be mostly harmless. The unwinder has checks which prevent it from following bad pointers beyond the bounds of the stack. So it's not really a bug as long as the caller understands that unwinding another task will not always succeed. Since stack "corruption" on another task's stack isn't necessarily a bug, silence the warnings when unwinding tasks other than current. Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/00d8c50eea3446c1524a2a755397a3966629354c.1483978430.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-12 09:28:27 +01:00
Junichi Nomura	2e86222c67	x86/microcode/intel: Use correct buffer size for saving microcode data In generic_load_microcode(), curr_mc_size is the size of the last allocated buffer and since we have this performance "optimization" there to vmalloc a new buffer only when the current one is bigger, curr_mc_size ends up becoming the size of the biggest buffer we've seen so far. However, we end up saving the microcode patch which matches our CPU and its size is not curr_mc_size but the respective mc_size during the iteration while we're staring at it. So save that mc_size into a separate variable and use it to store the previously found microcode buffer. Without this fix, we could get oops like this: BUG: unable to handle kernel paging request at ffffc9000e30f000 IP: __memcpy+0x12/0x20 ... Call Trace: ? kmemdup+0x43/0x60 __alloc_microcode_buf+0x44/0x70 save_microcode_patch+0xd4/0x150 generic_load_microcode+0x1b8/0x260 request_microcode_user+0x15/0x20 microcode_write+0x91/0x100 __vfs_write+0x34/0x120 vfs_write+0xc1/0x130 SyS_write+0x56/0xc0 do_syscall_64+0x6c/0x160 entry_SYSCALL64_slow_path+0x25/0x25 Fixes: `06b8534cb7` ("x86/microcode: Rework microcode loading") Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/4f33cbfd-44f2-9bed-3b66-7446cd14256f@ce.jp.nec.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-09 23:11:15 +01:00
Junichi Nomura	9fcf5ba2ef	x86/microcode/intel: Fix allocation size of struct ucode_patch We allocate struct ucode_patch here. @size is the size of microcode data and used for kmemdup() later in this function. Fixes: `06b8534cb7` ("x86/microcode: Rework microcode loading") Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/7a730dc9-ac17-35c4-fe76-dfc94e5ecd95@ce.jp.nec.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-09 23:11:14 +01:00
Borislav Petkov	4167709bbf	x86/microcode/intel: Add a helper which gives the microcode revision Since on Intel we're required to do CPUID(1) first, before reading the microcode revision MSR, let's add a special helper which does the required steps so that we don't forget to do them next time, when we want to read the microcode revision. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170109114147.5082-4-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-09 23:11:14 +01:00
Borislav Petkov	f3e2a51f56	x86/microcode: Use native CPUID to tickle out microcode revision Intel supplies the microcode revision value in MSR 0x8b (IA32_BIOS_SIGN_ID) after CPUID(1) has been executed. Execute it each time before reading that MSR. It used to do sync_core() which did do CPUID but `c198b121b1` ("x86/asm: Rewrite sync_core() to use IRET-to-self") changed the sync_core() implementation so we better make the microcode loading case explicit, as the SDM documents it. Reported-and-tested-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170109114147.5082-3-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-09 23:11:14 +01:00
Frederic Weisbecker	914122c389	x86/apic: Implement set_state_oneshot_stopped() callback When clock_event_device::set_state_oneshot_stopped() is not implemented, hrtimer_cancel() can't stop the clock when there is no more timer in the queue. So the ghost of the freshly cancelled hrtimer haunts us back later with an extra interrupt: <idle>-0 [002] d..2 2248.557659: hrtimer_cancel: hrtimer=ffff88021fa92d80 <idle>-0 [002] d.h1 2249.303659: local_timer_entry: vector=239 So let's implement this missing callback for the lapic clock. This consist in calling its set_state_shutdown() callback. There don't seem to be a lighter way to stop the clock. Simply writing 0 to APIC_TMICT won't be enough to stop the clock and avoid the extra interrupt, as opposed to what is specified in the specs. We must also mask the timer interrupt in the device. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Link: http://lkml.kernel.org/r/1483029949-6925-1-git-send-email-fweisbec@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-01-09 11:48:42 +01:00
Linus Torvalds	2fd8774c79	Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb fixes from Konrad Rzeszutek Wilk: "This has one fix to make i915 work when using Xen SWIOTLB, and a feature from Geert to aid in debugging of devices that can't do DMA outside the 32-bit address space. The feature from Geert is on top of v4.10 merge window commit (specifically you pulling my previous branch), as his changes were dependent on the Documentation/ movement patches. I figured it would just easier than me trying than to cherry-pick the Documentation patches to satisfy git. The patches have been soaking since 12/20, albeit I updated the last patch due to linux-next catching an compiler error and adding an Tested-and-Reported-by tag" * 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: swiotlb: Export swiotlb_max_segment to users swiotlb: Add swiotlb=noforce debug option swiotlb: Convert swiotlb_force from int to enum x86, swiotlb: Simplify pci_swiotlb_detect_override()	2017-01-06 10:53:21 -08:00
Dou Liyang	12bf98b91f	x86/apic: Fix typos in comments s/ID/IDs/ s/inr_logical_cpuidi/nr_logical_cpuids/ s/generic_processor_info()/__generic_processor_info()/ Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1483610083-24314-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-06 08:40:33 +01:00
Boris Ostrovsky	1e620f9b23	x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C The new Xen PVH entry point requires page tables to be setup by the kernel since it is entered with paging disabled. Pull the common code out of head_32.S so that mk_early_pgtbl_32() can be invoked from both the new Xen entry point and the existing startup_32() code. Convert resulting common code to C. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: matt@codeblueprint.co.uk Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1481215471-9639-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-06 08:39:26 +01:00
Borislav Petkov	a33d331761	x86/CPU/AMD: Fix Bulldozer topology The following commit: `8196dab4fc` ("x86/cpu: Get rid of compute_unit_id") ... broke the initial strategy for Bulldozer-based cores' topology, where we consider each thread of a compute unit a standalone core and not a HT or SMT thread. Revert to the firmware-supplied core_id numbering and do not make them thread siblings as we don't consider them for such even if they technically are, more or less. Reported-and-tested-by: Brice Goglin <Brice.Goglin@inria.fr> Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # v4.6+ Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `8196dab4fc` ("x86/cpu: Get rid of compute_unit_id") Link: http://lkml.kernel.org/r/20170105092638.5247-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-06 08:37:41 +01:00
Daniel Bristot de Oliveira	c4158ff536	x86/irq, trace: Add __irq_entry annotation to x86's platform IRQ handlers This patch adds the __irq_entry annotation to the default x86 platform IRQ handlers. ftrace's function_graph tracer uses the __irq_entry annotation to notify the entry and return of IRQ handlers. For example, before the patch: 354549.667252 \| 3) d..1 \| default_idle_call() { 354549.667252 \| 3) d..1 \| arch_cpu_idle() { 354549.667253 \| 3) d..1 \| default_idle() { 354549.696886 \| 3) d..1 \| smp_trace_reschedule_interrupt() { 354549.696886 \| 3) d..1 \| irq_enter() { 354549.696886 \| 3) d..1 \| rcu_irq_enter() { After the patch: 366416.254476 \| 3) d..1 \| arch_cpu_idle() { 366416.254476 \| 3) d..1 \| default_idle() { 366416.261566 \| 3) d..1 ==========> \| 366416.261566 \| 3) d..1 \| smp_trace_reschedule_interrupt() { 366416.261566 \| 3) d..1 \| irq_enter() { 366416.261566 \| 3) d..1 \| rcu_irq_enter() { KASAN also uses this annotation. The smp_apic_timer_interrupt() was already annotated. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Claudio Fontana <claudio.fontana@huawei.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicolai Stange <nicstange@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/059fdf437c2f0c09b13c18c8fe4e69999d3ffe69.1483528431.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-05 08:58:49 +01:00
Lukasz Odzioba	dd853fd216	x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option A negative number can be specified in the cmdline which will be used as setup_clear_cpu_cap() argument. With that we can clear/set some bit in memory predceeding boot_cpu_data/cpu_caps_cleared which may cause kernel to misbehave. This patch adds lower bound check to setup_disablecpuid(). Boris Petkov reproduced a crash: [ 1.234575] BUG: unable to handle kernel paging request at ffffffff858bd540 [ 1.236535] IP: memcpy_erms+0x6/0x10 Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andi.kleen@intel.com Cc: bp@alien8.de Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: slaoub@gmail.com Fixes: `ac72e7888a` ("x86: add generic clearcpuid=... option") Link: http://lkml.kernel.org/r/1482933340-11857-1-git-send-email-lukasz.odzioba@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-05 08:54:34 +01:00
Wei Yang	b4ed1d15b4	x86/e820: Make e820_search_gap() static and remove unused variables e820_search_gap() is just used locally now and the 'start_addr' and 'end_addr' parameters are fixed values. Also, 'gapstart' is not checked in this function anymore. So make the function static and remove those unused variables. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Link: http://lkml.kernel.org/r/1482676551-11411-1-git-send-email-richard.weiyang@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-28 09:20:29 +01:00
Thomas Gleixner	0dad3a3014	x86/mce/AMD: Make the init code more robust If mce_device_init() fails then the mce device pointer is NULL and the AMD mce code happily dereferences it. Add a sanity check. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-26 17:30:24 -08:00
Linus Torvalds	3ddc76dfc7	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer type cleanups from Thomas Gleixner: "This series does a tree wide cleanup of types related to timers/timekeeping. - Get rid of cycles_t and use a plain u64. The type is not really helpful and caused more confusion than clarity - Get rid of the ktime union. The union has become useless as we use the scalar nanoseconds storage unconditionally now. The 32bit timespec alike storage got removed due to the Y2038 limitations some time ago. That leaves the odd union access around for no reason. Clean it up. Both changes have been done with coccinelle and a small amount of manual mopping up" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ktime: Get rid of ktime_equal() ktime: Cleanup ktime_set() usage ktime: Get rid of the union clocksource: Use a plain u64 instead of cycle_t	2016-12-25 14:30:04 -08:00
Linus Torvalds	b272f732f8	Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug notifier removal from Thomas Gleixner: "This is the final cleanup of the hotplug notifier infrastructure. The series has been reintgrated in the last two days because there came a new driver using the old infrastructure via the SCSI tree. Summary: - convert the last leftover drivers utilizing notifiers - fixup for a completely broken hotplug user - prevent setup of already used states - removal of the notifiers - treewide cleanup of hotplug state names - consolidation of state space There is a sphinx based documentation pending, but that needs review from the documentation folks" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/armada-xp: Consolidate hotplug state space irqchip/gic: Consolidate hotplug state space coresight/etm3/4x: Consolidate hotplug state space cpu/hotplug: Cleanup state names cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions staging/lustre/libcfs: Convert to hotplug state machine scsi/bnx2i: Convert to hotplug state machine scsi/bnx2fc: Convert to hotplug state machine cpu/hotplug: Prevent overwriting of callbacks x86/msr: Remove bogus cleanup from the error path bus: arm-ccn: Prevent hotplug callback leak perf/x86/intel/cstate: Prevent hotplug callback leak ARM/imx/mmcd: Fix broken cpu hotplug handling scsi: qedi: Convert to hotplug state machine	2016-12-25 14:05:56 -08:00
Thomas Gleixner	a5a1d1c291	clocksource: Use a plain u64 instead of cycle_t There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>	2016-12-25 11:04:12 +01:00
Thomas Gleixner	73c1b41e63	cpu/hotplug: Cleanup state names When the state names got added a script was used to add the extra argument to the calls. The script basically converted the state constant to a string, but the cleanup to convert these strings into meaningful ones did not happen. Replace all the useless strings with 'subsys/xxx/yyy:state' strings which are used in all the other places already. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-25 10:47:44 +01:00
Thomas Gleixner	59fefd0890	x86/msr: Remove bogus cleanup from the error path The error cleanup which is invoked when the hotplug state setup failed tries to remove the failed state, which is broken. Fixes: `8fba38c937` ("x86/msr: Convert to hotplug state machine") Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de>	2016-12-25 10:47:41 +01:00
Linus Torvalds	7c0f6ba682	Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]#[[:blank:]]include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"\|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-24 11:46:01 -08:00
Linus Torvalds	6ac3bb167f	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "There's a number of fixes: - a round of fixes for CPUID-less legacy CPUs - a number of microcode loader fixes - i8042 detection robustization fixes - stack dump/unwinder fixes - x86 SoC platform driver fixes - a GCC 7 warning fix - virtualization related fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) Revert "x86/unwind: Detect bad stack return address" x86/paravirt: Mark unused patch_default label x86/microcode/AMD: Reload proper initrd start address x86/platform/intel/quark: Add printf attribute to imr_self_test_result() x86/platform/intel-mid: Switch MPU3050 driver to IIO x86/alternatives: Do not use sync_core() to serialize I$ x86/topology: Document cpu_llc_id x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic x86/asm: Rewrite sync_core() to use IRET-to-self x86/microcode/intel: Replace sync_core() with native_cpuid() Revert "x86/boot: Fail the boot if !M486 and CPUID is missing" x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6 x86/tools: Fix gcc-7 warning in relocs.c x86/unwind: Dump stack data on warnings x86/unwind: Adjust last frame check for aligned function stacks x86/init: Fix a couple of comment typos x86/init: Remove i8042_detect() from platform ops Input: i8042 - Trust firmware a bit more when probing on X86 x86/init: Add i8042 state to the platform data ...	2016-12-23 16:54:46 -08:00
Josh Poimboeuf	c280f7736a	Revert "x86/unwind: Detect bad stack return address" Revert the following commit: `b6959a3621` ("x86/unwind: Detect bad stack return address") ... because Andrey Konovalov reported an unwinder warning: WARNING: unrecognized kernel stack return address ffffffffa0000001 at ffff88006377fa18 in a.out:4467 The unwind was initiated from an interrupt which occurred while running in the generated code for a kprobe. The unwinder printed the warning because it expected regs->ip to point to a valid text address, but instead it pointed to the generated code. Eventually we may want come up with a way to identify generated kprobe code so the unwinder can know that it's a valid return address. Until then, just remove the warning. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/02f296848fbf49fb72dfeea706413ecbd9d4caf6.1482418739.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-23 20:32:30 +01:00
Linus Torvalds	eb254f323b	Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache allocation interface from Thomas Gleixner: "This provides support for Intel's Cache Allocation Technology, a cache partitioning mechanism. The interface is odd, but the hardware interface of that CAT stuff is odd as well. We tried hard to come up with an abstraction, but that only allows rather simple partitioning, but no way of sharing and dealing with the per package nature of this mechanism. In the end we decided to expose the allocation bitmaps directly so all combinations of the hardware can be utilized. There are two ways of associating a cache partition: - Task A task can be added to a resource group. It uses the cache partition associated to the group. - CPU All tasks which are not member of a resource group use the group to which the CPU they are running on is associated with. That allows for simple CPU based partitioning schemes. The main expected user sare: - Virtualization so a VM can only trash only the associated part of the cash w/o disturbing others - Real-Time systems to seperate RT and general workloads. - Latency sensitive enterprise workloads - In theory this also can be used to protect against cache side channel attacks" [ Intel RDT is "Resource Director Technology". The interface really is rather odd and very specific, which delayed this pull request while I was thinking about it. The pull request itself came in early during the merge window, I just delayed it until things had calmed down and I had more time. But people tell me they'll use this, and the good news is that it is _so_ specific that it's rather independent of anything else, and no user is going to depend on the interface since it's pretty rare. So if push comes to shove, we can just remove the interface and nothing will break ] * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86/intel_rdt: Implement show_options() for resctrlfs x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount x86/intel_rdt: Fix setting of closid when adding CPUs to a group x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changee x86/intel_rdt: Reset per cpu closids on unmount x86/intel_rdt: Select KERNFS when enabling INTEL_RDT_A x86/intel_rdt: Prevent deadlock against hotplug lock x86/intel_rdt: Protect info directory from removal x86/intel_rdt: Add info files to Documentation x86/intel_rdt: Export the minimum number of set mask bits in sysfs x86/intel_rdt: Propagate error in rdt_mount() properly x86/intel_rdt: Add a missing #include MAINTAINERS: Add maintainer for Intel RDT resource allocation x86/intel_rdt: Add scheduler hook x86/intel_rdt: Add schemata file x86/intel_rdt: Add tasks files x86/intel_rdt: Add cpus file x86/intel_rdt: Add mkdir to resctrl file system x86/intel_rdt: Add "info" files to resctrl file system ...	2016-12-22 09:25:45 -08:00
Peter Zijlstra	cef4402d76	x86/paravirt: Mark unused patch_default label A bugfix commit: `45dbea5f55` ("x86/paravirt: Fix native_patch()") ... introduced a harmless warning: arch/x86/kernel/paravirt_patch_32.c: In function 'native_patch': arch/x86/kernel/paravirt_patch_32.c:71:1: error: label 'patch_default' defined but not used [-Werror=unused-label] Fix it by annotating the label as __maybe_unused. Reported-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Piotr Gregor <piotrgregor@rsyncme.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `45dbea5f55` ("x86/paravirt: Fix native_patch()") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-22 17:43:35 +01:00
Borislav Petkov	8877ebdd3f	x86/microcode/AMD: Reload proper initrd start address When we switch to virtual addresses and, especially after reserve_initrd()->relocate_initrd() have run, we have the updated initrd address in initrd_start. Use initrd_start then instead of the address which has been passed to us through boot params. (That still gets used when we're running the very early routines on the BSP). Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20161220144012.lc4cwrg6dphqbyqu@pd.tnic Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-21 10:50:04 +01:00
Borislav Petkov	34bfab0eaf	x86/alternatives: Do not use sync_core() to serialize I$ We use sync_core() in the alternatives code to stop speculative execution of prefetched instructions because we are potentially changing them and don't want to execute stale bytes. What it does on most machines is call CPUID which is a serializing instruction. And that's expensive. However, the instruction cache is serialized when we're on the local CPU and are changing the data through the same virtual address. So then, we don't need the serializing CPUID but a simple control flow change. Last being accomplished with a CALL/RET which the noinline causes. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161203150258.vwr5zzco7ctgc4pe@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-20 09:36:42 +01:00
Vitaly Kuznetsov	59107e2f48	x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt') which injects NMI to the guest. We may want to crash the guest and do kdump on this NMI by enabling unknown_nmi_panic. To make kdump succeed we need to allow the kdump kernel to re-establish VMBus connection so it will see VMBus devices (storage, network,..). To properly unload VMBus making it possible to start over during kdump we need to do the following: - Send an 'unload' message to the hypervisor. This can be done on any CPU so we do this the crashing CPU. - Receive the 'unload finished' reply message. WS2012R2 delivers this message to the CPU which was used to establish VMBus connection during module load and this CPU may differ from the CPU sending 'unload'. Receiving a VMBus message means the following: - There is a per-CPU slot in memory for one message. This slot can in theory be accessed by any CPU. - We get an interrupt on the CPU when a message was placed into the slot. - When we read the message we need to clear the slot and signal the fact to the hypervisor. In case there are more messages to this CPU pending the hypervisor will deliver the next message. The signaling is done by writing to an MSR so this can only be done on the appropriate CPU. To avoid doing cross-CPU work on crash we have vmbus_wait_for_unload() function which checks message slots for all CPUs in a loop waiting for the 'unload finished' messages. However, there is an issue which arises when these conditions are met: - We're crashing on a CPU which is different from the one which was used to initially contact the hypervisor. - The CPU which was used for the initial contact is blocked with interrupts disabled and there is a message pending in the message slot. In this case we won't be able to read the 'unload finished' message on the crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs simultaneously: the first CPU entering panic() will proceed to crash and all other CPUs will stop themselves with interrupts disabled. The suggested solution is to handle unknown NMIs for Hyper-V guests on the first CPU which gets them only. This will allow us to rely on VMBus interrupt handler being able to receive the 'unload finish' message in case it is delivered to a different CPU. The issue is not reproducible on WS2016 as Debug-VM delivers NMI to the boot CPU only, WS2012R2 and earlier Hyper-V versions are affected. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Cc: devel@linuxdriverproject.org Cc: Haiyang Zhang <haiyangz@microsoft.com> Link: http://lkml.kernel.org/r/20161202100720.28121-1-vkuznets@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-20 09:31:48 +01:00
Geert Uytterhoeven	ae7871be18	swiotlb: Convert swiotlb_force from int to enum Convert the flag swiotlb_force from an int to an enum, to prepare for the advent of more possible values. Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-12-19 09:05:20 -05:00
Geert Uytterhoeven	6c206e4d99	x86, swiotlb: Simplify pci_swiotlb_detect_override() At the end of the function, the local variable use_swiotlb has always the same value as the global variable swiotlb. Hence drop the local variable completely. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-12-19 09:05:20 -05:00
Andy Lutomirski	484d0e5c79	x86/microcode/intel: Replace sync_core() with native_cpuid() The Intel microcode driver is using sync_core() to mean "do CPUID with EAX=1". I want to rework sync_core(), but first the Intel microcode driver needs to stop depending on its current behavior. Reported-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Juergen Gross <jgross@suse.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/535a025bb91fed1a019c5412b036337ad239e5bb.1481307769.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-19 11:54:21 +01:00
Andy Lutomirski	3df8d92085	x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6 A typo (or mis-merge?) resulted in leaf 6 only being probed if cpuid_level >= 7. Fixes: `2ccd71f1b2` ("x86/cpufeature: Move some of the scattered feature bits to x86_capability") Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Link: http://lkml.kernel.org/r/6ea30c0e9daec21e488b54761881a6dfcf3e04d0.1481825597.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-19 11:50:24 +01:00
Josh Poimboeuf	8b5e99f022	x86/unwind: Dump stack data on warnings The unwinder warnings are good at finding unexpected unwinder issues, but they often don't give enough data to be able to fully diagnose them. Print a one-time stack dump when a warning is detected. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/15607370e3ddb1732b6a73d5c65937864df16ac8.1481904011.git.jpoimboe@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-19 11:47:05 +01:00
Josh Poimboeuf	8023e0e2a4	x86/unwind: Adjust last frame check for aligned function stacks Somehow, CONFIG_PARAVIRT=n convinces gcc to change the x86_64_start_kernel() prologue from: 0000000000000129 <x86_64_start_kernel>: 129: 55 push %rbp 12a: 48 89 e5 mov %rsp,%rbp to: 0000000000000124 <x86_64_start_kernel>: 124: 4c 8d 54 24 08 lea 0x8(%rsp),%r10 129: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp 12d: 41 ff 72 f8 pushq -0x8(%r10) 131: 55 push %rbp 132: 48 89 e5 mov %rsp,%rbp This is an unusual pattern which aligns rsp (though in this case it's already aligned) and saves the start_cpu() return address again on the stack before storing the frame pointer. The unwinder assumes the last stack frame header is at a certain offset, but the above code breaks that assumption, resulting in the following warning: WARNING: kernel stack frame pointer at ffffffff82e03f40 in swapper:0 has bad value (null) Fix it by checking for the last task stack frame at the aligned offset in addition to the normal unaligned offset. Fixes: `acb4608ad1` ("x86/unwind: Create stack frames for saved syscall registers") Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/9d7b4eb8cf55a7d6002cb738f25c23e7429c99a0.1481904011.git.jpoimboe@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-19 11:47:05 +01:00
Dmitry Torokhov	32786fdc95	x86/init: Remove i8042_detect() from platform ops Now that i8042 uses flag in legacy platform data, i8042_detect() is no longer used and can be removed. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Tested-by: Takashi Iwai <tiwai@suse.de> Acked-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Cc: linux-input@vger.kernel.org Link: http://lkml.kernel.org/r/1481317061-31486-4-git-send-email-dmitry.torokhov@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-19 11:34:15 +01:00
Dmitry Torokhov	93ffa9a479	x86/init: Add i8042 state to the platform data Add i8042 state to the platform data to help i8042 driver make decision whether to probe for i8042 or not. We recognize 3 states: platform/subarch ca not possible have i8042 (as is the case with Inrel MID platform), firmware (such as ACPI) reports that i8042 is absent from the device, or i8042 may be present and the driver should probe for it. The intent is to allow i8042 driver abort initialization on x86 if PNP data (absence of both keyboard and mouse PNP devices) agrees with firmware data. It will also allow us to remove i8042_detect later. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Tested-by: Takashi Iwai <tiwai@suse.de> Acked-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Cc: linux-input@vger.kernel.org Link: http://lkml.kernel.org/r/1481317061-31486-2-git-send-email-dmitry.torokhov@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-19 11:34:15 +01:00
Boris Ostrovsky	2b4c91569a	x86/microcode/AMD: Use native_cpuid() in load_ucode_amd_bsp() When CONFIG_PARAVIRT is selected, cpuid() becomes a call. Since for 32-bit kernels load_ucode_amd_bsp() is executed before paging is enabled the call cannot be completed (as kernel virtual addresses are not reachable yet). Use native_cpuid() instead which is an asm wrapper for the CPUID instruction. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jürgen Gross <jgross@suse.com> Link: http://lkml.kernel.org/r/1481906392-3847-1-git-send-email-boris.ostrovsky@oracle.com Link: http://lkml.kernel.org/r/20161218164414.9649-5-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-19 10:46:20 +01:00
Borislav Petkov	a15a753539	x86/microcode/AMD: Do not load when running on a hypervisor Doing so is completely void of sense for multiple reasons so prevent it. Set dis_ucode_ldr to true and thus disable the microcode loader by default to address xen pv guests which execute the AP path but not the BSP path. By having it turned off by default, the APs won't run into the loader either. Also, check CPUID(1).ECX[31] which hypervisors set. Well almost, not the xen pv one. That one gets the aforementioned "fix". Also, improve the detection method by caching the final decision whether to continue loading in dis_ucode_ldr and do it once on the BSP. The APs then simply test that value. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Juergen Gross <jgross@suse.com> Link: http://lkml.kernel.org/r/20161218164414.9649-4-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-19 10:46:20 +01:00
Borislav Petkov	200d355316	x86/microcode/AMD: Sanitize apply_microcode_early_amd() Make it simply return bool to denote whether it found a container or not and return the pointer to the container and its size in the handed-in container pointer instead, as returning a struct was just silly. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jürgen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/20161218164414.9649-3-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-19 10:46:20 +01:00
Borislav Petkov	8feaa64a9a	x86/microcode/AMD: Make find_proper_container() sane again Fixup signature and retvals, return the container struct through the passed in pointer, not as a function return value. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jürgen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/20161218164414.9649-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-19 10:46:19 +01:00
Linus Torvalds	f7dd3b1734	Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "This is the last functional update from the tip tree for 4.10. It got delayed due to a newly reported and anlyzed variant of BIOS bug and the resulting wreckage: - Seperation of TSC being marked realiable and the fact that the platform provides the TSC frequency via CPUID/MSRs and making use for it for GOLDMONT. - TSC adjust MSR validation and sanitizing: The TSC adjust MSR contains the offset to the hardware counter. The sum of the adjust MSR and the counter is the TSC value which is read via RDTSC. On at least two machines from different vendors the BIOS sets the TSC adjust MSR to negative values. This happens on cold and warm boot. While on cold boot the offset is a few milliseconds, on warm boot it basically compensates the power on time of the system. The BIOSes are not even using the adjust MSR to set all CPUs in the package to the same offset. The offsets are different which renders the TSC unusable, What's worse is that the TSC deadline timer has a HW feature^Wbug. It malfunctions when the TSC adjust value is negative or greater equal 0x80000000 resulting in silent boot failures, hard lockups or non firing timers. This looks like some hardware internal 32/64bit issue with a sign extension problem. Intel has been silent so far on the issue. The update contains sanity checks and keeps the adjust register within working limits and in sync on the package. As it looks like this disease is spreading via BIOS crapware, we need to address this urgently as the boot failures are hard to debug for users" * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsc: Limit the adjust value further x86/tsc: Annotate printouts as firmware bug x86/tsc: Force TSC_ADJUST register to value >= zero x86/tsc: Validate TSC_ADJUST after resume x86/tsc: Validate cpumask pointer before accessing it x86/tsc: Fix broken CONFIG_X86_TSC=n build x86/tsc: Try to adjust TSC if sync test fails x86/tsc: Prepare warp test for TSC adjustment x86/tsc: Move sync cleanup to a safe place x86/tsc: Sync test only for the first cpu in a package x86/tsc: Verify TSC_ADJUST from idle x86/tsc: Store and check TSC ADJUST MSR x86/tsc: Detect random warps x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art() x86/tsc: Finalize the split of the TSC_RELIABLE flag x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs x86/tsc: Mark Intel ATOM_GOLDMONT TSC reliable x86/tsc: Mark TSC frequency determined by CPUID as known x86/tsc: Add X86_FEATURE_TSC_KNOWN_FREQ flag	2016-12-18 13:59:10 -08:00
Linus Torvalds	1bbb05f520	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes and cleanups from Thomas Gleixner: "This set of updates contains: - Robustification for the logical package managment. Cures the AMD and virtualization issues. - Put the correct start_cpu() return address on the stack of the idle task. - Fixups for the fallout of the nodeid <-> cpuid persistent mapping modifciations - Move the x86/MPX specific mm_struct member to the arch specific mm_context where it belongs - Cleanups for C89 struct initializers and useless function arguments" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/floppy: Use designated initializers x86/mpx: Move bd_addr to mm_context_t x86/mm: Drop unused argument 'removed' from sync_global_pgds() ACPI/NUMA: Do not map pxm to node when NUMA is turned off x86/acpi: Use proper macro for invalid node x86/smpboot: Prevent false positive out of bounds cpumask access warning x86/boot/64: Push correct start_cpu() return address x86/boot/64: Use 'push' instead of 'call' in start_cpu() x86/smpboot: Make logical package management more robust	2016-12-18 11:12:53 -08:00
Thomas Gleixner	8c9b9d87b8	x86/tsc: Limit the adjust value further Adjust value 0x80000000 and other values larger than that render the TSC deadline timer disfunctional. We have not yet any information about this from Intel, but experimentation clearly proves that this is a 32/64 bit and sign extension issue. If adjust values larger than that are actually required, which might be the case for physical CPU hotplug, then we need to disable the deadline timer on the affected package/CPUs and use the local APIC timer instead. That requires some surgery in the APIC setup code, so we just limit the ADJUST register value into the known to work range for now and revisit this when Intel comes forth with proper information. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Roland Scheidegger <rscheidegger_lists@hispeed.ch> Cc: Bruce Schlobohm <bruce.schlobohm@intel.com> Cc: Kevin Stanton <kevin.b.stanton@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de>	2016-12-18 16:37:04 +01:00
Thomas Gleixner	16588f6592	x86/tsc: Annotate printouts as firmware bug Make it more obvious that the BIOS is screwed up. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Roland Scheidegger <rscheidegger_lists@hispeed.ch> Cc: Bruce Schlobohm <bruce.schlobohm@intel.com> Cc: Kevin Stanton <kevin.b.stanton@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de>	2016-12-18 16:35:15 +01:00
Linus Torvalds	de399813b5	powerpc updates for 4.10 Highlights include: - Support for the kexec_file_load() syscall, which is a prereq for secure and trusted boot. - Prevent kernel execution of userspace on P9 Radix (similar to SMEP/PXN). - Sort the exception tables at build time, to save time at boot, and store them as relative offsets to save space in the kernel image & memory. - Allow building the kernel with thin archives, which should allow us to build an allyesconfig once some other fixes land. - Build fixes to allow us to correctly rebuild when changing the kernel endian from big to little or vice versa. - Plumbing so that we can avoid doing a full mm TLB flush on P9 Radix. - Initial stack protector support (-fstack-protector). - Support for dumping the radix (aka. Linux) and hash page tables via debugfs. - Fix an oops in cxl coredump generation when cxl_get_fd() is used. - Freescale updates from Scott: "Highlights include 8xx hugepage support, qbman fixes/cleanup, device tree updates, and some misc cleanup." - Many and varied fixes and minor enhancements as always. Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Bartlomiej Zolnierkiewicz, Christophe Jaillet, Christophe Leroy, Denis Kirjanov, Elimar Riesebieter, Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand, Jack Miller, Johan Hovold, Lars-Peter Clausen, Libin, Madhavan Srinivasan, Michael Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Pan Xinhui, Peter Senna Tschudin, Rashmica Gupta, Rui Teng, Russell Currey, Scott Wood, Simon Guo, Suraj Jitindar Singh, Thiago Jung Bauermann, Tobias Klauser, Vaibhav Jain. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYU4YSAAoJEFHr6jzI4aWAC4gQALtIAqqPon0Cd5b/FVVcMbW7 mMqB2b/0FGEl5GoRTzGUDaQqElilm6AEVfHO86C7DFji/a6olneFfw87iz+mtWuZ JvrNq68ZiSnoeszdUy4MgtXFLb5sTzNMev4skaHfjI9E5CepWBoR0zH4G+kNVnd5 WSgudv8Cq4Px+MEuTOigt3QYjHzZ3cw/XNOOm9c+oGj+PDW4O9UItVI+S1WLoey4 rAB2nRcLMDPuwfRQC9XsF3zEbkv4h1dEXo/EBRuRpcF+0lLTzFw1lv1WE8OxlUmS kAXbty3dIytBfSbtJT0c0Ps6sfQ4HFhu6ZV2fjnxNTz2KDkBIN7LBYHmBYiqY9oZ 9zvbUWtfiTu5ocfRtTq7rC/Hcj4Kbr9S9F/FvXR0WyDsKgu4xxAovqC3gcn6YjYK Rr1tcCI4nUzyhVJVmd+OEhUvc5JbFy9aGage+YeOyejfvvSbXIunaxWlPjoDkvim Vjl+UKU8gw51XFssqY5ZBi/HNlMFKYedLpMFp/fItnLglhj50V0eFWkpDgdSCYom vo9ifPLZx8n8m8De3H7TV4E0F4gCHcTeqZdu7tW9AAUVM6iLJcDLm3asGmtNh21t snOHNOJ5QSIno6ezUUg29T6VBjbPh46fdJJSlIZrEe8OzLZ1haGyttf0tD00PQvY Z2W/m3gxafnOeGgBqvyv =xOzf -----END PGP SIGNATURE----- Merge tag 'powerpc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights include: - Support for the kexec_file_load() syscall, which is a prereq for secure and trusted boot. - Prevent kernel execution of userspace on P9 Radix (similar to SMEP/PXN). - Sort the exception tables at build time, to save time at boot, and store them as relative offsets to save space in the kernel image & memory. - Allow building the kernel with thin archives, which should allow us to build an allyesconfig once some other fixes land. - Build fixes to allow us to correctly rebuild when changing the kernel endian from big to little or vice versa. - Plumbing so that we can avoid doing a full mm TLB flush on P9 Radix. - Initial stack protector support (-fstack-protector). - Support for dumping the radix (aka. Linux) and hash page tables via debugfs. - Fix an oops in cxl coredump generation when cxl_get_fd() is used. - Freescale updates from Scott: "Highlights include 8xx hugepage support, qbman fixes/cleanup, device tree updates, and some misc cleanup." - Many and varied fixes and minor enhancements as always. Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Bartlomiej Zolnierkiewicz, Christophe Jaillet, Christophe Leroy, Denis Kirjanov, Elimar Riesebieter, Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand, Jack Miller, Johan Hovold, Lars-Peter Clausen, Libin, Madhavan Srinivasan, Michael Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Pan Xinhui, Peter Senna Tschudin, Rashmica Gupta, Rui Teng, Russell Currey, Scott Wood, Simon Guo, Suraj Jitindar Singh, Thiago Jung Bauermann, Tobias Klauser, Vaibhav Jain" [ And thanks to Michael, who took time off from a new baby to get this pull request done. - Linus ] * tag 'powerpc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (174 commits) powerpc/fsl/dts: add FMan node for t1042d4rdb powerpc/fsl/dts: add sg_2500_aqr105_phy4 alias on t1024rdb powerpc/fsl/dts: add QMan and BMan nodes on t1024 powerpc/fsl/dts: add QMan and BMan nodes on t1023 soc/fsl/qman: test: use DEFINE_SPINLOCK() powerpc/fsl-lbc: use DEFINE_SPINLOCK() powerpc/8xx: Implement support of hugepages powerpc: get hugetlbpage handling more generic powerpc: port 64 bits pgtable_cache to 32 bits powerpc/boot: Request no dynamic linker for boot wrapper soc/fsl/bman: Use resource_size instead of computation soc/fsl/qe: use builtin_platform_driver powerpc/fsl_pmc: use builtin_platform_driver powerpc/83xx/suspend: use builtin_platform_driver powerpc/ftrace: Fix the comments for ftrace_modify_code powerpc/perf: macros for power9 format encoding powerpc/perf: power9 raw event format encoding powerpc/perf: update attribute_group data structure powerpc/perf: factor out the event format field powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown ...	2016-12-16 09:26:42 -08:00
Linus Torvalds	179a7ba680	This release has a few updates: o STM can hook into the function tracer o Function filtering now supports more advance glob matching o Ftrace selftests updates and added tests o Softirq tag in traces now show only softirqs o ARM nop added to non traced locations at compile time o New trace_marker_raw file that allows for binary input o Optimizations to the ring buffer o Removal of kmap in trace_marker o Wakeup and irqsoff tracers now adhere to the set_graph_notrace file o Other various fixes and clean ups Note, there are two patches marked for stable. These were discovered near the end of the 4.9 rc release cycle. By the time I had them tested it was just a matter of days before 4.9 would be released, and I figured I would just submit them in the merge window. They are old bugs and not critical. Nothing non-root could abuse. -----BEGIN PGP SIGNATURE----- iQExBAABCAAbBQJYUrFHFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L 2+AIAIr20kSQV/nA5htGAeCTobVk3WUxY6bvjd9mIJDKPP19akNLyREW0G3KnfCr yhx4aFRZG98fRu/6F8qieRosyN36lADDVYHelMFHMpcTOpE2aZGjaaOuNGxOEA9v FmMPTX+K3+dzKyFP4l68R3+5JuQ1/AqLTioTWeLW8IDQ2OOVsjD8+0BuXrNKMJDY o6U4Hk5U/vn+zHc6BmgBzloAXemBd7iJ1t5V3FRRGvm8yv3HU85Twc5ofGeYTWvB J8PboEywRlIzxg0Kd8mxnMI5PgaKZSEc2ub8E7cY/CZ5PYpDE2xDA2hJmJgfYp00 1VW+DHRpRZfElsCcya6S6P4bs5Y= =MGZ/ -----END PGP SIGNATURE----- Merge tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This release has a few updates: - STM can hook into the function tracer - Function filtering now supports more advance glob matching - Ftrace selftests updates and added tests - Softirq tag in traces now show only softirqs - ARM nop added to non traced locations at compile time - New trace_marker_raw file that allows for binary input - Optimizations to the ring buffer - Removal of kmap in trace_marker - Wakeup and irqsoff tracers now adhere to the set_graph_notrace file - Other various fixes and clean ups" * tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits) selftests: ftrace: Shift down default message verbosity kprobes/trace: Fix kprobe selftest for newer gcc tracing/kprobes: Add a helper method to return number of probe hits tracing/rb: Init the CPU mask on allocation tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too fgraph: Handle a case where a tracer ignores set_graph_notrace tracing: Replace kmap with copy_from_user() in trace_marker writing ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it tracing: Allow benchmark to be enabled at early_initcall() tracing: Have system enable return error if one of the events fail tracing: Do not start benchmark on boot up tracing: Have the reg function allow to fail ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline ring-buffer: Froce rb_update_write_stamp() to be inlined ring-buffer: Force inline of hotpath helper functions tracing: Make __buffer_unlock_commit() always_inline tracing: Make tracepoint_printk a static_key ring-buffer: Always inline rb_event_data() ring-buffer: Make rb_reserve_next_event() always inlined ...	2016-12-15 13:49:34 -08:00
Thomas Gleixner	5bae156241	x86/tsc: Force TSC_ADJUST register to value >= zero Roland reported that his DELL T5810 sports a value add BIOS which completely wreckages the TSC. The squirmware [(TM) Ingo Molnar] boots with random negative TSC_ADJUST values, different on all CPUs. That renders the TSC useless because the sycnchronization check fails. Roland tested the new TSC_ADJUST mechanism. While it manages to readjust the TSCs he needs to disable the TSC deadline timer, otherwise the machine just stops booting. Deeper investigation unearthed that the TSC deadline timer is sensitive to the TSC_ADJUST value. Writing TSC_ADJUST to a negative value results in an interrupt storm caused by the TSC deadline timer. This does not make any sense and it's hard to imagine what kind of hardware wreckage is behind that misfeature, but it's reliably reproducible on other systems which have TSC_ADJUST and TSC deadline timer. While it would be understandable that a big enough negative value which moves the resulting TSC readout into the negative space could have the described effect, this happens even with a adjust value of -1, which keeps the TSC readout definitely in the positive space. The compare register for the TSC deadline timer is set to a positive value larger than the TSC, but despite not having reached the deadline the interrupt is raised immediately. If this happens on the boot CPU, then the machine dies silently because this setup happens before the NMI watchdog is armed. Further experiments showed that any other adjustment of TSC_ADJUST works as expected as long as it stays in the positive range. The direction of the adjustment has no influence either. See the lkml link for further analysis. Yet another proof for the theory that timers are designed by janitors and the underlying (obviously undocumented) mechanisms which allow BIOSes to wreckage them are considered a feature. Well done Intel - NOT! To address this wreckage add the following sanity measures: - If the TSC_ADJUST value on the boot cpu is not 0, set it to 0 - If the TSC_ADJUST value on any cpu is negative, set it to 0 - Prevent the cross package synchronization mechanism from setting negative TSC_ADJUST values. Reported-and-tested-by: Roland Scheidegger <rscheidegger_lists@hispeed.ch> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bruce Schlobohm <bruce.schlobohm@intel.com> Cc: Kevin Stanton <kevin.b.stanton@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Allen Hung <allen_hung@dell.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161213131211.397588033@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-15 11:44:29 +01:00
Thomas Gleixner	6a36958317	x86/tsc: Validate TSC_ADJUST after resume Some 'feature' BIOSes fiddle with the TSC_ADJUST register during suspend/resume which renders the TSC unusable. Add sanity checks into the resume path and restore the original value if it was adjusted. Reported-and-tested-by: Roland Scheidegger <rscheidegger_lists@hispeed.ch> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bruce Schlobohm <bruce.schlobohm@intel.com> Cc: Kevin Stanton <kevin.b.stanton@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Allen Hung <allen_hung@dell.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161213131211.317654500@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-15 11:44:29 +01:00
Boris Ostrovsky	4370a3ef39	x86/acpi: Use proper macro for invalid node Use NUMA_NO_NODE instead of -1. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: len.brown@intel.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: pavel@ucw.cz Link: http://lkml.kernel.org/r/1481570993-13941-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-15 11:32:32 +01:00
Thomas Gleixner	427d77a323	x86/smpboot: Prevent false positive out of bounds cpumask access warning prefill_possible_map() reinitializes the cpu_possible_map by setting the possible cpu bits and clearing all other bits up to NR_CPUS. This is technically always correct because cpu_possible_map is statically allocated and sized NR_CPUS. With CPUMASK_OFFSTACK and DEBUG_PER_CPU_MAPS enabled the bounds check of cpu masks happens on nr_cpu_ids. nr_cpu_ids is initialized to NR_CPUS and only limited after the set/clear bit loops have been executed. But if the system was booted with "nr_cpus=N" on the command line, where N is < NR_CPUS then nr_cpu_ids is limited in the parameter parsing function before prefill_possible_map() is invoked. As a consequence the cpumask bounds check triggers when clearing the bits past nr_cpu_ids. Add a helper which allows to reset cpu_possible_map w/o the bounds check and then set only the possible bits which are well inside bounds. Reported-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: 0x7f454c46@gmail.com Cc: Jan Beulich <JBeulich@novell.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612131836050.3415@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-15 11:32:31 +01:00
Baoquan He	401721ecd1	kexec: export the value of phys_base instead of symbol address Currently in x86_64, the symbol address of phys_base is exported to vmcoreinfo. Dave Anderson complained this is really useless for his Crash implementation. Because in user-space utility Crash and Makedumpfile which exported vmcore information is mainly used for, value of phys_base is needed to covert virtual address of exported kernel symbol to physical address. Especially init_level4_pgt, if we want to access and go over the page table to look up a PA corresponding to VA, firstly we need calculate page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base; Now in Crash and Makedumpfile, we have to analyze the vmcore elf program header to get value of phys_base. As Dave said, it would be preferable if it were readily availabl in vmcoreinfo rather than depending upon the PT_LOAD semantics. Hence in this patch change to export the value of phys_base instead of its virtual address. And people also complained that KERNEL_IMAGE_SIZE exporting is x86_64 only, should be moved into arch dependent function arch_crash_save_vmcoreinfo. Do the moving in this patch. Link: http://lkml.kernel.org/r/1478568596-30060-2-git-send-email-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Thomas Garnier <thgarnie@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Kees Cook <keescook@chromium.org> Cc: Eugene Surovegin <surovegin@google.com> Cc: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-14 16:04:07 -08:00
Baoquan He	69f5838479	Revert "kdump, vmcoreinfo: report memory sections virtual addresses" This reverts commit `0549a3c02e` ("kdump, vmcoreinfo: report memory sections virtual addresses"). Commit `0549a3c02e` tells the userspace utility makedumpfile the randomized base address of these memmory sections when mm kaslr is enabled. However the following patch "kexec: export the value of phys_base instead of symbol address" makes makedumpfile not need these addresses any more. Besides we should use VMCOREINFO_NUMBER to export the value of the variable so that we can use the existing number_table mechanism of Makedumpfile to fetch it. So revert it now. If needed we can add it later. http://lists.infradead.org/pipermail/kexec/2016-October/017540.html Link: http://lkml.kernel.org/r/1478568596-30060-1-git-send-email-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Thomas Garnier <thgarnie@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Kees Cook <keescook@chromium.org> Cc: Eugene Surovegin <surovegin@google.com> Cc: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-14 16:04:07 -08:00
Josh Poimboeuf	31dcfec11f	x86/boot/64: Push correct start_cpu() return address start_cpu() pushes a text address on the stack so that stack traces from idle tasks will show start_cpu() at the end. But it currently shows the wrong function offset. It's more correct to show the address immediately after the 'lretq' instruction. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2cadd9f16c77da7ee7957bfc5e1c67928c23ca48.1481685203.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-14 08:48:05 +01:00
Josh Poimboeuf	ec2d86a9b6	x86/boot/64: Use 'push' instead of 'call' in start_cpu() start_cpu() pushes a text address on the stack so that stack traces from idle tasks will show start_cpu() at the end. But it uses a call instruction to do that, which is rather obtuse. Use a straightforward push instead. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4d8a1952759721d42d1e62ba9e4a7e3ac5df8574.1481685203.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-14 08:48:05 +01:00
Linus Torvalds	c11a6cfb01	Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Mostly patches to initialize workqueue subsystem earlier and get rid of keventd_up(). The patches were headed for the last merge cycle but got delayed due to a bug found late minute, which is fixed now. Also, to help debugging, destroy_workqueue() is more chatty now on a sanity check failure." * 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: move wq_numa_init() to workqueue_init() workqueue: remove keventd_up() debugobj, workqueue: remove keventd_up() usage slab, workqueue: remove keventd_up() usage power, workqueue: remove keventd_up() usage tty, workqueue: remove keventd_up() usage mce, workqueue: remove keventd_up() usage workqueue: make workqueue available early during boot workqueue: dump workqueue state on sanity check failures in destroy_workqueue()	2016-12-13 12:59:57 -08:00
Linus Torvalds	098c30557a	Driver core patches for 4.10-rc1 Here's the new driver core patches for 4.10-rc1. Big thing here is the nice addition of "functional dependencies" to the driver core. The idea has been talked about for a very long time, great job to Rafael for stepping up and implementing it. It's been tested for longer than the 4.9-rc1 date, we held off on merging it earlier in order to feel more comfortable about it. Other than that, it's just a handful of small other patches, some good cleanups to the mess that is the firmware class code, and we have a test driver for the deferred probe logic. All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWFAvPQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ym3NgCgmhFeWEkp9SDt17YGGavmnzQUlBQAoJlUipJp PHeQkq15ZWw3wWC9FEvM =91M1 -----END PGP SIGNATURE----- Merge tag 'driver-core-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here's the new driver core patches for 4.10-rc1. Big thing here is the nice addition of "functional dependencies" to the driver core. The idea has been talked about for a very long time, great job to Rafael for stepping up and implementing it. It's been tested for longer than the 4.9-rc1 date, we held off on merging it earlier in order to feel more comfortable about it. Other than that, it's just a handful of small other patches, some good cleanups to the mess that is the firmware class code, and we have a test driver for the deferred probe logic. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (30 commits) firmware: Correct handling of fw_state_wait() return value driver core: Silence device links sphinx warning firmware: remove warning at documentation generation time drivers: base: dma-mapping: Fix typo in dmam_alloc_non_coherent comments driver core: test_async: fix up typo found by 0-day firmware: move fw_state_is_done() into UHM section firmware: do not use fw_lock for fw_state protection firmware: drop bit ops in favor of simple state machine firmware: refactor loading status firmware: fix usermode helper fallback loading driver core: firmware_class: convert to use class_groups driver core: devcoredump: convert to use class_groups driver core: class: add class_groups support kernfs: Declare two local data structures static driver-core: fix platform_no_drv_owner.cocci warnings drivers/base/memory.c: Remove unused 'first_page' variable driver core: add CLASS_ATTR_WO() drivers: base: cacheinfo: support DT overrides for cache properties drivers: base: cacheinfo: add pr_fmt logging drivers: base: cacheinfo: fix boot error message when acpi is enabled ...	2016-12-13 11:42:18 -08:00
Linus Torvalds	a67485d4bf	ACPI material for v4.10-rc1 - ACPICA update including upstream revision 20160930 and several commits beyond it (Bob Moore, Lv Zheng). - Initial support for ACPI APEI on ARM64 (Tomasz Nowicki). - New document describing the handling of _OSI and _REV in Linux (Len Brown). - New document describing the usage rules for _DSD properties (Rafael Wysocki). - Update of the ACPI properties-parsing code to reflect recent changes in the (external) documentation it is based on (Rafael Wysocki). - Updates of the ACPI LPSS and ACPI APD SoC drivers for additional hardware support (Andy Shevchenko, Nehal Shah). - New blacklist entries for _REV and video handling (Alex Hung, Hans de Goede, Michael Pobega). - ACPI battery driver fix to fall back to _BIF if _BIX fails (Dave Lambley). - NMI notifications handling fix for APEI (Prarit Bhargava). - Error code path fix for the ACPI CPPC library (Dan Carpenter). - Assorted cleanups (Andy Shevchenko, Longpeng Mike). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJYTx6WAAoJEILEb/54YlRxFksP/0oZUm4dxHJFT6ED1ogBLid6 o+T7PA46i7VpyyT64tq3YcBqccFAYq9jHvK0FasK6WA3GKF+fj8cc5FsFM0lfdlw pMFfkdVTVajzFAM1QcxxeNr+TNuAGhx1ENf3us4xOP1Nt++kESBMwA112emoqEJL kzb2M3sCWyHNUxLtbis5CpYXLNFifFf8PP+LgmfRk0u2EYYW2nOShd6A7w5USmDh cYsfKcrBHs+nmNh6uZrQbGg+6zTcQT7XORyqcIsgT2JoWooVfwOrBjgLymFvuLUc ShZ1dHqR+RwIu1ZTIWImpDcBz/dALGIDuGAxad1YRhx7N7Eg4jmmht3hASYKWabG lqU4PWMBERonIW0MCFJ7Pg8+Ny7+kAF/rZjDyw09P2DGGQjsG4aJGAdoG5Dtjidc 1W+OAJC6SY494U+r/kHnsR0/JWTX24H7sVP5IBCFxHkByhe5daSngtknrYzIV4kE dV4h8JJATrSyvdgwAEHmVSpTCR0tmFvsc5J87Mg/g/b6NM3tPVxb70eE9tRr4xw1 oW0X9YI9M8NFnRP6RbCVg6uO06xDD2SMfb0e8fiiAp+/eGGyjp1PVR9SreuUdqaJ XJwntAWxKOXBPXMRuCeOuXBUNe5mT+WkMF6AuQyfBoM7rIhkqJb328buVAsyAKBx 74gsPkkeA6/Z1n7HWUFn =Nzrb -----END PGP SIGNATURE----- Merge tag 'acpi-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "The ACPICA code in the kernel gets updated as usual (included is upstream revision 20160930 and a few commits from the next one, with the rest waiting for an issue discovered in linux-next to be addressed) which brings in a couple of fixes and cleanups On top of that initial support for APEI on ARM64 is added, two new pieces of documentation are introduced, the properties-parsing code is updated to follow changes in the (external) documentation it is based on and there are a few updates of SoC drivers, some new blacklist entries, plus some assorted fixes and cleanups Specifics: - ACPICA update including upstream revision 20160930 and several commits beyond it (Bob Moore, Lv Zheng) - Initial support for ACPI APEI on ARM64 (Tomasz Nowicki) - New document describing the handling of _OSI and _REV in Linux (Len Brown) - New document describing the usage rules for _DSD properties (Rafael Wysocki) - Update of the ACPI properties-parsing code to reflect recent changes in the (external) documentation it is based on (Rafael Wysocki) - Updates of the ACPI LPSS and ACPI APD SoC drivers for additional hardware support (Andy Shevchenko, Nehal Shah) - New blacklist entries for _REV and video handling (Alex Hung, Hans de Goede, Michael Pobega) - ACPI battery driver fix to fall back to _BIF if _BIX fails (Dave Lambley) - NMI notifications handling fix for APEI (Prarit Bhargava) - Error code path fix for the ACPI CPPC library (Dan Carpenter) - Assorted cleanups (Andy Shevchenko, Longpeng Mike)" * tag 'acpi-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (31 commits) ACPICA: Utilities: Add new decode function for parser values ACPI / osl: Refactor acpi_os_get_root_pointer() to drop 'else':s ACPI / osl: Propagate actual error code for kstrtoul() ACPI / property: Document usage rules for _DSD properties ACPI: Document _OSI and _REV for Linux BIOS writers ACPI / APEI / ARM64: APEI initial support for ARM64 ACPI / APEI: Fix NMI notification handling ACPICA: Tables: Add an error message complaining driver bugs ACPICA: Tables: Add acpi_tb_unload_table() ACPICA: Tables: Cleanup acpi_tb_install_and_load_table() ACPICA: Events: Fix acpi_ev_initialize_region() return value ACPICA: Back port of "ACPICA: Dispatcher: Tune interpreter lock around AcpiEvInitializeRegion()" ACPICA: Namespace: Add acpi_ns_handle_to_name() ACPI / CPPC: set an error code on probe error path ACPI / video: Add force_native quirk for HP Pavilion dv6 ACPI / video: Add force_native quirk for Dell XPS 17 L702X ACPI / property: Hierarchical properties support update ACPI / LPSS: enable hard LLP for DMA ACPI / battery: If _BIX fails, retry with _BIF ACPI / video: Move ACPI_VIDEO_NOTIFY_* defines to acpi/video.h ..	2016-12-13 11:06:21 -08:00
Linus Torvalds	7b9dc3f75f	Power management material for v4.10-rc1 - New cpufreq driver for Broadcom STB SoCs and a Device Tree binding for it (Markus Mayer). - Support for ARM Integrator/AP and Integrator/CP in the generic DT cpufreq driver and elimination of the old Integrator cpufreq driver (Linus Walleij). - Support for the zx296718, r8a7743 and r8a7745, Socionext UniPhier, and PXA SoCs in the the generic DT cpufreq driver (Baoyou Xie, Geert Uytterhoeven, Masahiro Yamada, Robert Jarzmik). - cpufreq core fix to eliminate races that may lead to using inactive policy objects and related cleanups (Rafael Wysocki). - cpufreq schedutil governor update to make it use SCHED_FIFO kernel threads (instead of regular workqueues) for doing delayed work (to reduce the response latency in some cases) and related cleanups (Viresh Kumar). - New cpufreq sysfs attribute for resetting statistics (Markus Mayer). - cpufreq governors fixes and cleanups (Chen Yu, Stratos Karafotis, Viresh Kumar). - Support for using generic cpufreq governors in the intel_pstate driver (Rafael Wysocki). - Support for per-logical-CPU P-state limits and the EPP/EPB (Energy Performance Preference/Energy Performance Bias) knobs in the intel_pstate driver (Srinivas Pandruvada). - New CPU ID for Knights Mill in intel_pstate (Piotr Luc). - intel_pstate driver modification to use the P-state selection algorithm based on CPU load on platforms with the system profile in the ACPI tables set to "mobile" (Srinivas Pandruvada). - intel_pstate driver cleanups (Arnd Bergmann, Rafael Wysocki, Srinivas Pandruvada). - cpufreq powernv driver updates including fast switching support (for the schedutil governor), fixes and cleanus (Akshay Adiga, Andrew Donnellan, Denis Kirjanov). - acpi-cpufreq driver rework to switch it over to the new CPU offline/online state machine (Sebastian Andrzej Siewior). - Assorted cleanups in cpufreq drivers (Wei Yongjun, Prashanth Prakash). - Idle injection rework (to make it use the regular idle path instead of a home-grown custom one) and related powerclamp thermal driver updates (Peter Zijlstra, Jacob Pan, Petr Mladek, Sebastian Andrzej Siewior). - New CPU IDs for Atom Z34xx and Knights Mill in intel_idle (Andy Shevchenko, Piotr Luc). - intel_idle driver cleanups and switch over to using the new CPU offline/online state machine (Anna-Maria Gleixner, Sebastian Andrzej Siewior). - cpuidle DT driver update to support suspend-to-idle properly (Sudeep Holla). - cpuidle core cleanups and misc updates (Daniel Lezcano, Pan Bian, Rafael Wysocki). - Preliminary support for power domains including CPUs in the generic power domains (genpd) framework and related DT bindings (Lina Iyer). - Assorted fixes and cleanups in the generic power domains (genpd) framework (Colin Ian King, Dan Carpenter, Geert Uytterhoeven). - Preliminary support for devices with multiple voltage regulators and related fixes and cleanups in the Operating Performance Points (OPP) library (Viresh Kumar, Masahiro Yamada, Stephen Boyd). - System sleep state selection interface rework to make it easier to support suspend-to-idle as the default system suspend method (Rafael Wysocki). - PM core fixes and cleanups, mostly related to the interactions between the system suspend and runtime PM frameworks (Ulf Hansson, Sahitya Tummala, Tony Lindgren). - Latency tolerance PM QoS framework imorovements (Andrew Lutomirski). - New Knights Mill CPU ID for the Intel RAPL power capping driver (Piotr Luc). - Intel RAPL power capping driver fixes, cleanups and switch over to using the new CPU offline/online state machine (Jacob Pan, Thomas Gleixner, Sebastian Andrzej Siewior). - Fixes and cleanups in the exynos-ppmu, exynos-nocp, rk3399_dmc, rockchip-dfi devfreq drivers and the devfreq core (Axel Lin, Chanwoo Choi, Javier Martinez Canillas, MyungJoo Ham, Viresh Kumar). - Fix for false-positive KASAN warnings during resume from ACPI S3 (suspend-to-RAM) on x86 (Josh Poimboeuf). - Memory map verification during resume from hibernation on x86 to ensure a consistent address space layout (Chen Yu). - Wakeup sources debugging enhancement (Xing Wei). - rockchip-io AVS driver cleanup (Shawn Lin). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJYTx4+AAoJEILEb/54YlRx9f8P/2SlNHUENW5qh6FtCw00oC2u UqJerQJ2L38UgbgxbE/0VYblma9rFABDWC1eO2xN2XdcdW5UPBKPVvNcOgNe1Clh gjy3RxZXVpmjfzt2kGfsTLEuGnHqwvx51hTUkeA2LwvkOal45xb8ZESmy8opCtiv iG4LwmPHoxdX5Za5nA9ItFKzxyO1EoyNSnBYAVwALDHxmNOfxEcRevfurASt/0M9 brCCZJA0/sZxeL0lBdy8fNQPIBTUfCoTJG/MtmzGrObJ9wMFvEDfXrVEyZiWs/zA AAZ4kQL77enrIKgrLN8e0G6LzTLHoVcvn38Xjf24dKUqhd7ACBhYcnW+jK3+7EAd gjZ8efObQsiuyK/EDLUNw35tt96CHOqfrQCj2tIwRVvk9EekLqAGXdIndTCr2kYW RpefmP5kMljnm/nQFOVLwMEUQMuVkvUE7EgxADy7DoDmepBFC4ICRDWPye70R2kC 0O1Tn2PAQq4Fd1tyI9TYYz0YQQkRoaRb5rfYUSzbRbeCdsphUopp4Vhsiyn6IcnF XnLbg6pRAat82MoS9n4pfO/VCo8vkErKA8tut9G7TDakkrJoEE7l31PdKW0hP3f6 sBo6xXy6WTeivU/o/i8TbM6K4mA37pBaj78ooIkWLgg5fzRaS2+0xSPVy2H9x1m5 LymHcobCK9rSZ1l208Fe =vhxI -----END PGP SIGNATURE----- Merge tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "Again, cpufreq gets more changes than the other parts this time (one new driver, one old driver less, a bunch of enhancements of the existing code, new CPU IDs, fixes, cleanups) There also are some changes in cpuidle (idle injection rework, a couple of new CPU IDs, online/offline rework in intel_idle, fixes and cleanups), in the generic power domains framework (mostly related to supporting power domains containing CPUs), and in the Operating Performance Points (OPP) library (mostly related to supporting devices with multiple voltage regulators) In addition to that, the system sleep state selection interface is modified to make it easier for distributions with unchanged user space to support suspend-to-idle as the default system suspend method, some issues are fixed in the PM core, the latency tolerance PM QoS framework is improved a bit, the Intel RAPL power capping driver is cleaned up and there are some fixes and cleanups in the devfreq subsystem Specifics: - New cpufreq driver for Broadcom STB SoCs and a Device Tree binding for it (Markus Mayer) - Support for ARM Integrator/AP and Integrator/CP in the generic DT cpufreq driver and elimination of the old Integrator cpufreq driver (Linus Walleij) - Support for the zx296718, r8a7743 and r8a7745, Socionext UniPhier, and PXA SoCs in the the generic DT cpufreq driver (Baoyou Xie, Geert Uytterhoeven, Masahiro Yamada, Robert Jarzmik) - cpufreq core fix to eliminate races that may lead to using inactive policy objects and related cleanups (Rafael Wysocki) - cpufreq schedutil governor update to make it use SCHED_FIFO kernel threads (instead of regular workqueues) for doing delayed work (to reduce the response latency in some cases) and related cleanups (Viresh Kumar) - New cpufreq sysfs attribute for resetting statistics (Markus Mayer) - cpufreq governors fixes and cleanups (Chen Yu, Stratos Karafotis, Viresh Kumar) - Support for using generic cpufreq governors in the intel_pstate driver (Rafael Wysocki) - Support for per-logical-CPU P-state limits and the EPP/EPB (Energy Performance Preference/Energy Performance Bias) knobs in the intel_pstate driver (Srinivas Pandruvada) - New CPU ID for Knights Mill in intel_pstate (Piotr Luc) - intel_pstate driver modification to use the P-state selection algorithm based on CPU load on platforms with the system profile in the ACPI tables set to "mobile" (Srinivas Pandruvada) - intel_pstate driver cleanups (Arnd Bergmann, Rafael Wysocki, Srinivas Pandruvada) - cpufreq powernv driver updates including fast switching support (for the schedutil governor), fixes and cleanus (Akshay Adiga, Andrew Donnellan, Denis Kirjanov) - acpi-cpufreq driver rework to switch it over to the new CPU offline/online state machine (Sebastian Andrzej Siewior) - Assorted cleanups in cpufreq drivers (Wei Yongjun, Prashanth Prakash) - Idle injection rework (to make it use the regular idle path instead of a home-grown custom one) and related powerclamp thermal driver updates (Peter Zijlstra, Jacob Pan, Petr Mladek, Sebastian Andrzej Siewior) - New CPU IDs for Atom Z34xx and Knights Mill in intel_idle (Andy Shevchenko, Piotr Luc) - intel_idle driver cleanups and switch over to using the new CPU offline/online state machine (Anna-Maria Gleixner, Sebastian Andrzej Siewior) - cpuidle DT driver update to support suspend-to-idle properly (Sudeep Holla) - cpuidle core cleanups and misc updates (Daniel Lezcano, Pan Bian, Rafael Wysocki) - Preliminary support for power domains including CPUs in the generic power domains (genpd) framework and related DT bindings (Lina Iyer) - Assorted fixes and cleanups in the generic power domains (genpd) framework (Colin Ian King, Dan Carpenter, Geert Uytterhoeven) - Preliminary support for devices with multiple voltage regulators and related fixes and cleanups in the Operating Performance Points (OPP) library (Viresh Kumar, Masahiro Yamada, Stephen Boyd) - System sleep state selection interface rework to make it easier to support suspend-to-idle as the default system suspend method (Rafael Wysocki) - PM core fixes and cleanups, mostly related to the interactions between the system suspend and runtime PM frameworks (Ulf Hansson, Sahitya Tummala, Tony Lindgren) - Latency tolerance PM QoS framework imorovements (Andrew Lutomirski) - New Knights Mill CPU ID for the Intel RAPL power capping driver (Piotr Luc) - Intel RAPL power capping driver fixes, cleanups and switch over to using the new CPU offline/online state machine (Jacob Pan, Thomas Gleixner, Sebastian Andrzej Siewior) - Fixes and cleanups in the exynos-ppmu, exynos-nocp, rk3399_dmc, rockchip-dfi devfreq drivers and the devfreq core (Axel Lin, Chanwoo Choi, Javier Martinez Canillas, MyungJoo Ham, Viresh Kumar) - Fix for false-positive KASAN warnings during resume from ACPI S3 (suspend-to-RAM) on x86 (Josh Poimboeuf) - Memory map verification during resume from hibernation on x86 to ensure a consistent address space layout (Chen Yu) - Wakeup sources debugging enhancement (Xing Wei) - rockchip-io AVS driver cleanup (Shawn Lin)" * tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (127 commits) devfreq: rk3399_dmc: Don't use OPP structures outside of RCU locks devfreq: rk3399_dmc: Remove dangling rcu_read_unlock() devfreq: exynos: Don't use OPP structures outside of RCU locks Documentation: intel_pstate: Document HWP energy/performance hints cpufreq: intel_pstate: Support for energy performance hints with HWP cpufreq: intel_pstate: Add locking around HWP requests PM / sleep: Print active wakeup sources when blocking on wakeup_count reads PM / core: Fix bug in the error handling of async suspend PM / wakeirq: Fix dedicated wakeirq for drivers not using autosuspend PM / Domains: Fix compatible for domain idle state PM / OPP: Don't WARN on multiple calls to dev_pm_opp_set_regulators() PM / OPP: Allow platform specific custom set_opp() callbacks PM / OPP: Separate out _generic_set_opp() PM / OPP: Add infrastructure to manage multiple regulators PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage() PM / OPP: Manage supply's voltage/current in a separate structure PM / OPP: Don't use OPP structure outside of rcu protected section PM / OPP: Reword binding supporting multiple regulators per device PM / OPP: Fix incorrect cpu-supply property in binding cpuidle: Add a kerneldoc comment to cpuidle_use_deepest_state() ..	2016-12-13 10:41:53 -08:00
Thomas Gleixner	9d85eb9119	x86/smpboot: Make logical package management more robust The logical package management has several issues: - The APIC ids provided by ACPI are not required to be the same as the initial APIC id which can be retrieved by CPUID. The APIC ids provided by ACPI are those which are written by the BIOS into the APIC. The initial id is set by hardware and can not be changed. The hardware provided ids contain the real hardware package information. Especially AMD sets the effective APIC id different from the hardware id as they need to reserve space for the IOAPIC ids starting at id 0. As a consequence those machines trigger the currently active firmware bug printouts in dmesg, These are obviously wrong. - Virtual machines have their own interesting of enumerating APICs and packages which are not reliably covered by the current implementation. The sizing of the mapping array has been tweaked to be generously large to handle systems which provide a wrong core count when HT is disabled so the whole magic which checks for space in the physical hotplug case is not needed anymore. Simplify the whole machinery and do the mapping when the CPU starts and the CPUID derived physical package information is available. This solves the observed problems on AMD machines and works for the virtualization issues as well. Remove the extra call from XEN cpu bringup code as it is not longer required. Fixes: `d49597fd3b` ("x86/cpu: Deal with broken firmware (VMWare/XEN)") Reported-and-tested-by: Borislav Petkov <bp@suse.de> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: M. Vefa Bicakci <m.v.b@runbox.com> Cc: xen-devel <xen-devel@lists.xen.org> Cc: Charles (Chas) Williams <ciwillia@brocade.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Alok Kataria <akataria@vmware.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612121102260.3429@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-13 10:22:39 +01:00
Linus Torvalds	e34bac726d	Merge branch 'akpm' (patches from Andrew) Merge updates from Andrew Morton: - various misc bits - most of MM (quite a lot of MM material is awaiting the merge of linux-next dependencies) - kasan - printk updates - procfs updates - MAINTAINERS - /lib updates - checkpatch updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (123 commits) init: reduce rootwait polling interval time to 5ms binfmt_elf: use vmalloc() for allocation of vma_filesz checkpatch: don't emit unified-diff error for rename-only patches checkpatch: don't check c99 types like uint8_t under tools checkpatch: avoid multiple line dereferences checkpatch: don't check .pl files, improve absolute path commit log test scripts/checkpatch.pl: fix spelling checkpatch: don't try to get maintained status when --no-tree is given lib/ida: document locking requirements a bit better lib/rbtree.c: fix typo in comment of ____rb_erase_color lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM MAINTAINERS: add drm and drm/i915 irc channels MAINTAINERS: add "C:" for URI for chat where developers hang out MAINTAINERS: add drm and drm/i915 bug filing info MAINTAINERS: add "B:" for URI where to file bugs get_maintainer: look for arbitrary letter prefixes in sections printk: add Kconfig option to set default console loglevel printk/sound: handle more message headers printk/btrfs: handle more message headers printk/kdb: handle more message headers ...	2016-12-12 20:50:02 -08:00
Linus Torvalds	9465d9cc31	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The time/timekeeping/timer folks deliver with this update: - Fix a reintroduced signed/unsigned issue and cleanup the whole signed/unsigned mess in the timekeeping core so this wont happen accidentaly again. - Add a new trace clock based on boot time - Prevent injection of random sleep times when PM tracing abuses the RTC for storage - Make posix timers configurable for real tiny systems - Add tracepoints for the alarm timer subsystem so timer based suspend wakeups can be instrumented - The usual pile of fixes and updates to core and drivers" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) timekeeping: Use mul_u64_u32_shr() instead of open coding it timekeeping: Get rid of pointless typecasts timekeeping: Make the conversion call chain consistently unsigned timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion alarmtimer: Add tracepoints for alarm timers trace: Update documentation for mono, mono_raw and boot clock trace: Add an option for boot clock as trace clock timekeeping: Add a fast and NMI safe boot clock timekeeping/clocksource_cyc2ns: Document intended range limitation timekeeping: Ignore the bogus sleep time if pm_trace is enabled selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous" clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map() arm64: dts: rockchip: Arch counter doesn't tick in system suspend clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend posix-timers: Make them configurable posix_cpu_timers: Move the add_device_randomness() call to a proper place timer: Move sys_alarm from timer.c to itimer.c ptp_clock: Allow for it to be optional Kconfig: Regenerate *.c_shipped files after previous changes ...	2016-12-12 19:56:15 -08:00
Linus Torvalds	e71c3978d6	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp hotplug updates from Thomas Gleixner: "This is the final round of converting the notifier mess to the state machine. The removal of the notifiers and the related infrastructure will happen around rc1, as there are conversions outstanding in other trees. The whole exercise removed about 2000 lines of code in total and in course of the conversion several dozen bugs got fixed. The new mechanism allows to test almost every hotplug step standalone, so usage sites can exercise all transitions extensively. There is more room for improvement, like integrating all the pointlessly different architecture mechanisms of synchronizing, setting cpus online etc into the core code" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) tracing/rb: Init the CPU mask on allocation soc/fsl/qbman: Convert to hotplug state machine soc/fsl/qbman: Convert to hotplug state machine zram: Convert to hotplug state machine KVM/PPC/Book3S HV: Convert to hotplug state machine arm64/cpuinfo: Convert to hotplug state machine arm64/cpuinfo: Make hotplug notifier symmetric mm/compaction: Convert to hotplug state machine iommu/vt-d: Convert to hotplug state machine mm/zswap: Convert pool to hotplug state machine mm/zswap: Convert dst-mem to hotplug state machine mm/zsmalloc: Convert to hotplug state machine mm/vmstat: Convert to hotplug state machine mm/vmstat: Avoid on each online CPU loops mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead() tracing/rb: Convert to hotplug state machine oprofile/nmi timer: Convert to hotplug state machine net/iucv: Use explicit clean up labels in iucv_init() x86/pci/amd-bus: Convert to hotplug state machine x86/oprofile/nmi: Convert to hotplug state machine ...	2016-12-12 19:25:04 -08:00
Andrey Ryabinin	8d5341a626	x86/ldt: use vfree_atomic() to free ldt entries vfree() is going to use sleeping lock. free_ldt_struct() may be called with disabled preemption, therefore we must use vfree_atomic() here. E.g. call trace: vfree() free_ldt_struct() destroy_context_ldt() __mmdrop() finish_task_switch() schedule_tail() ret_from_fork() Link: http://lkml.kernel.org/r/1479474236-4139-7-git-send-email-hch@lst.de Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Jisheng Zhang <jszhang@marvell.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: John Dias <joaodias@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-12 18:55:08 -08:00
Reza Arbab	39fa104d5b	mm: remove x86-only restriction of movable_node In commit `c5320926e3` ("mem-hotplug: introduce movable_node boot option"), the memblock allocation direction is changed to bottom-up and then back to top-down like this: 1. memblock_set_bottom_up(true), called by cmdline_parse_movable_node(). 2. memblock_set_bottom_up(false), called by x86's numa_init(). Even though (1) occurs in generic mm code, it is wrapped by #ifdef CONFIG_MOVABLE_NODE, which depends on X86_64. This means that when we extend CONFIG_MOVABLE_NODE to non-x86 arches, things will be unbalanced. (1) will happen for them, but (2) will not. This toggle was added in the first place because x86 has a delay between adding memblocks and marking them as hotpluggable. Since other arches do this marking either immediately or not at all, they do not require the bottom-up toggle. So, resolve things by moving (1) from cmdline_parse_movable_node() to x86's setup_arch(), immediately after the movable_node parameter has been parsed. Link: http://lkml.kernel.org/r/1479160961-25840-3-git-send-email-arbab@linux.vnet.ibm.com Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alistair Popple <apopple@au1.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: Frank Rowand <frowand.list@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-12 18:55:07 -08:00
Linus Torvalds	f797484c26	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "Two changes: - implement various VMWare guest OS improvements/fixes (Alexey Makhalov) - unexport a spurious export from the intel-mid platform driver (Lukas Wunner)" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vmware: Add paravirt sched clock x86/vmware: Add basic paravirt ops support x86/vmware: Use tsc_khz value for calibrate_cpu() x86/platform/intel-mid: Unexport intel_mid_pci_set_power_state() x86/vmware: Read tsc_khz only once at boot time	2016-12-12 15:29:06 -08:00
Linus Torvalds	991bc36254	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode update from Ingo Molnar: "The biggest change (by Borislav Petkov) is a thorough rewrite of the Intel microcode loader and its interactions with the core code. The biggest conceptual change is the decoupling of the microcode loading on boot and application processors (which load the microcode in different scenarios), so that both parse the input patches with as few assumptions as possible - this also fixes various kernel address space randomization bugs. (The AP side then goes on and caches the result to improve boot performance.) Since the AMD side already did this, this change also opened up the path towards more unification/simplification of the core microcode loading infrastructure: 10 files changed, 647 insertions(+), 940 deletions(-) which speaks for itself" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Bump driver version, update copyrights x86/microcode: Rework microcode loading x86/microcode/intel: Remove intel_lib.c x86/microcode/amd: Move private inlines to .c and mark local functions static x86/microcode: Collect CPU info on resume x86/microcode: Issue the debug printk on resume only on success x86/microcode/amd: Hand down the CPU family x86/microcode: Export the microcode cache linked list x86/microcode: Remove one #ifdef clause x86/microcode/intel: Simplify generic_load_microcode() x86/microcode: Move driver authors to CREDITS x86/microcode: Run the AP-loading routine only on the application processors	2016-12-12 15:23:02 -08:00
Linus Torvalds	212f30008a	Merge branch 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 idle updates from Ingo Molnar: "There were two bigger changes in this development cycle: - remove idle notifiers: 32 files changed, 74 insertions(+), 803 deletions(-) These notifiers were of questionable value and the main usecase, the i7300 driver, was essentially unmaintained and can be removed, plus modern power management concepts don't need the callback - so use this golden opportunity and get rid of this opaque and fragile callback from a latency sensitive code path. (Len Brown, Thomas Gleixner) - improve the AMD Erratum 400 workaround that used high overhead MSR polling in the idle loop (Borisla Petkov, Thomas Gleixner)" * 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Remove empty idle.h header x86/amd: Simplify AMD E400 aware idle routine x86/amd: Check for the C1E bug post ACPI subsystem init x86/bugs: Separate AMD E400 erratum and C1E bug x86/cpufeature: Provide helper to set bugs bits x86/idle: Remove enter_idle(), exit_idle() x86: Remove x86_test_and_clear_bit_percpu() x86/idle: Remove is_idle flag x86/idle: Remove idle_notifier i7300_idle: Remove this driver	2016-12-12 14:55:04 -08:00
Linus Torvalds	6f3be0f043	Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 header fixlet from Ingo Molnar: "Remove unnecessary module.h inclusion from core code (Paul Gortmaker)" * 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/percpu: Remove unnecessary include of module.h, add asm/desc.h	2016-12-12 14:53:24 -08:00
Linus Torvalds	518bacf5a5	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FPU updates from Ingo Molnar: "The main changes in this cycle were: - do a large round of simplifications after all CPUs do 'eager' FPU context switching in v4.9: remove CR0 twiddling, remove leftover eager/lazy bts, etc (Andy Lutomirski) - more FPU code simplifications: remove struct fpu::counter, clarify nomenclature, remove unnecessary arguments/functions and better structure the code (Rik van Riel)" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Remove clts() x86/fpu: Remove stts() x86/fpu: Handle #NM without FPU emulation as an error x86/fpu, lguest: Remove CR0.TS support x86/fpu, kvm: Remove host CR0.TS manipulation x86/fpu: Remove irq_ts_save() and irq_ts_restore() x86/fpu: Stop saving and restoring CR0.TS in fpu__init_check_bugs() x86/fpu: Get rid of two redundant clts() calls x86/fpu: Finish excising 'eagerfpu' x86/fpu: Split old_fpu & new_fpu handling into separate functions x86/fpu: Remove 'cpu' argument from __cpu_invalidate_fpregs_state() x86/fpu: Split old & new FPU code paths x86/fpu: Remove __fpregs_(de)activate() x86/fpu: Rename lazy restore functions to "register state valid" x86/fpu, kvm: Remove KVM vcpu->fpu_counter x86/fpu: Remove struct fpu::counter x86/fpu: Remove use_eager_fpu() x86/fpu: Remove the XFEATURE_MASK_EAGER/LAZY distinction x86/fpu: Hard-disable lazy FPU mode x86/crypto, x86/fpu: Remove X86_FEATURE_EAGER_FPU #ifdef from the crc32c code	2016-12-12 14:27:49 -08:00
Linus Torvalds	535b2f73f6	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CPU updates from Ingo Molnar: "The changes in this development cycle were: - AMD CPU topology enhancements that are cleanups on current CPUs but which enable future Fam17 hardware. (Yazen Ghannam) - unify bugs.c and bugs_64.c (Borislav Petkov) - remove the show_msr= boot option (Borislav Petkov) - simplify a boot message (Borislav Petkov)" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/AMD: Clean up cpu_llc_id assignment per topology feature x86/cpu: Get rid of the show_msr= boot option x86/cpu: Merge bugs.c and bugs_64.c x86/cpu: Remove the printk format specifier in "CPU0: "	2016-12-12 14:25:21 -08:00
Linus Torvalds	ef486c599a	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Two cleanups in the LDT handling code, by Dan Carpenter and Thomas Gleixner" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ldt: Make all size computations unsigned x86/ldt: Make a size argument unsigned	2016-12-12 14:20:14 -08:00
Linus Torvalds	06cc6b969c	Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "Misc cleanups/simplifications by Borislav Petkov, Paul Bolle and Wei Yang" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/64: Optimize fixmap page fixup x86/boot: Simplify the GDTR calculation assembly code a bit x86/boot/build: Remove always empty $(USERINCLUDE)	2016-12-12 14:13:30 -08:00
Linus Torvalds	5645688f9d	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "The main changes in this development cycle were: - a large number of call stack dumping/printing improvements: higher robustness, better cross-context dumping, improved output, etc. (Josh Poimboeuf) - vDSO getcpu() performance improvement for future Intel CPUs with the RDPID instruction (Andy Lutomirski) - add two new Intel AVX512 features and the CPUID support infrastructure for it: AVX512IFMA and AVX512VBMI. (Gayatri Kammela, He Chen) - more copy-user unification (Borislav Petkov) - entry code assembly macro simplifications (Alexander Kuleshov) - vDSO C/R support improvements (Dmitry Safonov) - misc fixes and cleanups (Borislav Petkov, Paul Bolle)" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) scripts/decode_stacktrace.sh: Fix address line detection on x86 x86/boot/64: Use defines for page size x86/dumpstack: Make stack name tags more comprehensible selftests/x86: Add test_vdso to test getcpu() x86/vdso: Use RDPID in preference to LSL when available x86/dumpstack: Handle NULL stack pointer in show_trace_log_lvl() x86/cpufeatures: Enable new AVX512 cpu features x86/cpuid: Provide get_scattered_cpuid_leaf() x86/cpuid: Cleanup cpuid_regs definitions x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants x86/unwind: Ensure stack grows down x86/vdso: Set vDSO pointer only after success x86/prctl/uapi: Remove #ifdef for CHECKPOINT_RESTORE x86/unwind: Detect bad stack return address x86/dumpstack: Warn on stack recursion x86/unwind: Warn on bad frame pointer x86/decoder: Use stderr if insn sanity test fails x86/decoder: Use stdout if insn decoder test is successful mm/page_alloc: Remove kernel address exposure in free_reserved_area() x86/dumpstack: Remove raw stack dump ...	2016-12-12 13:49:57 -08:00
Linus Torvalds	4ade5b2268	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Ingo Molnar: "Misc changes: - optimize (reduce) IRQ handler tracing overhead (Wanpeng Li) - clean up MSR helpers (Borislav Petkov) - fix build warning on some configs (Sebastian Andrzej Siewior)" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/msr: Cleanup/streamline MSR helpers x86/apic: Prevent tracing on apic_msr_write_eoi() x86/msr: Add wrmsr_notrace() x86/apic: Get rid of "warning: 'acpi_ioapic_lock' defined but not used"	2016-12-12 13:24:04 -08:00
Linus Torvalds	df5f0f0a02	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Ingo Molnar: "The main changes in this development cycle were: - more AMD northbridge support work, mostly in preparation for Fam17h CPUs (Yazen Ghannam, Borislav Petkov) - cleanups/refactorings and fixes (Borislav Petkov, Tony Luck, Yinghai Lu)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Include the PPIN in MCE records when available x86/mce/AMD: Add system physical address translation for AMD Fam17h x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h x86/amd_nb: Add Fam17h Data Fabric as "Northbridge" x86/amd_nb: Make all exports EXPORT_SYMBOL_GPL x86/amd_nb: Make amd_northbridges internal to amd_nb.c x86/mce/AMD: Reset Threshold Limit after logging error x86/mce/AMD: Fix HWID_MCATYPE calculation by grouping arguments x86/MCE: Correct TSC timestamping of error records x86/RAS: Hide SMCA bank names x86/RAS: Rename smca_bank_names to smca_names x86/RAS: Simplify SMCA HWID descriptor struct x86/RAS: Simplify SMCA bank descriptor struct x86/MCE: Dump MCE to dmesg if no consumers x86/RAS: Add TSC timestamp to the injected MCE x86/MCE: Do not look at panic_on_oops in the severity grading	2016-12-12 12:58:50 -08:00
Linus Torvalds	92c020d08d	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main scheduler changes in this cycle were: - support Intel Turbo Boost Max Technology 3.0 (TBM3) by introducig a notion of 'better cores', which the scheduler will prefer to schedule single threaded workloads on. (Tim Chen, Srinivas Pandruvada) - enhance the handling of asymmetric capacity CPUs further (Morten Rasmussen) - improve/fix load handling when moving tasks between task groups (Vincent Guittot) - simplify and clean up the cputime code (Stanislaw Gruszka) - improve mass fork()ed task spread a.k.a. hackbench speedup (Vincent Guittot) - make struct kthread kmalloc()ed and related fixes (Oleg Nesterov) - add uaccess atomicity debugging (when using access_ok() in the wrong context), under CONFIG_DEBUG_ATOMIC_SLEEP=y (Peter Zijlstra) - implement various fixes, cleanups and other enhancements (Daniel Bristot de Oliveira, Martin Schwidefsky, Rafael J. Wysocki)" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) sched/core: Use load_avg for selecting idlest group sched/core: Fix find_idlest_group() for fork kthread: Don't abuse kthread_create_on_cpu() in __kthread_create_worker() kthread: Don't use to_live_kthread() in kthread_[un]park() kthread: Don't use to_live_kthread() in kthread_stop() Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function" kthread: Make struct kthread kmalloc'ed x86/uaccess, sched/preempt: Verify access_ok() context sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO x86/sched: Use #include <linux/mutex.h> instead of #include <asm/mutex.h> cpufreq/intel_pstate: Use CPPC to get max performance acpi/bus: Set _OSC for diverse core support acpi/bus: Enable HWP CPPC objects x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU x86/sysctl: Add sysctl for ITMT scheduling feature x86: Enable Intel Turbo Boost Max Technology 3.0 x86/topology: Define x86's arch_update_cpu_topology sched: Extend scheduler's asym packing sched/fair: Clean up the tunable parameter definitions ...	2016-12-12 12:15:10 -08:00
Rafael J. Wysocki	d2c2ba6901	Merge branches 'acpi-soc', 'acpi-battery', 'acpi-video', 'acpi-cppc' and 'acpi-apei' * acpi-soc: ACPI / LPSS: enable hard LLP for DMA ACPI / APD: Add clock frequency for future AMD I2C controller * acpi-battery: ACPI / battery: If _BIX fails, retry with _BIF * acpi-video: ACPI / video: Add force_native quirk for HP Pavilion dv6 ACPI / video: Add force_native quirk for Dell XPS 17 L702X ACPI / video: Move ACPI_VIDEO_NOTIFY_* defines to acpi/video.h * acpi-cppc: ACPI / CPPC: set an error code on probe error path * acpi-apei: ACPI / APEI / ARM64: APEI initial support for ARM64 ACPI / APEI: Fix NMI notification handling	2016-12-12 20:48:01 +01:00
Rafael J. Wysocki	631ddaba59	Merge branches 'pm-sleep' and 'powercap' * pm-sleep: PM / sleep: Print active wakeup sources when blocking on wakeup_count reads x86/suspend: fix false positive KASAN warning on suspend/resume PM / sleep / ACPI: Use the ACPI_FADT_LOW_POWER_S0 flag PM / sleep: System sleep state selection interface rework PM / hibernate: Verify the consistent of e820 memory map by md5 digest * powercap: powercap / RAPL: Add Knights Mill CPUID powercap/intel_rapl: fix and tidy up error handling powercap/intel_rapl: Track active CPUs internally powercap/intel_rapl: Cleanup duplicated init code powercap/intel rapl: Convert to hotplug state machine powercap/intel_rapl: Propagate error code when registration fails powercap/intel_rapl: Add missing domain data update on hotplug	2016-12-12 20:46:35 +01:00
Linus Torvalds	6cdf89b1ca	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The tree got pretty big in this development cycle, but the net effect is pretty good: 115 files changed, 673 insertions(+), 1522 deletions(-) The main changes were: - Rework and generalize the mutex code to remove per arch mutex primitives. (Peter Zijlstra) - Add vCPU preemption support: add an interface to query the preemption status of vCPUs and use it in locking primitives - this optimizes paravirt performance. (Pan Xinhui, Juergen Gross, Christian Borntraeger) - Introduce cpu_relax_yield() and remov cpu_relax_lowlatency() to clean up and improve the s390 lock yielding machinery and its core kernel impact. (Christian Borntraeger) - Micro-optimize mutexes some more. (Waiman Long) - Reluctantly add the to-be-deprecated mutex_trylock_recursive() interface on a temporary basis, to give the DRM code more time to get rid of its locking hacks. Any other users will be NAK-ed on sight. (We turned off the deprecation warning for the time being to not pollute the build log.) (Peter Zijlstra) - Improve the rtmutex code a bit, in light of recent long lived bugs/races. (Thomas Gleixner) - Misc fixes, cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) x86/paravirt: Fix bool return type for PVOP_CALL() x86/paravirt: Fix native_patch() locking/ww_mutex: Use relaxed atomics locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked() locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted() locking/mutex: Break out of expensive busy-loop on {mutex,rwsem}_spin_on_owner() when owner vCPU is preempted locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU in osq_lock() Documentation/virtual/kvm: Support the vCPU preemption check x86/xen: Support the vCPU preemption check x86/kvm: Support the vCPU preemption check x86/kvm: Support the vCPU preemption check kvm: Introduce kvm_write_guest_offset_cached() locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests locking/spinlocks, s390: Implement vcpu_is_preempted(cpu) locking/core, powerpc: Implement vcpu_is_preempted(cpu) sched/core: Introduce the vcpu_is_preempted(cpu) interface sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q locking/core: Provide common cpu_relax_yield() definition locking/mutex: Don't mark mutex_trylock_recursive() as deprecated, temporarily ...	2016-12-12 10:48:02 -08:00
Linus Torvalds	9ad1aeecdb	Merge branch 'core-smp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP bootup updates from Ingo Molnar: "Three changes to unify/standardize some of the bootup message printing in kernel/smp.c between architectures" * 'core-smp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kernel/smp: Tell the user we're bringing up secondary CPUs kernel/smp: Make the SMP boot message common on all arches kernel/smp: Define pr_fmt() for smp.c	2016-12-12 10:02:01 -08:00
Ingo Molnar	6643aab30f	Merge branch 'linus' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-11 13:10:40 +01:00
Peter Zijlstra	45dbea5f55	x86/paravirt: Fix native_patch() While chasing a regression I noticed we potentially patch the wrong code in native_patch(). If we do not select the native code sequence, we must use the default patcher, not fall-through the switch case. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel test robot <xiaolong.ye@intel.com> Fixes: `3cded41794` ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()") Link: http://lkml.kernel.org/r/20161208154349.270616999@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-11 13:09:19 +01:00
Ingo Molnar	6f38751510	Merge branch 'linus' into locking/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-11 13:07:13 +01:00
Thomas Gleixner	990e9dc381	x86/ldt: Make all size computations unsigned ldt->size can never be negative. The helper functions take 'unsigned int' arguments which are assigned from ldt->size. The related user space user_desc struct member entry_number is unsigned as well. But ldt->size itself and a few local variables which are related to ldt->size are type 'int' which makes no sense whatsoever and results in typecasts which make the eyes bleed. Clean it up and convert everything which is related to ldt->size to unsigned it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dan Carpenter <dan.carpenter@oracle.com>	2016-12-10 00:24:39 +01:00
Dan Carpenter	296dc5806d	x86/ldt: Make a size argument unsigned My static checker complains that we put an upper bound on the "size" argument but not a lower bound. The checker is not smart enough to know the possible ranges of "old_mm->context.ldt->size" from init_new_context_ldt() so it thinks maybe it could be negative. Let's make it unsigned to silence the warning and future proof the code a bit. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: kernel-janitors@vger.kernel.org Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20161208105602.GA11382@elgon.mountain Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-10 00:24:39 +01:00
Thomas Gleixner	34bc3560c6	x86: Remove empty idle.h header One include less is always a good thing(tm). Good riddance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-6-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-09 21:23:22 +01:00
Borislav Petkov	07c94a3812	x86/amd: Simplify AMD E400 aware idle routine Reorganize the E400 detection now that we have everything in place: switch the CPUs to broadcast mode after the LAPIC has been initialized and remove the facilities that were used previously on the idle path. Unfortunately static_cpu_has_bug() cannpt be used in the E400 idle routine because alternatives have been applied when the actual detection happens, so the static switching does not take effect and the test will stay false. Use boot_cpu_has_bug() instead which is definitely an improvement over the RDMSR and the cpumask handling. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-5-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-09 21:23:21 +01:00
Thomas Gleixner	e7ff3a4763	x86/amd: Check for the C1E bug post ACPI subsystem init AMD CPUs affected by the E400 erratum suffer from the issue that the local APIC timer stops when the CPU goes into C1E. Unfortunately there is no way to detect the affected CPUs on early boot. It's only possible to determine the range of possibly affected CPUs from the family/model range. The actual decision whether to enter C1E and thus cause the bug is done by the firmware and we need to detect that case late, after ACPI has been initialized. The current solution is to check in the idle routine whether the CPU is affected by reading the MSR_K8_INT_PENDING_MSG MSR and checking for the K8_INTP_C1E_ACTIVE_MASK bits. If one of the bits is set then the CPU is affected and the system is switched into forced broadcast mode. This is ineffective and on non-affected CPUs every entry to idle does the extra RDMSR. After doing some research it turns out that the bits are visible on the boot CPU right after the ACPI subsystem is initialized in the early boot process. So instead of polling for the bits in the idle loop, add a detection function after acpi_subsystem_init() and check for the MSR bits. If set, then the X86_BUG_AMD_APIC_C1E is set on the boot CPU and the TSC is marked unstable when X86_FEATURE_NONSTOP_TSC is not set as it will stop in C1E state as well. The switch to broadcast mode cannot be done at this point because the boot CPU still uses HPET as a clockevent device and the local APIC timer is not yet calibrated and installed. The switch to broadcast mode on the affected CPUs needs to be done when the local APIC timer is actually set up. This allows to cleanup the amd_e400_idle() function in the next step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-4-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-09 21:23:21 +01:00
Thomas Gleixner	3344ed3079	x86/bugs: Separate AMD E400 erratum and C1E bug The workaround for the AMD Erratum E400 (Local APIC timer stops in C1E state) is a two step process: - Selection of the E400 aware idle routine - Detection whether the platform is affected The idle routine selection happens for possibly affected CPUs depending on family/model/stepping information. These range of CPUs is not necessarily affected as the decision whether to enable the C1E feature is made by the firmware. Unfortunately there is no way to query this at early boot. The current implementation polls a MSR in the E400 aware idle routine to detect whether the CPU is affected. This is inefficient on non affected CPUs because every idle entry has to do the MSR read. There is a better way to detect this before going idle for the first time which requires to seperate the bug flags: X86_BUG_AMD_E400 - Selects the E400 aware idle routine and enables the detection X86_BUG_AMD_APIC_C1E - Set when the platform is affected by E400 Replace the current X86_BUG_AMD_APIC_C1E usage by the new X86_BUG_AMD_E400 bug bit to select the idle routine which currently does an unconditional detection poll. X86_BUG_AMD_APIC_C1E is going to be used in later patches to remove the MSR polling and simplify the handling of this misfeature. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-3-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-09 21:23:20 +01:00
Steven Rostedt (Red Hat)	8cf868affd	tracing: Have the reg function allow to fail Some tracepoints have a registration function that gets enabled when the tracepoint is enabled. There may be cases that the registraction function must fail (for example, can't allocate enough memory). In this case, the tracepoint should also fail to register, otherwise the user would not know why the tracepoint is not working. Cc: David Howells <dhowells@redhat.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Anton Blanchard <anton@samba.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-12-09 09:13:30 -05:00
Shaohua Li	76ae054c69	x86/intel_rdt: Implement show_options() for resctrlfs Implement show_options() callback for intel resource control filesystem to expose the active mount options in /proc/ Signed-off-by: Shaohua Li <shli@fb.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/7dce7c1886ac9289442d254ea18322c92bd968da.1480717072.git.shli@fb.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-09 14:12:18 +01:00
Josh Poimboeuf	b53f40db59	x86/suspend: fix false positive KASAN warning on suspend/resume Resuming from a suspend operation is showing a KASAN false positive warning: BUG: KASAN: stack-out-of-bounds in unwind_get_return_address+0x11d/0x130 at addr ffff8803867d7878 Read of size 8 by task pm-suspend/7774 page:ffffea000e19f5c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x2ffff0000000000() page dumped because: kasan: bad access detected CPU: 0 PID: 7774 Comm: pm-suspend Tainted: G B 4.9.0-rc7+ #8 Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F5 03/07/2016 Call Trace: dump_stack+0x63/0x82 kasan_report_error+0x4b4/0x4e0 ? acpi_hw_read_port+0xd0/0x1ea ? kfree_const+0x22/0x30 ? acpi_hw_validate_io_request+0x1a6/0x1a6 __asan_report_load8_noabort+0x61/0x70 ? unwind_get_return_address+0x11d/0x130 unwind_get_return_address+0x11d/0x130 ? unwind_next_frame+0x97/0xf0 __save_stack_trace+0x92/0x100 save_stack_trace+0x1b/0x20 save_stack+0x46/0xd0 ? save_stack_trace+0x1b/0x20 ? save_stack+0x46/0xd0 ? kasan_kmalloc+0xad/0xe0 ? kasan_slab_alloc+0x12/0x20 ? acpi_hw_read+0x2b6/0x3aa ? acpi_hw_validate_register+0x20b/0x20b ? acpi_hw_write_port+0x72/0xc7 ? acpi_hw_write+0x11f/0x15f ? acpi_hw_read_multiple+0x19f/0x19f ? memcpy+0x45/0x50 ? acpi_hw_write_port+0x72/0xc7 ? acpi_hw_write+0x11f/0x15f ? acpi_hw_read_multiple+0x19f/0x19f ? kasan_unpoison_shadow+0x36/0x50 kasan_kmalloc+0xad/0xe0 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc_trace+0xbc/0x1e0 ? acpi_get_sleep_type_data+0x9a/0x578 acpi_get_sleep_type_data+0x9a/0x578 acpi_hw_legacy_wake_prep+0x88/0x22c ? acpi_hw_legacy_sleep+0x3c7/0x3c7 ? acpi_write_bit_register+0x28d/0x2d3 ? acpi_read_bit_register+0x19b/0x19b acpi_hw_sleep_dispatch+0xb5/0xba acpi_leave_sleep_state_prep+0x17/0x19 acpi_suspend_enter+0x154/0x1e0 ? trace_suspend_resume+0xe8/0xe8 suspend_devices_and_enter+0xb09/0xdb0 ? printk+0xa8/0xd8 ? arch_suspend_enable_irqs+0x20/0x20 ? try_to_freeze_tasks+0x295/0x600 pm_suspend+0x6c9/0x780 ? finish_wait+0x1f0/0x1f0 ? suspend_devices_and_enter+0xdb0/0xdb0 state_store+0xa2/0x120 ? kobj_attr_show+0x60/0x60 kobj_attr_store+0x36/0x70 sysfs_kf_write+0x131/0x200 kernfs_fop_write+0x295/0x3f0 __vfs_write+0xef/0x760 ? handle_mm_fault+0x1346/0x35e0 ? do_iter_readv_writev+0x660/0x660 ? __pmd_alloc+0x310/0x310 ? do_lock_file_wait+0x1e0/0x1e0 ? apparmor_file_permission+0x18/0x20 ? security_file_permission+0x73/0x1c0 ? rw_verify_area+0xbd/0x2b0 vfs_write+0x149/0x4a0 SyS_write+0xd9/0x1c0 ? SyS_read+0x1c0/0x1c0 entry_SYSCALL_64_fastpath+0x1e/0xad Memory state around the buggy address: ffff8803867d7700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8803867d7780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8803867d7800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f4 ^ ffff8803867d7880: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 ffff8803867d7900: 00 00 00 f1 f1 f1 f1 04 f4 f4 f4 f3 f3 f3 f3 00 KASAN instrumentation poisons the stack when entering a function and unpoisons it when exiting the function. However, in the suspend path, some functions never return, so their stack never gets unpoisoned, resulting in stale KASAN shadow data which can cause later false positive warnings like the one above. Reported-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-12-06 02:22:44 +01:00
Ingo Molnar	1b95b1a06c	Merge branch 'locking/urgent' into locking/core, to pick up dependent fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-02 11:13:44 +01:00
Fenghua Yu	74fcdae1a7	x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled intel_rdt_sched_in() must be called with preemption disabled because the function accesses percpu variables (pqr_state and closid). If a task moves itself via move_myself() preemption is enabled, which violates the calling convention and can result in incorrect closid selection when the task gets preempted or migrated. Add the required protection and a comment about the calling convention. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Marcelo Tosatti" <mtosatti@redhat.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1480625714-54246-1-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-02 01:13:02 +01:00
Tomasz Nowicki	9f9a35a7b6	ACPI / APEI / ARM64: APEI initial support for ARM64 This patch provides APEI arch-specific bits for ARM64 Meanwhile, (1) Move HEST type (ACPI_HEST_TYPE_IA32_CORRECTED_CHECK) checking to a generic place. (2) Select HAVE_ACPI_APEI when EFI and ACPI is set on ARM64, because arch_apei_get_mem_attribute is using efi_mem_attributes() on ARM64. Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Tested-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Signed-off-by: Fu Wei <fu.wei@linaro.org> [ Fu Wei: improve && upstream ] Acked-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-12-02 00:24:34 +01:00
Thomas Gleixner	31f8a651fc	x86/tsc: Validate cpumask pointer before accessing it 0-day testing encountered a NULL pointer dereference in a cpumask access from tsc_store_and_check_tsc_adjust(). This happens when the function is called on the boot CPU and the topology masks are not yet available due to CPUMASK_OFFSTACK=y. Add a NULL pointer check for the mask pointer. If NULL it's safe to assume that the CPU is the boot CPU and the first one in the package. Fixes: `8b223bc7ab` ("x86/tsc: Store and check TSC ADJUST MSR") Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-01 14:40:52 +01:00
Thiago Jung Bauermann	ec2b9bfaac	kexec_file: Change kexec_add_buffer to take kexec_buf as argument. This is done to simplify the kexec_add_buffer argument list. Adapt all callers to set up a kexec_buf to pass to kexec_add_buffer. In addition, change the type of kexec_buf.buffer from char * to void . There is no particular reason for it to be a char , and the change allows us to get rid of 3 existing casts to char * in the code. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-11-30 23:14:59 +11:00
Thomas Gleixner	b836554386	x86/tsc: Fix broken CONFIG_X86_TSC=n build Add the missing return statement to the inline stub tsc_store_and_check_tsc_adjust() and add the other stubs to make a SMP=y,TSC=n build happy. While at it, remove the unused variable from the UP variant of tsc_store_and_check_tsc_adjust(). Fixes: commit ba75fb646931 ("x86/tsc: Sync test only for the first cpu in a package") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-30 09:44:52 +01:00
Tim Chen	de966cf4a4	sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO Rename CONFIG_SCHED_ITMT for Intel Turbo Boost Max Technology 3.0 to CONFIG_SCHED_MC_PRIO. This makes the configuration extensible in future to other architectures that wish to similarly establish CPU core priorities support in the scheduler. The description in Kconfig is updated to reflect this change with added details for better clarity. The configuration is explicitly default-y, to enable the feature on CPUs that have this feature. It has no effect on non-TBM3 CPUs. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@suse.de Cc: jolsa@redhat.com Cc: linux-acpi@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/2b2ee29d93e3f162922d72d0165a1405864fbb23.1480444902.git.tim.c.chen@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-30 08:27:08 +01:00
Thomas Gleixner	cc4db26899	x86/tsc: Try to adjust TSC if sync test fails If the first CPU of a package comes online, it is necessary to test whether the TSC is in sync with a CPU on some other package. When a deviation is observed (time going backwards between the two CPUs) the TSC is marked unstable, which is a problem on large machines as they have to fall back to the HPET clocksource, which is insanely slow. It has been attempted to compensate the TSC by adding the offset to the TSC and writing it back some time ago, but this never was merged because it did not turn out to be stable, especially not on older systems. Modern systems have become more stable in that regard and the TSC_ADJUST MSR allows us to compensate for the time deviation in a sane way. If it's available allow up to three synchronization runs and if a time warp is detected the starting CPU can compensate the time warp via the TSC_ADJUST MSR and retry. If the third run still shows a deviation or when random time warps are detected the test terminally fails. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134018.048237517@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:18 +01:00
Thomas Gleixner	76d3b85158	x86/tsc: Prepare warp test for TSC adjustment To allow TSC compensation cross nodes its necessary to know in which direction the TSC warp was observed. Return the maximum observed value on the calling CPU so the caller can determine the direction later. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134017.970859287@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:18 +01:00
Thomas Gleixner	4c5e3c6375	x86/tsc: Move sync cleanup to a safe place Cleaning up the stop marker on the control CPU is wrong when we want to add retry support. Move the cleanup to the starting CPU. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134017.892095627@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:18 +01:00
Thomas Gleixner	a36f513681	x86/tsc: Sync test only for the first cpu in a package If the TSC_ADJUST MSR is available all CPUs in a package are forced to the same value. So TSCs cannot be out of sync when the first CPU in the package was in sync. That allows to skip the sync test for all CPUs except the first starting CPU in a package. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134017.809901363@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:17 +01:00
Thomas Gleixner	1d0095feea	x86/tsc: Verify TSC_ADJUST from idle When entering idle, it's a good oportunity to verify that the TSC_ADJUST MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is detected, emit a warning and restore it to the previous value. This is especially important for machines, which mark the TSC reliable because there is no watchdog clocksource available (SoCs). This is not sufficient for HPC (NOHZ_FULL) situations where a CPU never goes idle, but adding a timer to do the check periodically is not an option either. On a machine, which has this issue, the check triggeres right during boot, so there is a decent chance that the sysadmin will notice. Rate limit the check to once per second and warn only once per cpu. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134017.732180441@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:16 +01:00
Thomas Gleixner	8b223bc7ab	x86/tsc: Store and check TSC ADJUST MSR The TSC_ADJUST MSR shows whether the TSC has been modified. This is helpful in a two aspects: 1) It allows to detect BIOS wreckage, where SMM code tries to 'hide' the cycles spent by storing the TSC value at SMM entry and restoring it at SMM exit. On affected machines the TSCs run slowly out of sync up to the point where the clocksource watchdog (if available) detects it. The TSC_ADJUST MSR allows to detect the TSC modification before that and eventually restore it. This is also important for SoCs which have no watchdog clocksource and therefore TSC wreckage cannot be detected and acted upon. 2) All threads in a package are required to have the same TSC_ADJUST value. Broken BIOSes break that and as a result the TSC synchronization check fails. The TSC_ADJUST MSR allows to detect the deviation when a CPU comes online. If detected set it to the value of an already online CPU in the same package. This also allows to reduce the number of sync tests because with that in place the test is only required for the first CPU in a package. In principle all CPUs in a system should have the same TSC_ADJUST value even across packages, but with physical CPU hotplug this assumption is not true because the TSC starts with power on, so physical hotplug has to do some trickery to bring the TSC into sync with already running packages, which requires to use an TSC_ADJUST value different from CPUs which got powered earlier. A final enhancement is the opportunity to compensate for unsynced TSCs accross nodes at boot time and make the TSC usable that way. It won't help for TSCs which run apart due to frequency skew between packages, but this gets detected by the clocksource watchdog later. The first step toward this is to store the TSC_ADJUST value of a starting CPU and compare it with the value of an already online CPU in the same package. If they differ, emit a warning and adjust it to the reference value. The !SMP version just stores the boot value for later verification. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134017.655323776@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:16 +01:00
Thomas Gleixner	bec8520dca	x86/tsc: Detect random warps If time warps can be observed then they should only ever be observed on one CPU. If they are observed on both CPUs then the system is completely hosed. Add a check for this condition and notify if it happens. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134017.574838461@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:15 +01:00
Thomas Gleixner	7b3d2f6e08	x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art() The art detection uses rdmsrl_safe() to detect the availablity of the TSC_ADJUST MSR. That's pointless because we have a feature bit for this. Use it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161119134017.483561692@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 19:23:15 +01:00
Chen Yu	ba58d1020a	timekeeping: Ignore the bogus sleep time if pm_trace is enabled Power management suspend/resume tracing (ab)uses the RTC to store suspend/resume information persistently. As a consequence the RTC value is clobbered when timekeeping is resumed and tries to inject the sleep time. Commit `a4f8f6667f` ("timekeeping: Cap array access in timekeeping_debug") plugged a out of bounds array access in the timekeeping debug code which was caused by the clobbered RTC value, but we still use the clobbered RTC value for sleep time injection into kernel timekeeping, which will result in random adjustments depending on the stored "hash" value. To prevent this keep track of the RTC clobbering and ignore the invalid RTC timestamp at resume. If the system resumed successfully clear the flag, which marks the RTC as unusable, warn the user about the RTC clobber and recommend to adjust the RTC with 'ntpdate' or 'rdate'. [jstultz: Fixed up pr_warn formating, and implemented suggestions from Ingo] [ tglx: Rewrote changelog ] Originally-from: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/1480372524-15181-3-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-29 18:02:58 +01:00
Fenghua Yu	0efc89be94	x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount When removing a sub directory/rdtgroup by rmdir or umount, closid in a task in the sub directory is set to default rdtgroup's closid which is 0. If the task is running on a CPU, the PQR_ASSOC MSR is only updated when the task runs through a context switch. Up to the context switch, the task runs with the wrong closid. Make the change immediately effective by invoking a smp function call on all CPUs which are running moved task. If one of the affected tasks was moved or scheduled out before the function call is executed on the CPU the only damage is the extra interruption of the CPU. [ tglx: Reworked it to avoid blindly interrupting all CPUs and extra loops ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1479511084-59727-2-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-28 11:07:50 +01:00
Fenghua Yu	2659f46da8	x86/intel_rdt: Fix setting of closid when adding CPUs to a group There was a cut & paste error when adding code to update the per-cpu closid when changing the bitmask of CPUs to an rdt group. The update erronously assigns the closid of the default group to the CPUs which are moved to a group instead of assigning the closid of their new group. Use the proper closid. Fixes: `f410770293` ("x86/intel_rdt: Update percpu closid immeditately on CPUs affected by change") Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1479511084-59727-1-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-28 11:07:50 +01:00
Ingo Molnar	a293b3954a	x86/sched: Use #include <linux/mutex.h> instead of #include <asm/mutex.h> asm/mutex.h is gone from the locking tree, which makes sched/core break the build. Use linux/mutex.h instead, which is the canonical method. Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: bp@suse.de Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-28 09:43:49 +01:00
Josh Poimboeuf	55f856e640	x86/unwind: Fix guess-unwinder regression My attempt at fixing some KASAN false positive warnings was rather brain dead, and it broke the guess unwinder. With frame pointers disabled, /proc/<pid>/stack is broken: # cat /proc/1/stack [<ffffffffffffffff>] 0xffffffffffffffff Restore the code flow to more closely resemble its previous state, while still using READ_ONCE_NOCHECK() macros to silence KASAN false positives. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `c2d75e03d6` ("x86/unwind: Prevent KASAN false positive warnings in guess unwinder") Link: http://lkml.kernel.org/r/b824f92c2c22eca5ec95ac56bd2a7c84cf0b9df9.1480309971.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-28 07:47:54 +01:00
Borislav Petkov	6248f45674	x86/boot/64: Optimize fixmap page fixup Single-stepping through head_64.S made me look at the fixmap page PTEs fixup loop: So we're going through the whole level2_fixmap_pgt 4K page, looking at whether PAGE_PRESENT is set in those PTEs and add the delta between where we're compiled to run and where we actually end up running. However, if that delta is 0 (most cases) we go through all those 512 PTEs for no reason at all. Oh well, we add 0 but that's no reason to me. Skipping that useless fixup gives us a boot speedup of 0.004 seconds in my guest. Not a lot but considering how cheap it is, I'll take it. Here is the printk time difference: before: ... [ 0.000000] tsc: Marking TSC unstable due to TSCs unsynchronized [ 0.013590] Calibrating delay loop (skipped), value calculated using timer frequency.. 8027.17 BogoMIPS (lpj=16054348) [ 0.017094] pid_max: default: 32768 minimum: 301 ... after: ... [ 0.000000] tsc: Marking TSC unstable due to TSCs unsynchronized [ 0.009587] Calibrating delay loop (skipped), value calculated using timer frequency.. 8026.86 BogoMIPS (lpj=16053724) [ 0.013090] pid_max: default: 32768 minimum: 301 ... For the other two changes converting naked numbers to defines: # arch/x86/kernel/head_64.o: text data bss dec hex filename 1124 290864 4096 296084 48494 head_64.o.before 1124 290864 4096 296084 48494 head_64.o.after md5: 87086e202588939296f66e892414ffe2 head_64.o.before.asm 87086e202588939296f66e892414ffe2 head_64.o.after.asm Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161125111448.23623-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-28 07:45:17 +01:00
Borislav Petkov	9b032d21f6	x86/boot/64: Use defines for page size ... instead of naked numbers like the rest of the asm does in this file. No code changed: # arch/x86/kernel/head_64.o: text data bss dec hex filename 1124 290864 4096 296084 48494 head_64.o.before 1124 290864 4096 296084 48494 head_64.o.after md5: 87086e202588939296f66e892414ffe2 head_64.o.before.asm 87086e202588939296f66e892414ffe2 head_64.o.after.asm Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161124210550.15025-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-25 07:11:29 +01:00
Tim Chen	d3d37d850d	x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU Some Intel cores in a package can be boosted to a higher turbo frequency with ITMT 3.0 technology. The scheduler can use the asymmetric packing feature to move tasks to the more capable cores. If ITMT is enabled, add SD_ASYM_PACKING flag to the thread and core sched domains to enable asymmetric packing. Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/9bbb885bedbef4eb50e197305eb16b160cff0831.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-24 20:44:20 +01:00
Tim Chen	f9793e3495	x86/sysctl: Add sysctl for ITMT scheduling feature Intel Turbo Boost Max Technology 3.0 (ITMT) feature allows some cores to be boosted to higher turbo frequency than others. Add /proc/sys/kernel/sched_itmt_enabled so operator can enable/disable scheduling of tasks that favor cores with higher turbo boost frequency potential. By default, system that is ITMT capable and single socket has this feature turned on. It is more likely to be lightly loaded and operates in Turbo range. When there is a change in the ITMT scheduling operation desired, a rebuild of the sched domain is initiated so the scheduler can set up sched domains with appropriate flag to enable/disable ITMT scheduling operations. Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/07cc62426a28bad57b01ab16bb903a9c84fa5421.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-24 20:44:19 +01:00
Tim Chen	5e76b2ab36	x86: Enable Intel Turbo Boost Max Technology 3.0 On platforms supporting Intel Turbo Boost Max Technology 3.0, the maximum turbo frequencies of some cores in a CPU package may be higher than for the other cores in the same package. In that case, better performance (and possibly lower energy consumption as well) can be achieved by making the scheduler prefer to run tasks on the CPUs with higher max turbo frequencies. To that end, set up a core priority metric to abstract the core preferences based on the maximum turbo frequency. In that metric, the cores with higher maximum turbo frequencies are higher-priority than the other cores in the same package and that causes the scheduler to favor them when making load-balancing decisions using the asymmertic packing approach. At the same time, the priority of SMT threads with a higher CPU number is reduced so as to avoid scheduling tasks on all of the threads that belong to a favored core before all of the other cores have been given a task to run. The priority metric will be initialized by the P-state driver with the help of the sched_set_itmt_core_prio() function. The P-state driver will also determine whether or not ITMT is supported by the platform and will call sched_set_itmt_support() to indicate that. Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/cd401ccdff88f88c8349314febdc25d51f7c48f7.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-24 20:44:19 +01:00
Tim Chen	7d25127cef	x86/topology: Define x86's arch_update_cpu_topology The scheduler calls arch_update_cpu_topology() to check whether the scheduler domains have to be rebuilt. So far x86 has no requirement for this, but the upcoming ITMT support makes this necessary. Request the rebuild when the x86 internal update flag is set. Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: linux-pm@vger.kernel.org Cc: peterz@infradead.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/bfbf5591276ec60b2af2da798adc1060df1e2a5f.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-24 20:44:19 +01:00
Dan Carpenter	c4597fd756	x86/apic/uv: Silence a shift wrapping warning 'm_io' is stored in 6 bits so it's a number in the 0-63 range. Static analysis tools complain that 1 << 63 will wrap so I have changed it to 1ULL << m_io. This code is over three years old so presumably the bug doesn't happen very frequently in real life or someone would have complained by now. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Travis <travis@sgi.com> Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Fixes: `b15cc4a12b` ("x86, uv, uv3: Update x2apic Support for SGI UV3") Link: http://lkml.kernel.org/r/20161123221908.GA23997@mwanda Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-24 06:01:05 +01:00
Tony Luck	3f5a7896a5	x86/mce: Include the PPIN in MCE records when available Intel Xeons from Ivy Bridge onwards support a processor identification number set in the factory. To the user this is a handy unique number to identify a particular CPU. Intel can decode this to the fab/production run to track errors. On systems that have it, include it in the machine check record. I'm told that this would be helpful for users that run large data centers with multi-socket servers to keep track of which CPUs are seeing errors. Boris: * Add some clarifying comments and spacing. * Mask out [63:2] in the disabled-but-not-locked case * Call the MSR variable "val" for more readability. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20161123114855.njguoaygp3qnbkia@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-23 16:51:52 +01:00
Ingo Molnar	ec84f00567	Merge branch 'linus' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-23 10:23:09 +01:00
Ingo Molnar	064e6a8ba6	Merge branch 'linus' into x86/fpu, to resolve conflicts Conflicts: arch/x86/kernel/fpu/core.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-23 07:18:09 +01:00
Sebastian Andrzej Siewior	8fba38c937	x86/msr: Convert to hotplug state machine Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Move the callbacks to online/offline as there is no point in having the files around before the cpu is online and until its completely gone. [ tglx: Move the callbacks to online/offline ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: rt@linuxtronix.de Link: http://lkml.kernel.org/r/20161117183541.8588-4-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-22 23:34:39 +01:00
Thomas Gleixner	ee92be9b0d	x86/cpuid: Move the hotplug callbacks to online No point to have this file around before the cpu is online and no point to have it around until the cpu is dead. Get rid of the explicit state. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de>	2016-11-22 23:34:39 +01:00
Sebastian Andrzej Siewior	8c07b494ab	x86/cpuid: Convert to hotplug state machine Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: rt@linuxtronix.de Link: http://lkml.kernel.org/r/20161117183541.8588-3-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-22 23:34:39 +01:00
Thomas Gleixner	33d97302eb	x86/mce/therm_throt: Move hotplug callbacks to online No point to have the sysfs files around before the cpu is online and no point to have them around until the cpu is dead. Get rid of the explicit state. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de>	2016-11-22 23:34:38 +01:00
Sebastian Andrzej Siewior	d6526e73db	x86/mce/therm_throt: Convert to hotplug state machine Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linuxtronix.de Cc: Borislav Petkov <bp@alien8.de> Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161117183541.8588-2-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-22 23:34:38 +01:00
Peter Zijlstra	3cded41794	x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted() Avoid the pointless function call to pv_lock_ops.vcpu_is_preempted() when a paravirt spinlock enabled kernel is ran on native hardware. Do this by patching out the CALL instruction with "XOR %RAX,%RAX" which has the same effect (0 return value). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:48:11 +01:00
Pan Xinhui	1885aa7041	x86/kvm: Support the vCPU preemption check Support the vcpu_is_preempted() functionality under KVM. This will enhance lock performance on overcommitted hosts (more runnable vCPUs than physical CPUs in the system) as doing busy waits for preempted vCPUs will hurt system performance far worse than early yielding. struct kvm_steal_time::preempted indicates that if one vCPU is running or not after commit "x86, kvm/x86.c: support vCPU preempted check". unix benchmark result: host: kernel 4.8.1, i5-4570, 4 cpus guest: kernel 4.8.1, 8 vcpus test-case after-patch before-patch Execl Throughput \| 18307.9 lps \| 11701.6 lps File Copy 1024 bufsize 2000 maxblocks \| 1352407.3 KBps \| 790418.9 KBps File Copy 256 bufsize 500 maxblocks \| 367555.6 KBps \| 222867.7 KBps File Copy 4096 bufsize 8000 maxblocks \| 3675649.7 KBps \| 1780614.4 KBps Pipe Throughput \| 11872208.7 lps \| 11855628.9 lps Pipe-based Context Switching \| 1495126.5 lps \| 1490533.9 lps Process Creation \| 29881.2 lps \| 28572.8 lps Shell Scripts (1 concurrent) \| 23224.3 lpm \| 22607.4 lpm Shell Scripts (8 concurrent) \| 3531.4 lpm \| 3211.9 lpm System Call Overhead \| 10385653.0 lps \| 10419979.0 lps Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-10-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:48:08 +01:00
Pan Xinhui	446f3dc8cc	locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests Optimize spinlock and mutex busy-loops by providing a vcpu_is_preempted(cpu) function on KVM and Xen platforms. Extend the pv_lock_ops interface accordingly and implement the callbacks on KVM and Xen. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ Translated to English. ] Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-7-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:48:07 +01:00
Yazen Ghannam	f5382de9d4	x86/mce/AMD: Add system physical address translation for AMD Fam17h The Unified Memory Controllers (UMCs) on Fam17h log a normalized address in their MCA_ADDR registers. We need to convert that normalized address to a system physical address in order to support a few facilities: 1) To offline poisoned pages in DRAM proactively in the deferred error handler. 2) To print sysaddr and page info for DRAM ECC errors in EDAC. [ Boris: fixes/cleanups ontop: * hi_addr_offset = 0 - no need for that branch. Stick it all under the HiAddrOffsetEn case. It confines hi_addr_offset's declaration too. * Move variables to the innermost scope they're used at so that we save on stack and not blow it up immediately on function entry. * Do not modify sys_addr prematurely - we want to not exit early and have modified sys_addr some, which callers get to see. We either convert to a sys_addr or we don't do anything. And we signal that with the retval of the function. * Rename label out -> out_err - because it is the error path. * No need to pr_err of the conversion failed case: imagine a sparsely-populated machine with UMCs which don't have DIMMs. Callers should look at the retval instead and issue a printk only when really necessary. No need for useless info in dmesg. * s/temp_reg/tmp/ and other variable names shortening => shorter code. * Use BIT() everywhere. * Make error messages more informative. * Small build fix for the !CONFIG_X86_MCE_AMD case. * ... and more minor cleanups. ] Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20161122111133.mjzpvzhf7o7yl2oa@pd.tnic [ Typo fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:30:16 +01:00
Josh Poimboeuf	3d02a9c48d	x86/dumpstack: Make stack name tags more comprehensible NMI stack dumps are bracketed by the following tags: <NMI> ... <EOE> The ending tag is kind of confusing if you don't already know what "EOE" means (end of exception). The same ending tag is also used to mark the end of all other exceptions' stacks. For example: <#DF> ... <EOE> And similarly, "EOI" is used as the ending tag for interrupts: <IRQ> ... <EOI> Change the tags to be more comprehensible by making them symmetrical and more XML-esque: <NMI> ... </NMI> <#DF> ... </#DF> <IRQ> ... </IRQ> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/180196e3754572540b595bc56b947d43658979a7.1479491159.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-21 13:00:42 +01:00
Borislav Petkov	254fe9c7a4	x86/MCE/AMD: Fix thinko about thresholding_en So adding thresholding_en et al was a good thing for removing the per-CPU thresholding callback, i.e., threshold_cpu_callback. But, in order for it to work and especially that test in mce_threshold_create_device() so that all thresholding banks get properly created and not the whole thing to fail with a NULL ptr dereference at mce_cpu_pre_down() when we offline the CPUs, we need to set the thresholding_en flag before we start creating the devices. Yap, it failed because thresholding_en wasn't set at the time we were creating the banks so we didn't create any and then at mce_cpu_pre_down() -> mce_threshold_remove_device() time, we would blow up. And the fix is actually easy: we have thresholding on the system when we have managed to set the thresholding vector to amd_threshold_interrupt() earlier in mce_amd_feature_init() while we were picking apart the thresholding banks and what is set and what not. So let's do that. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Fixes: `4d7b02d58c` ("x86/mcheck: Split threshold_cpu_callback into two callbacks") Link: http://lkml.kernel.org/r/20161119103402.5227-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-21 11:02:12 +01:00
Yu-cheng Yu	b22cbe404a	x86/fpu: Fix invalid FPU ptrace state after execve() Robert O'Callahan reported that after an execve PTRACE_GETREGSET NT_X86_XSTATE continues to return the pre-exec register values until the exec'ed task modifies FPU state. The test code is at: https://bugzilla.redhat.com/attachment.cgi?id=1164286. What is happening is fpu__clear() does not properly clear fpstate. Fix it by doing just that. Reported-by: Robert O'Callahan <robert@ocallahan.org> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: David Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1479402695-6553-1-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-21 10:38:35 +01:00
Len Brown	7a3e686e1b	x86/idle: Remove enter_idle(), exit_idle() Upon removal of the is_idle flag, these routines became NOPs. Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/822f2c22cc5890f7b8ea0eeec60277eb44505b4e.1479449716.git.len.brown@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-18 12:07:57 +01:00
Len Brown	f08b5fe2d4	x86/idle: Remove is_idle flag Upon removal of the idle_notifier, all accesses to the "is_idle" flag serve no purpose. Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/e4a24197cf9c227fcd1ca2df09999eaec9052f49.1479449716.git.len.brown@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-18 12:07:57 +01:00
Len Brown	8e7a7ee9dd	x86/idle: Remove idle_notifier Upon removal of the i7300_idle driver, the idle_notifer is unused. Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/f15385a82ec4bf51f4f06777193d83f03b28cfdd.1479449716.git.len.brown@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-18 12:07:56 +01:00
Thomas Gleixner	984fecebda	x86/tsc: Finalize the split of the TSC_RELIABLE flag All places which used the TSC_RELIABLE to skip the delayed calibration have been converted to use the TSC_KNOWN_FREQ flag. Make the immeditate clocksource registration, which skips the long term calibration, solely depend on TSC_KNOWN_FREQ. The TSC_RELIABLE now merily removes the requirement for a watchdog clocksource. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bin Gao <bin.gao@intel.com> Cc: Peter Zijlstra <peterz@infradead.org>	2016-11-18 10:58:31 +01:00
Bin Gao	f3a02ecebe	x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs TSC on Intel Atom SoCs capable of determining TSC frequency by MSR is reliable and the frequency is known (provided by HW). On these platforms PIT/HPET is generally not available so calibration won't work at all and there is no other clocksource to act as a watchdog for the TSC, so we have no other choice than to trust it. Set both X86_FEATURE_TSC_KNOWN_FREQ and X86_FEATURE_TSC_RELIABLE flags to make sure the calibration is skipped and no watchdog is required. Signed-off-by: Bin Gao <bin.gao@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1479241644-234277-5-git-send-email-bin.gao@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-18 10:58:31 +01:00
Bin Gao	4635fdc696	x86/tsc: Mark Intel ATOM_GOLDMONT TSC reliable On Intel GOLDMONT Atom SoC TSC is the only available clocksource, so there is no way to do software calibration or have a watchdog clocksource for it. Software calibration is already disabled via the TSC_KNOWN_FREQ flag, but the watchdog requirement still persists, so such systems cannot switch to high resolution/nohz mode. Mark it reliable, so it becomes usable. Hardware teams confirmed that this is safe on that SoC. Signed-off-by: Bin Gao <bin.gao@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1479241644-234277-4-git-send-email-bin.gao@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-18 10:58:30 +01:00
Bin Gao	4ca4df0b7e	x86/tsc: Mark TSC frequency determined by CPUID as known CPUs/SoCs with CPUID leaf 0x15 come with a known frequency and will report the frequency to software via CPUID instruction. This hardware provided frequency is the "real" frequency of TSC. Set the X86_FEATURE_TSC_KNOWN_FREQ flag for such systems to skip the software calibration process. A 24 hours test on one of the CPUID 0x15 capable platforms was conducted. PIT calibrated frequency resulted in more than 3 seconds drift whereas the CPUID determined frequency showed less than 0.5 second drift. Signed-off-by: Bin Gao <bin.gao@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1479241644-234277-3-git-send-email-bin.gao@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-18 10:58:30 +01:00
Bin Gao	47c95a46d0	x86/tsc: Add X86_FEATURE_TSC_KNOWN_FREQ flag The X86_FEATURE_TSC_RELIABLE flag in Linux kernel implies both reliable (at runtime) and trustable (at calibration). But reliable running and trustable calibration independent of each other. Add a new flag X86_FEATURE_TSC_KNOWN_FREQ, which denotes that the frequency is known (via MSR/CPUID). This flag is only meant to skip the long term calibration on systems which have a known frequency. Add X86_FEATURE_TSC_KNOWN_FREQ to the skip the delayed calibration and leave X86_FEATURE_TSC_RELIABLE in place. After converting the existing users of X86_FEATURE_TSC_RELIABLE to use either both flags or just X86_FEATURE_TSC_KNOWN_FREQ we can seperate the functionality. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Bin Gao <bin.gao@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1479241644-234277-2-git-send-email-bin.gao@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-18 10:58:30 +01:00
Josh Poimboeuf	91e08ab0c8	x86/dumpstack: Prevent KASAN false positive warnings The oops stack dump code scans the entire stack, which can cause KASAN "stack-out-of-bounds" false positive warnings. Tell KASAN to ignore it. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: davej@codemonkey.org.uk Cc: dvyukov@google.com Link: http://lkml.kernel.org/r/5f6e80c4b0c7f7f0b6211900847a247cdaad753c.1479398226.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-18 09:38:00 +01:00
Josh Poimboeuf	c2d75e03d6	x86/unwind: Prevent KASAN false positive warnings in guess unwinder The guess unwinder scans the entire stack, which can cause KASAN "stack-out-of-bounds" false positive warnings. Tell KASAN to ignore it. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: davej@codemonkey.org.uk Cc: dvyukov@google.com Link: http://lkml.kernel.org/r/61939c0b2b2d63ce97ba59cba3b00fd47c2962cf.1479398226.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-18 09:38:00 +01:00
Ingo Molnar	89a01c51cb	Merge branch 'x86/cpufeature' into x86/asm, to pick up dependency Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-17 08:30:54 +01:00
Josh Poimboeuf	f4474c9f0b	x86/dumpstack: Handle NULL stack pointer in show_trace_log_lvl() When show_trace_log_lvl() is called from show_regs(), it completely fails to dump the stack. This bug was introduced when show_stack_log_lvl() was removed with the following commit: `0ee1dd9f5e` ("x86/dumpstack: Remove raw stack dump") Previous callers of that function now call show_trace_log_lvl() directly. That resulted in a subtle change, in that the 'stack' argument can now be NULL in certain cases. A NULL 'stack' pointer means that the stack dump should start from the topmost stack frame unless 'regs' is valid, in which case it should start from 'regs->sp'. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `0ee1dd9f5e` ("x86/dumpstack: Remove raw stack dump") Link: http://lkml.kernel.org/r/c551842302a9c222d96a14e42e4003f059509f69.1479362652.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-17 07:48:39 +01:00
Arnd Bergmann	553bbc11aa	x86/boot: Avoid warning for zero-filling .bss The latest binutils are warning about a .fill directive with an explicit value in a .bss section: arch/x86/kernel/head_32.S: Assembler messages: arch/x86/kernel/head_32.S:677: Warning: ignoring fill value in section `.bss..page_aligned' arch/x86/kernel/head_32.S:679: Warning: ignoring fill value in section `.bss..page_aligned' This comes from the 'ENTRY()' macro padding the space between the symbols with 'nop' via: .align 4,0x90 Open-coding the .globl directive without the padding avoids that warning, as all the symbols are already page aligned. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161116141726.2013389-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-17 07:34:58 +01:00
Gayatri Kammela	a8d9df5a50	x86/cpufeatures: Enable new AVX512 cpu features Add a few new AVX512 instruction groups/features for enumeration in /proc/cpuinfo: AVX512IFMA and AVX512VBMI. Clear the flags in fpu_xstate_clear_all_cpu_caps(). CPUID.(EAX=7,ECX=0):EBX[bit 21] AVX512IFMA CPUID.(EAX=7,ECX=0):ECX[bit 1] AVX512VBMI Detailed information of cpuid bits for the features can be found at https://bugzilla.kernel.org/show_bug.cgi?id=187891 Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: mingo@elte.hu Link: http://lkml.kernel.org/r/1479327060-18668-1-git-send-email-gayatri.kammela@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-17 01:09:40 +01:00
Yazen Ghannam	ddfe43cdc0	x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h Some devices on Fam17h can only be accessed through the System Management Network (SMN). The SMN is accessed by a pair of index/data registers in PCI config space. Add a pair of functions to read from and write to the SMN. The Data Fabric on Fam17h allows multiple devices to use the same register space. The registers of a specific device are accessed indirectly using the device's DF InstanceId. Currently, we only need to read from these devices, so only define a read function for now. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1478812257-5424-5-git-send-email-Yazen.Ghannam@amd.com [ Boris: make __amd_smn_rw() even more compact. ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 20:46:38 +01:00
Yazen Ghannam	b791c6b6a5	x86/amd_nb: Add Fam17h Data Fabric as "Northbridge" AMD Fam17h uses a Data Fabric component instead of a traditional Northbridge. However, the DF is similar to a NB in that there is one per die and it uses PCI config D18Fx registers. So let's reuse the existing AMD_NB infrastructure for Data Fabrics. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1478812257-5424-4-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 20:46:38 +01:00
Yazen Ghannam	de6bd0835a	x86/amd_nb: Make all exports EXPORT_SYMBOL_GPL Make all EXPORT_SYMBOL's into EXPORT_SYMBOL_GPL. While we're at it let's fix some checkpatch warnings. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1478812257-5424-3-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 20:46:38 +01:00
Yazen Ghannam	c7993890e7	x86/amd_nb: Make amd_northbridges internal to amd_nb.c Hide amd_northbridges in amd_nb.c so that external callers will have to use the exported accessor functions. Also, fix some checkpatch.pl warnings. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1478812257-5424-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 20:46:37 +01:00
Thomas Gleixner	7ce7f35b33	Merge branch 'x86/cpufeature' into x86/cache Resolve the cpu/scattered conflict. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 14:19:34 +01:00
He Chen	47bdf3378d	x86/cpuid: Provide get_scattered_cpuid_leaf() Sparse populated CPUID leafs are collected in a software provided leaf to avoid bloat of the x86_capability array, but there is no way to rebuild the real leafs (e.g. for KVM CPUID enumeration) other than rereading the CPUID leaf from the CPU. While this is possible it is problematic as it does not take software disabled features into account. If a feature is disabled on the host it should not be exposed to a guest either. Add get_scattered_cpuid_leaf() which rebuilds the leaf from the scattered cpuid table information and the active CPU features. [ tglx: Rewrote changelog ] Signed-off-by: He Chen <he.chen@linux.intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Luwei Kang <luwei.kang@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Piotr Luc <Piotr.Luc@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/1478856336-9388-3-git-send-email-he.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 11:13:09 +01:00
He Chen	47f10a3600	x86/cpuid: Cleanup cpuid_regs definitions cpuid_regs is defined multiple times as structure and enum. Rename the enum and move all of it to processor.h so we don't end up with more instances. Rename the misnomed register enumeration from CR_* to the obvious CPUID_*. [ tglx: Rewrote changelog ] Signed-off-by: He Chen <he.chen@linux.intel.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Luwei Kang <luwei.kang@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Piotr Luc <Piotr.Luc@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/1478856336-9388-2-git-send-email-he.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 11:13:09 +01:00
Ingo Molnar	0acbc7aa47	Merge branch 'linus' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-16 10:16:28 +01:00
Yazen Ghannam	18807ddb7f	x86/mce/AMD: Reset Threshold Limit after logging error The error count field in MCA_MISC does not get reset by hardware when the threshold has been reached. Software is expected to reset it. Currently, the threshold limit only gets reset during init or when a user writes to sysfs. If the user is not monitoring threshold interrupts and resetting the limit then the user will only see 1 interrupt when the limit is first hit. So if, for example, the limit is set to 10 then only 1 interrupt will be recorded after 10 errors even if 100 errors have occurred. The user may then assume that only 10 errors have occurred. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479244433-69267-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 09:57:11 +01:00
David Herrmann	f96acec8c8	x86/sysfb: Fix lfb_size calculation The screen_info.lfb_size field is shifted by 16 bits only in case of VBE. This has historical reasons since VBE advertised it similarly. However, in case of EFI framebuffers, the size is no longer shifted. Fix the x86 simple-framebuffer setup code to use the correct size in the non-VBE case. While at it, avoid variable abbreviations and rename 'len' to 'length', and use the correct types matching the screen_info definition. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Gundersen <teg@jklm.no> Link: http://lkml.kernel.org/r/20161115120158.15388-3-dh.herrmann@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-16 09:38:23 +01:00
David Herrmann	9164b4ceb7	x86/sysfb: Add support for 64bit EFI lfb_base The screen_info object was extended to support 64-bit lfb_base addresses in: `ae2ee627dc` ("efifb: Add support for 64-bit frame buffer addresses") However, the x86 simple-framebuffer setup code never made use of it. Fix it to properly assemble and verify the lfb_base before advertising simple-framebuffer devices. In particular, this means if VIDEO_CAPABILITY_64BIT_BASE is set, the screen_info->ext_lfb_base field will contain the upper 32bit of the actual lfb_base. Make sure the address is not 0 (i.e., unset), as well as does not overflow the physical address type. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Gundersen <teg@jklm.no> Link: http://lkml.kernel.org/r/20161115120158.15388-2-dh.herrmann@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-16 09:38:22 +01:00
Sebastian Andrzej Siewior	0e285d36bd	x86/mcheck: Move CPU_DEAD to hotplug state machine This moves the last piece of the old hotplug notifier code in MCE to the new hotplug state machine. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-8-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 09:34:18 +01:00
Sebastian Andrzej Siewior	8c0eeac819	x86/mcheck: Move CPU_ONLINE and CPU_DOWN_PREPARE to hotplug state machine The CPU_ONLINE and CPU_DOWN_PREPARE look fully symmetrical and could be move to the hotplug state machine. On a failure during registration we have the tear down callback invoked (mce_cpu_pre_down()) so there should be no timer around and so no need to need keep notifier installed (this was the reason according to the comment why the notifier was registered despite of errors). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-7-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 09:34:18 +01:00
Sebastian Andrzej Siewior	39f152ffbf	x86/mcheck: Reorganize the hotplug callbacks Initially I wanted to remove mcheck_cpu_init() from identify_cpu() and let it become an independent early hotplug callback. The main problem here was that the init on the boot CPU may happen too late (device_initcall_sync(mcheck_init_device)) and nobody wanted to risk receiving and MCE event at boot time leading to a shutdown (if the MCE feature is not yet enabled). Here is attempt two: the timming stays as-is but the ordering of the functions is changed: - mcheck_cpu_init() (which is run from identify_cpu()) will setup the timer struct but won't fire the timer. This is moved to CPU_ONLINE since its cleanup part is in CPU_DOWN_PREPARE. So if it is okay to stop the timer early in the shutdown phase, it should be okay to start it late in the bring up phase. - CPU_DOWN_PREPARE disables the MCE feature flags for !INTEL CPUs in mce_disable_cpu(). If a failure occures it would be re-enabled on all vendor CPUs (including Intel where it was not disabled during shutdown). To keep this working I am moving it to CPU_ONLINE. smp_call_function_single() is dropped beause the notifier runs nowdays on the target CPU. - CPU_ONLINE is invoking mce_device_create() + mce_threshold_create_device() but its cleanup part is in CPU_DEAD (mce_threshold_remove_device() and mce_device_remove()). In order to keep this symmetrical I am moving the clean up from CPU_DEAD to CPU_DOWN_PREPARE. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-6-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 09:34:18 +01:00
Sebastian Andrzej Siewior	4d7b02d58c	x86/mcheck: Split threshold_cpu_callback into two callbacks The threshold_cpu_callback callbacks looks like one of the notifier and its arguments are almost the same. Split this out and have one ONLINE and one DEAD callback. This will come handy later once the main code gets changed to use the callback mechanism. Also, handle threshold_cpu_callback_online() return value so we don't continue if the function fails. Boris Petkov removed the callback pointer and replaced it with proper functions. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-5-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 09:34:17 +01:00
Sebastian Andrzej Siewior	7f34b935e8	x86/mcheck: Be prepared for a rollback back to the ONLINE state If we try a CPU down and fail in the middle then we roll back to the online state. This means we would perform CPU_ONLINE / mce_device_create() without invoking CPU_DEAD / mce_device_remove() for the cleanup of what was allocated in CPU_ONLINE. Be prepared for this and don't allocate the struct if we have it already. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-4-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 09:34:17 +01:00
Sebastian Andrzej Siewior	ec553abb31	x86/mcheck: Explicit cleanup on failure in mce_amd If the ONLINE callback fails, the driver does not any clean up right away instead it waits to get to the DEAD stage to do it. Yes, it waits. Since we don't pass the error code back to the caller, no one knows. Do the clean up right away so it does not look like a leak. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-3-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 09:34:17 +01:00
Sebastian Andrzej Siewior	0943637293	x86/mcheck: Move threshold_create_device() Move the threshold_create_device() so it can use threshold_remove_device() without a forward declaration. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-2-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-16 09:34:16 +01:00
Fenghua Yu	f410770293	x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changee If CPUs are moved to or removed from a rdtgroup, the percpu closid storage is updated. If tasks running on an affected CPU use the percpu closid then the PQR_ASSOC MSR is only updated when the task runs through a context switch. Up to the context switch the CPUs operate on the wrong closid. This state is potentially unbound. Make the change immediately effective by invoking a smp function call on the affected CPUs which stores the new closid in the perpu storage and calls the rdt_sched_in() function which updates the MSR, if the current task uses the percpu closid. [ tglx: Made it work and massaged changelog once more ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1478912558-55514-3-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-15 18:35:50 +01:00
Fenghua Yu	c7cc0cc10c	x86/intel_rdt: Reset per cpu closids on unmount All CPUs in a rdtgroup are given back to the default rdtgroup before the rdtgroup is removed during umount. After umount, the default rdtgroup contains all online CPUs, but the per cpu closids are not cleared. As a result the stale closid value will be used immediately after the next mount. Move all cpus to the default group and update the percpu closid storage. [ tglx: Massaged changelong ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1478912558-55514-2-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-15 18:35:50 +01:00
Thomas Gleixner	a2584e1d5a	x86/intel_rdt: Prevent deadlock against hotplug lock The cpu online/offline callbacks of intel_rdt lock rdtgroup_mutex nested inside of cpu hotplug lock. rdtgroup_cpus_write() does it in reverse order. Remove the get/put_online_cpus() calls from rdtgroup_cpus_write(). This is safe against cpu hotplug as the resource group cpumasks are protected by rdtgroup_mutex. Found by review, but should have been found if authors would have bothered to test cpu hotplug with lockdep enabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Shaohua Li <shli@fb.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com>	2016-11-15 18:35:49 +01:00
Fenghua Yu	f57b308728	x86/intel_rdt: Protect info directory from removal The info directory and the per-resource subdirectories of the info directory have no reference to a struct rdtgroup in kn->priv. An attempt to remove one of those directories results in a NULL pointer dereference. Protect the directories from removal and return -EPERM instead of -ENOENT. [ tglx: Massaged changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1478912558-55514-1-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-15 18:35:49 +01:00
Stanislaw Gruszka	353c50ebe3	sched/cputime: Simplify task_cputime() Now since fetch_task_cputime() has no other users than task_cputime(), its code could be used directly in task_cputime(). Moreover since only 2 task_cputime() calls of 17 use a NULL argument, we can add dummy variables to those calls and remove NULL checks from task_cputimes(). Also remove NULL checks from task_cputimes_scaled(). Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1479175612-14718-5-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-15 09:51:05 +01:00
Paul Gortmaker	523d0fb4f0	x86/percpu: Remove unnecessary include of module.h, add asm/desc.h This was originally a part of commit `186f43608a`: ("x86/kernel: Audit and remove any unnecessary uses of module.h") ...but without the asm/desc.h addition. As such, Ingo reported a build failure on i386 allnoconfig with SMP=y during his pre-merge testing. For expediency the chunk was just dropped at that time. The failure was as follows: arch/x86/kernel/setup_percpu.c: In function ‘setup_percpu_segment’: arch/x86/kernel/setup_percpu.c:159:2: error: implicit declaration of function ‘pack_descriptor’ [-Werror=implicit-function-declaration] arch/x86/kernel/setup_percpu.c:162:2: error: implicit declaration of function ‘write_gdt_entry’ [-Werror=implicit-function-declaration] arch/x86/kernel/setup_percpu.c:162:18: error: implicit declaration of function ‘get_cpu_gdt_table’ [-Werror=implicit-function-declaration] As pack_descriptor(), write_gdt_entry() and get_cpu_gdt_table() all live in the file arch/x86/include/asm/desc.h -- calling that header out explicitly should fix things. Reported-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161114190443.10873-1-paul.gortmaker@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-15 07:26:37 +01:00
Linus Torvalds	8528d66248	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - fix an Intel/MID boot crash/hang bug - fix a cache topology mis-parsing bug on certain AMD CPUs - fix a virtualization firmware bug by adding a check+quirk workaround on the kernel side" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Deal with broken firmware (VMWare/XEN) x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems x86/platform/intel-mid: Retrofit pci_platform_pm_ops ->get_state hook	2016-11-14 08:39:56 -08:00
Arnd Bergmann	3a6d867612	x86: apm: avoid uninitialized data apm_bios_call() can fail, and return a status in its argument structure. If that status however is zero during a call from apm_get_power_status(), we end up using data that may have never been set, as reported by "gcc -Wmaybe-uninitialized": arch/x86/kernel/apm_32.c: In function ‘apm’: arch/x86/kernel/apm_32.c:1729:17: error: ‘bx’ may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/x86/kernel/apm_32.c:1835:5: error: ‘cx’ may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/x86/kernel/apm_32.c:1730:17: note: ‘cx’ was declared here arch/x86/kernel/apm_32.c:1842:27: error: ‘dx’ may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/x86/kernel/apm_32.c:1731:17: note: ‘dx’ was declared here This changes the function to return "APM_NO_ERROR" here, which makes the code more robust to broken BIOS versions, and avoids the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-11-11 08:45:08 -08:00
Borislav Petkov	54467353a9	x86/MCE: Correct TSC timestamping of error records We did have logic in the MCE code which would TSC-timestamp an error record only when it is exact - i.e., when it wasn't detected by polling. This isn't the case anymore. So let's fix that: We have a valid TSC timestamp in the error record only when it has been a precise detection, i.e., either in the #MC handler or in one of the interrupt handlers (thresholding, deferred, ...). All other error records still have mce.time which contains the wall time in order to be able to place the error record in time at least approximately. Also, this fixes another bug where machine_check_poll() would clear mce.tsc unconditionally even if we requested precise MCP_TIMESTAMP logging. The proper fix would be to generate timestamp only when it has been requested and not always. But that would require a more thorough code audit of all mce_gather_info/mce_setup() users. Add a FIXME for now. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony <tony.luck@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: kernel test robot <xiaolong.ye@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lkp@01.org Link: http://lkml.kernel.org/r/20161110131053.kybsijfs5venpjnf@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-11 08:08:24 +01:00
Sudeep Holla	fac5148257	drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled With CONFIG_OF enabled on x86, we get the following error on boot: " Failed to find cpu0 device node Unable to detect cache hierarchy from DT for CPU 0 " and the cacheinfo fails to get populated in the corresponding sysfs entries. This is because cache_setup_of_node looks for of_node for setting up the shared cpu_map without checking that it's already populated in the architecture specific callback. In order to indicate that the shared cpu_map is already populated, this patch introduces a boolean `cpu_map_populated` in struct cpu_cacheinfo that can be used by the generic code to skip cache_shared_cpu_map_setup. This patch also sets that boolean for x86. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-11-10 17:30:53 +01:00
Wanpeng Li	8ca225520e	x86/apic: Prevent tracing on apic_msr_write_eoi() The following RCU lockdep warning led to adding irq_enter()/irq_exit() into smp_reschedule_interrupt(): RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! no locks held by swapper/1/0. do_trace_write_msr native_write_msr native_apic_msr_eoi_write smp_reschedule_interrupt reschedule_interrupt As Peterz pointed out: \| So now we're making a very frequent interrupt slower because of debug \| code. \| \| The thing is, many many smp_reschedule_interrupt() invocations don't \| actually execute anything much at all and are only sent to tickle the \| return to user path (which does the actual preemption). \| \| Having to do the whole irq_enter/irq_exit dance just for this unlikely \| debug case totally blows. Use the wrmsr_notrace() variant in native_apic_msr_write_eoi, annotate the kvm variant with notrace and add a native_apic_eoi callback to the apic structure so KVM guests are covered as well. This allows to revert the irq_enter/irq_exit dance in smp_reschedule_interrupt(). Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org Cc: Mike Galbraith <efault@gmx.de> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1478488420-5982-3-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-09 22:03:14 +01:00
Thomas Gleixner	d49597fd3b	x86/cpu: Deal with broken firmware (VMWare/XEN) Both ACPI and MP specifications require that the APIC id in the respective tables must be the same as the APIC id in CPUID. The kernel retrieves the physical package id from the APIC id during the ACPI/MP table scan and builds the physical to logical package map. The physical package id which is used after a CPU comes up is retrieved from CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect. There exist VMware and XEN implementations which violate the spec. As a result the physical to logical package map, which relies on the ACPI/MP tables does not work on those systems, because the CPUID initialized physical package id does not match the firmware id. This causes system crashes and malfunction due to invalid package mappings. The only way to cure this is to sanitize the physical package id after the CPUID enumeration and yell when the APIC ids are different. Fix up the initial APIC id, which is fine as it is only used printout purposes. If the physical package IDs differ yell and use the package information from the ACPI/MP tables so the existing logical package map just works. Chas provided the resulting dmesg output for his affected 4 virtual sockets, 1 core per socket VM: [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2 [Firmware Bug]: CPU1: Using firmware package id 1 instead of 2 .... Reported-and-tested-by: "Charles (Chas) Williams" <ciwillia@brocade.com>, Reported-by: M. Vefa Bicakci <m.v.b@runbox.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Alok Kataria <akataria@vmware.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: #4.6+ <stable@vger,kernel.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611091613540.3501@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-09 21:05:01 +01:00
Yazen Ghannam	b6a50cddbc	x86/cpu/AMD: Clean up cpu_llc_id assignment per topology feature These changes do not affect current hw - just a cleanup: Currently, we assume that a system has a single Last Level Cache (LLC) per node, and that the cpu_llc_id is thus equal to the node_id. This no longer applies since Fam17h can have multiple last level caches within a node. So group the cpu_llc_id assignment by topology feature and family in order to make the computation of cpu_llc_id on the different families more clear. Here is how the LLC ID is being computed on the different families: The NODEID_MSR feature only applies to Fam10h in which case the LLC is at the node level. The TOPOEXT feature is used on families 15h, 16h and 17h. So far we only see multiple last level caches if L3 caches are available. Otherwise, the cpu_llc_id will default to be the phys_proc_id. We have L3 caches only on families 15h and 17h: - on Fam15h, the LLC is at the node level. - on Fam17h, the LLC is at the core complex level and can be found by right shifting the APIC ID. Also, keep the family checks explicit so that new families will fall back to the default, which will be node_id for TOPOEXT systems. Single node systems in families 10h and 15h will have a Node ID of 0 which will be the same as the phys_proc_id, so we don't need to check for multiple nodes before using the node_id. Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> [ Rewrote the commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161108153054.bs3sajbyevq6a6uu@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-09 17:07:43 +01:00
Ingo Molnar	ca4b2df651	Merge branch 'x86/urgent' into x86/cpu, to pick up dependent fix Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-09 17:07:11 +01:00
Yazen Ghannam	b0b6e86846	x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems cpu_llc_id (Last Level Cache ID) derivation on AMD Fam17h has an underflow bug when extracting the socket_id value. It starts from 0 so subtracting 1 from it will result in an invalid value. This breaks scheduling topology later on since the cpu_llc_id will be incorrect. For example, the the cpu_llc_id of the other CPU in the loops in set_cpu_sibling_map() underflows and we're generating the funniest thread_siblings masks and then when I run 8 threads of nbench, they get spread around the LLC domains in a very strange pattern which doesn't give you the normal scheduling spread one would expect for performance. Other things like EDAC use cpu_llc_id so they will be b0rked too. So, the APIC ID is preset in APICx020 for bits 3 and above: they contain the core complex, node and socket IDs. The LLC is at the core complex level so we can find a unique cpu_llc_id by right shifting the APICID by 3 because then the least significant bit will be the Core Complex ID. Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> [ Cleaned up and extended the commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> # v4.4.. Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: `3849e91f57` ("x86/AMD: Fix last level cache topology for AMD Fam17h systems") Link: http://lkml.kernel.org/r/20161108083506.rvqb5h4chrcptj7d@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-09 17:06:08 +01:00
Borislav Petkov	c09a8c40e0	x86/RAS: Hide SMCA bank names Add accessor functions and hide the smca_names array. Also, add a sanity-check to bank HWID assignment in get_smca_bank_info(). Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20161104152317.5r276t35df53qk76@pd.tnic Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-08 17:10:15 +01:00
Borislav Petkov	a9a1c0ee04	x86/RAS: Rename smca_bank_names to smca_names Make it differ more from struct smca_bank_name for better readability. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20161103125556.15482-3-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-08 17:10:14 +01:00
Borislav Petkov	1ce9cd7f9f	x86/RAS: Simplify SMCA HWID descriptor struct Call it simply smca_hwid and call local variables "hwid". More readable. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20161103125556.15482-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-08 17:10:14 +01:00
Borislav Petkov	79349f529a	x86/RAS: Simplify SMCA bank descriptor struct Call the struct simply smca_bank, it's instance ID can be simply ->id. Makes the code much more readable. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20161103125556.15482-1-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-08 17:10:14 +01:00
Borislav Petkov	cd9c57cad3	x86/MCE: Dump MCE to dmesg if no consumers When there are no error record consumers registered with the kernel, the only thing that appears in dmesg is something like: [ 300.000326] mce: [Hardware Error]: Machine check events logged and the error records are gone. Which is seriously counterproductive. So let's dump them to dmesg instead, in such a case. Requested-by: Eric Morton <Eric.Morton@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20161101120911.13163-4-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-08 17:10:13 +01:00
Yinghai Lu	f5e886ef9b	x86/MCE: Do not look at panic_on_oops in the severity grading The MCE tolerance levels control whether we panic on a machine check or do something else like generating a signal and logging error information. This is controlled by the mce=<level> command line parameter. However, if panic_on_oops is set, it will force a panic for such an MCE even though the user didn't want to. So don't check panic_on_oops in the severity grading anymore. One of the use cases for that is recovery from uncorrectable errors with mce=2. [ Boris: rewrite commit message. ] Signed-off-by: Yinghai Lu <yinghai.lu@oracle.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20160916202325.4972-1-yinghai@kernel.org Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-08 17:10:12 +01:00
Shaohua Li	53a114a690	x86/intel_rdt: Export the minimum number of set mask bits in sysfs The minimum number of bits set for a cache mask is checked by the kernel when writing a mask, but there is no way for the user to retrieve this information. Add a new file 'min_cbm_bits' to the info directory and export the information to user space. [ tglx: Massaged changelog ] Signed-off-by: Shaohua Li <shli@fb.com> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/e69b1ffa206d0353eea58101e1bf9b677d9732f7.1478207143.git.shli@fb.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-07 12:20:52 +01:00
Shaohua Li	7bff0af510	x86/intel_rdt: Propagate error in rdt_mount() properly gcc complains: "warning: ‘dentry’ may be used uninitialized in this function" The error exit path in rdt_mount(), which deals with a failure in rdtgroup_create_info_dir(), does not set the error code in dentry and returns the uninitialized dentry value. Add the missing error propagation. [tglx: Massaged changelog ] Fixes: `4e978d06de` ("x86/intel_rdt: Add "info" files to resctrl file system") Signed-off-by: Shaohua Li <shli@fb.com> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/a56a556f6768dc12cadbf97f49e000189056f90e.1478207143.git.shli@fb.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-07 12:20:52 +01:00
Andy Lutomirski	af25ed59b5	x86/fpu: Remove clts() The kernel doesn't use clts() any more. Remove it and all of its paravirt infrastructure. A careful reader may notice that xen_clts() appears to have been buggy -- it didn't update xen_cr0_value. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm list <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/3d3c8ca62f17579b9849a013d71e59a4d5d1b079.1477951965.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-01 07:47:55 +01:00
Andy Lutomirski	bef8b6da95	x86/fpu: Handle #NM without FPU emulation as an error Don't use CR0.TS. Make it an error rather than making nonsensical changes to the FPU state. (The cond_local_irq_enable() appears to have been pointless, too.) Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm list <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/f1ee6bf73ed1025fccaab321ba43d0594245f927.1477951965.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-01 07:47:54 +01:00
Andy Lutomirski	5a83d60c07	x86/fpu: Remove irq_ts_save() and irq_ts_restore() Now that lazy FPU is gone, we don't use CR0.TS (except possibly in KVM guest mode). Remove irq_ts_save(), irq_ts_restore(), and all of their callers. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm list <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/70b9b9e7ba70659bedcb08aba63d0f9214f338f2.1477951965.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-01 07:47:54 +01:00
Andy Lutomirski	fc560a80ba	x86/fpu: Stop saving and restoring CR0.TS in fpu__init_check_bugs() fpu__init_check_bugs() runs long after the early FPU init, so CR0.TS will be clear by the time it runs. The save-and-restore dance would have been unnecessary anyway, though, as kernel_fpu_begin() would have been good enough. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm list <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/76d1f1eacb5caead98197d1eb50ac6110ab20c6a.1477951965.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-01 07:47:53 +01:00
Andy Lutomirski	36fd4f0249	x86/fpu: Get rid of two redundant clts() calls CR0.TS is cleared by a direct CR0 write in fpu__init_cpu_generic(). We don't need to call clts() two more times right after that. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm list <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/476d2d5066eda24838853426ea74c94140b50c85.1477951965.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-01 07:47:53 +01:00
Ingo Molnar	c29c716662	Merge branch 'core/urgent' into x86/fpu, to merge fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-01 07:47:40 +01:00
Ingo Molnar	05b93c19d5	Merge branch 'linus' into x86/asm, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-01 07:41:06 +01:00
Fenghua Yu	4f341a5e48	x86/intel_rdt: Add scheduler hook Hook the x86 scheduler code to update closid based on whether the current task is assigned to a specific closid or running on a CPU assigned to a specific closid. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-10-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-30 19:10:16 -06:00
Tony Luck	60ec2440c6	x86/intel_rdt: Add schemata file Last of the per resource group files. Also mode 0644. This one shows the resources available to the group. Syntax depends on whether the "cdp" mount option was given. With code/data prioritization disabled it is simply a list of masks for each cache domain. Initial value allows access to all of the L3 cache on all domains. E.g. on a 2 socket Broadwell: L3:0=fffff;1=fffff With CDP enabled, separate masks for data and instructions are provided: L3DATA:0=fffff;1=fffff L3CODE:0=fffff;1=fffff Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-9-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-30 19:10:16 -06:00
Fenghua Yu	e02737d5b8	x86/intel_rdt: Add tasks files The root directory all subdirectories are automatically populated with a read/write (mode 0644) file named "tasks". When read it will show all the task IDs assigned to the resource group. Tasks can be added (one at a time) to a group by writing the task ID to the file. E.g. Membership in a resource group is indicated by a new field in the task_struct "int closid" which holds the CLOSID for each task. The default resource group uses CLOSID=0 which means that all existing tasks when the resctrl file system is mounted belong to the default group. If a group is removed, tasks which are members of that group are moved to the default group. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-8-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-30 19:10:15 -06:00
Tony Luck	12e0110c11	x86/intel_rdt: Add cpus file Now we populate each directory with a read/write (mode 0644) file named "cpus". This is used to over-ride the resources available to processes in the default resource group when running on specific CPUs. Each "cpus" file reads as a cpumask showing which CPUs belong to this resource group. Initially all online CPUs are assigned to the default group. They can be added to other groups by writing a cpumask to the "cpus" file in the directory for the resource group (which will remove them from the previous group to which they were assigned). CPU online/offline operations will delete CPUs that go offline from whatever group they are in and add new CPUs to the default group. If there are CPUs assigned to a group when the directory is removed, they are returned to the default group. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-7-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-30 19:10:15 -06:00
Fenghua Yu	60cf5e101f	x86/intel_rdt: Add mkdir to resctrl file system Resource control groups are represented as directories in the resctrl file system. The root directory describes the default resources available to tasks that have not been assigned specific resources. Other directories can be created at the root level to make new resource groups. It is not permitted to make directories within other directories. Hardware uses a CLOSID (Class of service ID) to determine which resource limits are currently in effect. The exact number available is enumerated by CPUID leaf 0x10, but on current implementations it is a small number. We implement a simple bitmask allocator for CLOSIDs. Each resource control group uses one CLOSID, which limits the total number of directories that can be created. Resource groups can be removed using rmdir. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-6-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-30 19:10:14 -06:00
Fenghua Yu	4e978d06de	x86/intel_rdt: Add "info" files to resctrl file system For the convenience of applications we make the decoded values of some of the CPUID values available in read-only (0444) files. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-5-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-30 19:10:14 -06:00
Fenghua Yu	5ff193fbde	x86/intel_rdt: Add basic resctrl filesystem support Use kernfs as basis for our user interface filesystem. This patch supports mount/umount, and one mount parameter "cdp" to enable code/data prioritization (though all we do at this point is ensure that the system can support CDP). The file system is not populated yet in this patch. [ tglx: Fixed up a few nits and added cdp handling in case of error ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-4-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-30 19:10:14 -06:00
Tony Luck	2264d9c74d	x86/intel_rdt: Build structures for each resource based on cache topology We use the cpu hotplug notifier to catch each cpu in turn and look at its cache topology w.r.t each of the resource groups. As we discover new resources, we initialize the bitmask array for each to the default (full access) value. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477692289-37412-3-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-30 19:10:13 -06:00
Alexey Makhalov	80e9a4f21f	x86/vmware: Add paravirt sched clock The default sched_clock() implementation is native_sched_clock(). It contains code to handle non constant frequency TSCs, which creates overhead for systems with constant frequency TSCs. The vmware hypervisor guarantees a constant frequency TSC, so native_sched_clock() is not required and slower than a dedicated function which operates with one time calculated conversion factors. Calculate the conversion factors at boot time from the tsc frequency and install an optimized sched_clock() function via paravirt ops. The paravirtualized clock can be disabled on the kernel command line with the new 'no-vmw-sched-clock' option. Signed-off-by: Alexey Makhalov <amakhalov@vmware.com> Acked-by: Alok N Kataria <akataria@vmware.com> Cc: linux-doc@vger.kernel.org Cc: pv-drivers@vmware.com Cc: corbet@lwn.net Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20161028075432.90579-4-amakhalov@vmware.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-30 08:57:08 +01:00
Alexey Makhalov	91d1e54ebd	x86/vmware: Add basic paravirt ops support Add basic paravirt support: 1. Set pv_info.name to "VMware hypervisor" to have proper boot log message Booting paravirtualized kernel on VMware hypervisor instead of "... on bare hardware" 2. Set pv_cpu_ops.io_delay() to empty function - paravirt_noop() to avoid vm-exits on IO delays because io delays they are not required. Signed-off-by: Alexey Makhalov <amakhalov@vmware.com> Acked-by: Alok N Kataria <akataria@vmware.com> Cc: linux-doc@vger.kernel.org Cc: pv-drivers@vmware.com Cc: corbet@lwn.net Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20161028075432.90579-3-amakhalov@vmware.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-30 08:57:07 +01:00
Alexey Makhalov	687bca8d66	x86/vmware: Use tsc_khz value for calibrate_cpu() Commit `aa297292d7` ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID") separated the calibration mechanisms for cpu_khz and tsc_khz. Since the vmware hypervisor provides a constant frequency TSC to the guest, this change can lead to divergence between the tsc and the cpu frequency after vmotion, which might confuse the user. Solve this by overriding the x86 platform cpu calibration callback with the vmware specific tsc calibration function. Signed-off-by: Alexey Makhalov <amakhalov@vmware.com> Acked-by: Alok N Kataria <akataria@vmware.com> Cc: linux-doc@vger.kernel.org Cc: pv-drivers@vmware.com Cc: corbet@lwn.net Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20161028075432.90579-2-amakhalov@vmware.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-30 08:57:07 +01:00
Thomas Gleixner	1e90a13d0c	x86/smpboot: Init apic mapping before usage The recent changes, which forced the registration of the boot cpu on UP systems, which do not have ACPI tables, have been fixed for systems w/o local APIC, but left a wreckage for systems which have neither ACPI nor mptables, but the CPU has an APIC, e.g. virtualbox. The boot process crashes in prefill_possible_map() as it wants to register the boot cpu, which needs to access the local apic, but the local APIC is not yet mapped. There is no reason why init_apic_mapping() can't be invoked before prefill_possible_map(). So instead of playing another silly early mapping game, as the ACPI/mptables code does, we just move init_apic_mapping() before the call to prefill_possible_map(). In hindsight, I should have noticed that combination earlier. Sorry for the churn (also in stable)! Fixes: `ff8560512b` ("x86/boot/smp: Don't try to poke disabled/non-existent APIC") Reported-and-debugged-by: Michal Necasek <michal.necasek@oracle.com> Reported-and-tested-by: Wolfgang Bauer <wbauer@tmo.at> Cc: prarit@redhat.com Cc: ville.syrjala@linux.intel.com Cc: michael.thayer@oracle.com Cc: knut.osmundsen@oracle.com Cc: frank.mehnert@oracle.com Cc: Borislav Petkov <bp@alien8.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610282114380.5053@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-29 14:00:46 +02:00
Linus Torvalds	c067affcd3	ACPI fixes for v4.9-rc3 Specifics: - Fix three ACPICA issues related to the interpreter locking and introduced by recent changes in that area (Lv Zheng). - Fix a PCI IRQ management regression introduced during the 4.7 cycle and related to the configuration of shared IRQs on systems with an ISA bus (Sinan Kaya). - Fix up a return value of one function in the APEI code (Punit Agrawal). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJYE+wwAAoJEILEb/54YlRxe0wQAKIyO3ktsxxbz2iACxFPZGmn ML1+OTBIGQKYDSGCINhMV5PGd98IMBaVCB9RllG/B9iALb8VCGiJ6AuJKoR7q2pZ 6mr7ioXNfTNlLISykt63cD/Lp/YobZMG6WhoNWzoKslVUQrSWAISV+wGpBxoj08i 8X7t/QtvRIVWfy4H4reDgpQMIKUeDhk6REeb8FESiXYboOvNhbXZpPS+bv8XXEfD bu/ASQIZs3Z9YB2uTij16Tx95eJETHhr9zYIxbi848YDxjelpZNs1QuVoYxq3GO2 S7vbGHZITMLSEz4jD0w98YvDcb0jywfSXX53NBMaSJqOAleVKNH1rE8KvywG6H0s 2298yBc0o0ldBb+4nszoL7NyGQKDrmxMEGBRFlk67gZ1LA4cpk+Usv9Q0GmNx38H KQfuA144n9ICaM9Kw9CKD8xrQ+PtpoTIBXzKGsdqIwHsS+2XSwzgp4IWEuMRfZNu 5ermtO2tRz47UDnQ5UxdweB0n2pEMrZXDDzBONdqp3ds4aOvOvVqQP3OB+iMaMrT rPvVlYLr2Q+ekIzHPCpB7uBwI//bJ3L2cLqHxp9hlbNeFQWdv/NlMlmbWgYVVpn3 tCvqqpH+jtof8lnmWZtBdc4M0GhwM+CP6TYd65n2yDqVCfPraXokE0UdNBW+je4Z BRuhjVEi0vBsrc3M4IC9 =wjnL -----END PGP SIGNATURE----- Merge tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix recent ACPICA regressions, an older PCI IRQ management regression, and an incorrect return value of a function in the APEI code. Specifics: - Fix three ACPICA issues related to the interpreter locking and introduced by recent changes in that area (Lv Zheng). - Fix a PCI IRQ management regression introduced during the 4.7 cycle and related to the configuration of shared IRQs on systems with an ISA bus (Sinan Kaya). - Fix up a return value of one function in the APEI code (Punit Agrawal)" * tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPICA: Dispatcher: Fix interpreter locking around acpi_ev_initialize_region() ACPICA: Dispatcher: Fix an unbalanced lock exit path in acpi_ds_auto_serialize_method() ACPICA: Dispatcher: Fix order issue of method termination ACPI / APEI: Fix incorrect return value of ghes_proc() ACPI/PCI: pci_link: Include PIRQ_PENALTY_PCI_USING for ISA IRQs ACPI/PCI: pci_link: penalize SCI correctly ACPI/PCI/IRQ: assign ISA IRQ directly during early boot stages	2016-10-28 18:34:19 -07:00
Borislav Petkov	1c27f646b1	x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y We needed the physical address of the container in order to compute the offset within the relocated ramdisk. And we did this by doing __pa() on the virtual address. However, __pa() does checks whether the physical address is within PAGE_OFFSET and __START_KERNEL_map - see __phys_addr() - which fail if we have CONFIG_RANDOMIZE_MEMORY enabled: we feed a virtual address which doesn't have the randomization offset into a function which uses PAGE_OFFSET which does have that offset. This makes this check fire: VIRTUAL_BUG_ON((x > y) \|\| !phys_addr_valid(x)); ^^^^^^ due to the randomization offset. The fix is as simple as using __pa_nodebug() because we do that randomization offset accounting later in that function ourselves. Reported-by: Bob Peterson <rpeterso@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm <linux-mm@kvack.org> Cc: stable@vger.kernel.org # 4.9 Link: http://lkml.kernel.org/r/20161027123623.j2jri5bandimboff@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-28 10:29:59 +02:00
Josh Poimboeuf	24d86f5909	x86/unwind: Ensure stack grows down Add a sanity check to ensure the stack only grows down, and print a warning if the check fails. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161027131058.tpdffwlqipv7pcd6@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-28 08:16:45 +02:00
Josh Poimboeuf	b6959a3621	x86/unwind: Detect bad stack return address If __kernel_text_address() doesn't recognize a return address on the stack, it probably means that it's some generated code which __kernel_text_address() doesn't know about yet. Otherwise there's probably some stack corruption. Either way, warn about it. Use printk_deferred_once() because the unwinder can be called with the console lock by lockdep via save_stack_trace(). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2d897898f324e275943b590d160b55e482bba65f.1477496147.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-27 08:32:38 +02:00
Josh Poimboeuf	0d2b8579ad	x86/dumpstack: Warn on stack recursion Print a warning if stack recursion is detected. Use printk_deferred_once() because the unwinder can be called with the console lock by lockdep via save_stack_trace(). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/def18247aafaab480844484398e793f552b79bda.1477496147.git.jpoimboe@redhat.com [ Unbroke the lines. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-27 08:32:38 +02:00
Josh Poimboeuf	c32c47c68a	x86/unwind: Warn on bad frame pointer Detect situations in the unwinder where the frame pointer refers to a bad address, and print an appropriate warning. Use printk_deferred_once() because the unwinder can be called with the console lock by lockdep via save_stack_trace(). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/03c888f6f7414d54fa56b393ea25482be6899b5f.1477496147.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-27 08:32:37 +02:00
Fenghua Yu	c1c7c3f9d6	x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID Define struct rdt_resource to hold all the parameterized values for an RDT resource and fill in the CPUID enumerated values from leaf 0x10 if available. Hard code them for the MSR detected Haswells. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477142405-32078-9-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-26 23:12:39 +02:00
Fenghua Yu	113c60970c	x86/intel_rdt: Add Haswell feature discovery Some Haswell generation CPUs support RDT, but they don't enumerate this via CPUID. Use rdmsr_safe() and wrmsr_safe() to probe the MSRs on cpu model 63 (INTEL_FAM6_HASWELL_X) Move the relevant defines into a common header file which is shared between RDT/CQM and RDT/Allocation to avoid duplication. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477142405-32078-8-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-26 23:12:38 +02:00
Fenghua Yu	78e99b4a2b	x86/intel_rdt: Add CONFIG, Makefile, and basic initialization Introduce CONFIG_INTEL_RDT_A (default: no, dependent on CPU_SUP_INTEL) to control inclusion of Resource Director Technology in the build. Simple init() routine just checks which features are present. If they are pr_info() one line summary for each feature for now. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477142405-32078-7-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-26 23:12:38 +02:00
Fenghua Yu	4ab1586488	x86/cpufeature: Add RDT CPUID feature bits Check CPUID leaves for all the Resource Director Technology (RDT) Cache Allocation Technology (CAT) bits. Presence of allocation features: CPUID.(EAX=7H, ECX=0):EBX[bit 15] X86_FEATURE_RDT_A L2 and L3 caches are each separately enabled: CPUID.(EAX=10H, ECX=0):EBX[bit 1] X86_FEATURE_CAT_L3 CPUID.(EAX=10H, ECX=0):EBX[bit 2] X86_FEATURE_CAT_L2 L3 cache may support independent control of allocation for code and data (CDP = Code/Data Prioritization): CPUID.(EAX=10H, ECX=1):ECX[bit 2] X86_FEATURE_CDP_L3 [ tglx: Fixed up Borislavs comments and moved the feature bits into a gap ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: "Borislav Petkov" <bp@suse.de> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477142405-32078-5-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-26 23:12:38 +02:00
Fenghua Yu	d57e3ab7e3	x86/intel_cacheinfo: Enable cache id in cache info Cache id is retrieved from APIC ID and CPUID leaf 4 on x86. For more details please see the section on "Cache ID Extraction Parameters" in "Intel 64 Architecture Processor Topology Enumeration". Also the documentation of the CPUID instruction in the "Intel 64 and IA-32 Architectures Software Developer's Manual" Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Dave Hansen" <dave.hansen@intel.com> Cc: "Shaohua Li" <shli@fb.com> Cc: "Nilay Vaish" <nilayvaish@gmail.com> Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: "Borislav Petkov" <bp@suse.de> Cc: "H. Peter Anvin" <h.peter.anvin@intel.com> Link: http://lkml.kernel.org/r/1477142405-32078-4-git-send-email-fenghua.yu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-26 23:12:37 +02:00
Steven Rostedt	5de0a8c0c2	x86: Fix export for mcount and __fentry__ Commit `784d5699ed` ("x86: move exports to actual definitions") removed the EXPORT_SYMBOL(__fentry__) and EXPORT_SYMBOL(mcount) from x8664_ksyms_64.c, and added EXPORT_SYMBOL(function_hook) in mcount_64.S instead. The problem is that function_hook isn't a function at all, but a macro that is defined as either mcount or __fentry__ depending on the support from gcc. Originally, I thought this was a macro issue, like what __stringify() is used for. But the problem is a bit deeper. The Makefile.build has some magic that does post processing of files to create the CRC bindings. It does some searches for EXPORT_SYMBOL() and because it finds a macro name and not the actual functions, this causes function_hook not to be converted into mcount or __fentry__ and they are missed. Instead of adding more magic to Makefile.build, just add EXPORT_SYMBOL() for mcount and __fentry__ where the ifdef is used. Since this is assembly and not C, it doesn't require being set after the function is defined. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Borislav Petkov <bp@alien8.de> Cc: Gabriel C <nix.or.die@gmail.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Link: http://lkml.kernel.org/r/20161024150148.4f9d90e4@gandalf.local.home Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-26 12:38:17 +02:00
Michael Ellerman	92b2327829	kernel/smp: Make the SMP boot message common on all arches Currently after bringing up secondary CPUs all arches print "Brought up %d CPUs". On x86 they also print the number of nodes that were brought online. It would be nice to also print the number of nodes on other arches. Although we could override smp_announce() on the other ~10 NUMA aware arches, it seems simpler to just always print the number of nodes. On non-NUMA arches there is just always 1 node. Having done that, smp_announce() is no longer weak, and seems small enough to just pull directly into smp_init(). Also update the printing of "%d CPUs" to be smart when an SMP kernel is booted on a single CPU system, or when only one CPU is available, eg: smp: Brought up 2 nodes, 1 CPU Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: akpm@osdl.org Cc: jgross@suse.com Cc: ak@linux.intel.com Cc: tim.c.chen@linux.intel.com Cc: len.brown@intel.com Cc: peterz@infradead.org Cc: richard@nod.at Cc: jolsa@redhat.com Cc: boris.ostrovsky@oracle.com Cc: mgorman@techsingularity.net Link: http://lkml.kernel.org/r/1477460275-8266-2-git-send-email-mpe@ellerman.id.au Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-26 12:02:35 +02:00
Josh Poimboeuf	0ee1dd9f5e	x86/dumpstack: Remove raw stack dump For mostly historical reasons, the x86 oops dump shows the raw stack values: ... [registers] Stack: ffff880079af7350 ffff880079905400 0000000000000000 ffffc900008f3ae0 ffffffffa0196610 0000000000000001 00010000ffffffff 0000000087654321 0000000000000002 0000000000000000 0000000000000000 0000000000000000 Call Trace: ... This seems to be an artifact from long ago, and probably isn't needed anymore. It generally just adds noise to the dump, and it can be actively harmful because it leaks kernel addresses. Linus says: "The stack dump actually goes back to forever, and it used to be useful back in 1992 or so. But it used to be useful mainly because stacks were simpler and we didn't have very good call traces anyway. I definitely remember having used them - I just do not remember having used them in the last ten+ years. Of course, it's still true that if you can trigger an oops, you've likely already lost the security game, but since the stack dump is so useless, let's aim to just remove it and make games like the above harder." This also removes the related 'kstack=' cmdline option and the 'kstack_depth_to_print' sysctl. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e83bd50df52d8fe88e94d2566426ae40d813bf8f.1477405374.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 18:40:37 +02:00
Josh Poimboeuf	bb5e5ce545	x86/dumpstack: Remove kernel text addresses from stack dump Printing kernel text addresses in stack dumps is of questionable value, especially now that address randomization is becoming common. It can be a security issue because it leaks kernel addresses. It also affects the usefulness of the stack dump. Linus says: "I actually spend time cleaning up commit messages in logs, because useless data that isn't actually information (random hex numbers) is actively detrimental. It makes commit logs less legible. It also makes it harder to parse dumps. It's not useful. That makes it actively bad. I probably look at more oops reports than most people. I have not found the hex numbers useful for the last five years, because they are just randomized crap. The stack content thing just makes code scroll off the screen etc, for example." The only real downside to removing these addresses is that they can be used to disambiguate duplicate symbol names. However such cases are rare, and the context of the stack dump should be enough to be able to figure it out. There's now a 'faddr2line' script which can be used to convert a function address to a file name and line: $ ./scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60 write_sysrq_trigger+0x51/0x60: write_sysrq_trigger at drivers/tty/sysrq.c:1098 Or gdb can be used: $ echo "list *write_sysrq_trigger+0x51" \|gdb ~/k/vmlinux \|grep "is in" (gdb) 0xffffffff815b5d83 is in driver_probe_device (/home/jpoimboe/git/linux/drivers/base/dd.c:378). (But note that when there are duplicate symbol names, gdb will only show the first symbol it finds. faddr2line is recommended over gdb because it handles duplicates and it also does function size checking.) Here's an example of what a stack dump looks like after this change: BUG: unable to handle kernel NULL pointer dereference at (null) IP: sysrq_handle_crash+0x45/0x80 PGD 36bfa067 [ 29.650644] PUD 7aca3067 Oops: 0002 [#1] PREEMPT SMP Modules linked in: ... CPU: 1 PID: 786 Comm: bash Tainted: G E 4.9.0-rc1+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014 task: ffff880078582a40 task.stack: ffffc90000ba8000 RIP: 0010:sysrq_handle_crash+0x45/0x80 RSP: 0018:ffffc90000babdc8 EFLAGS: 00010296 RAX: ffff880078582a40 RBX: 0000000000000063 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000292 RBP: ffffc90000babdc8 R08: 0000000b31866061 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000007 R14: ffffffff81ee8680 R15: 0000000000000000 FS: 00007ffb43869700(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007a3e9000 CR4: 00000000001406e0 Stack: ffffc90000babe00 ffffffff81572d08 ffffffff81572bd5 0000000000000002 0000000000000000 ffff880079606600 00007ffb4386e000 ffffc90000babe20 ffffffff81573201 ffff880036a3fd00 fffffffffffffffb ffffc90000babe40 Call Trace: __handle_sysrq+0x138/0x220 ? __handle_sysrq+0x5/0x220 write_sysrq_trigger+0x51/0x60 proc_reg_write+0x42/0x70 __vfs_write+0x37/0x140 ? preempt_count_sub+0xa1/0x100 ? __sb_start_write+0xf5/0x210 ? vfs_write+0x183/0x1a0 vfs_write+0xb8/0x1a0 SyS_write+0x58/0xc0 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x7ffb42f55940 RSP: 002b:00007ffd33bb6b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007ffb42f55940 RDX: 0000000000000002 RSI: 00007ffb4386e000 RDI: 0000000000000001 RBP: 0000000000000011 R08: 00007ffb4321ea40 R09: 00007ffb43869700 R10: 00007ffb43869700 R11: 0000000000000246 R12: 0000000000778a10 R13: 00007ffd33bb5c00 R14: 0000000000000007 R15: 0000000000000010 Code: 34 e8 d0 34 bc ff 48 c7 c2 3b 2b 57 81 be 01 00 00 00 48 c7 c7 e0 dd e5 81 e8 a8 55 ba ff c7 05 0e 3f de 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 e8 4c 49 bc ff 84 c0 75 c3 48 c7 RIP: sysrq_handle_crash+0x45/0x80 RSP: ffffc90000babdc8 CR2: 0000000000000000 Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/69329cb29b8f324bb5fcea14d61d224807fb6488.1477405374.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 18:40:37 +02:00
Borislav Petkov	14cfbe55c7	x86/microcode: Bump driver version, update copyrights Let's increment that number finally: it is long overdue. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161025095522.11964-13-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 12:28:59 +02:00
Borislav Petkov	06b8534cb7	x86/microcode: Rework microcode loading Yeah, I know, I know, this is a huuge patch and reviewing it is hard. Sorry but this is the only way I could think of in which I can rewrite the microcode patches loading procedure without breaking (knowingly) the driver. So maybe this patch is easier to review if one looks at the files after the patch has been applied instead at the diff. Because then it becomes pretty obvious: * The BSP-loading path - load_ucode_bsp() is working independently from the AP path now and it doesn't save any pointers or patches anymore - it solely parses the builtin or initrd microcode and applies the patch. That's it. This fixes the CONFIG_RANDOMIZE_MEMORY offset fun more solidly. * The AP-loading path - load_ucode_ap() then goes and scans builtin/initrd again for the microcode patches but it caches them this time so that we don't have to do that scan on each AP but only once. This simplifies the code considerably. Then, when we save the microcode from the initrd/builtin, we go and add the relevant patches to our own cache. The AMD side did do that and now the Intel side does it too. So no more pointer copying and blabla, we save the microcode patches ourselves and are independent from initrd/builtin. This whole conversion gives us other benefits like unifying the initrd parsing into a single function: find_microcode_in_initrd() is used by both. The diffstat speaks for itself: 456 insertions(+), 695 deletions(-) Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161025095522.11964-12-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 12:28:59 +02:00
Borislav Petkov	8027923ab4	x86/microcode/intel: Remove intel_lib.c Its functions are used in intel.c only now, so get rid of it. Make functions static. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161025095522.11964-11-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 12:28:59 +02:00
Borislav Petkov	76bd11c23a	x86/microcode/amd: Move private inlines to .c and mark local functions static Make them all static as they're used in a single file now. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161025095522.11964-10-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 12:28:59 +02:00
Borislav Petkov	7f709d0c32	x86/microcode: Collect CPU info on resume We need to reread the CPU's microcode revision after resume because applied microcode gets "forgotten" depending on the sleep state. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161025095522.11964-9-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 12:28:58 +02:00
Borislav Petkov	6b14b81899	x86/microcode: Issue the debug printk on resume only on success Move it after the patch application function which also checks whether we were successful. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161025095522.11964-8-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 12:28:58 +02:00
Borislav Petkov	b3763a672d	x86/microcode/amd: Hand down the CPU family Will be needed in a following patch. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161025095522.11964-7-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 12:28:58 +02:00
Borislav Petkov	058dc49803	x86/microcode: Export the microcode cache linked list It will be used by both drivers so move it to core.c. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161025095522.11964-6-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 12:28:58 +02:00
Borislav Petkov	f61337d984	x86/microcode/intel: Simplify generic_load_microcode() Make it return the ucode_state directly instead of assigning to a state variable and jumping to an out: label. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161025095522.11964-4-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 12:28:57 +02:00
Borislav Petkov	5879a26752	x86/microcode: Move driver authors to CREDITS They're not active anymore. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161025095522.11964-3-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 12:28:57 +02:00
Borislav Petkov	777284b66f	x86/microcode: Run the AP-loading routine only on the application processors cpu_init() is run also on the BSP (in addition to the APs): x86_64_start_kernel \|-> x86_64_start_reservations \|-> start_kernel \|-> trap_init \|-> cpu_init \|-> load_ucode_ap ... but we run the AP (Application Processors) microcode loading routine there too even though we have a BSP-specific routine for that: load_ucode_bsp(). Which is unnecessary. So let's limit the AP microcode loading routine to the APs only. Remove a useless comment while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161025095522.11964-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 12:28:57 +02:00
Borislav Petkov	59c6f278bd	x86/cpu: Get rid of the show_msr= boot option It is useless as it dumps the MSRs too early BUT(!) we do set MSRs later too. Also, it dumps only BSP MSRs as it gets called only for CPU 0. And the MSR range array would need constant updating anyway, and so on and so on... Oh, and we have msr.ko and msr-tools which are the much better solution anyway. So off it goes... Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161024173844.23038-4-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 11:48:50 +02:00
Borislav Petkov	62a67e123e	x86/cpu: Merge bugs.c and bugs_64.c Should be easier when following boot paths. It probably is a left over from the x86 unification eons ago. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161024173844.23038-3-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 11:48:50 +02:00
Borislav Petkov	d54ff31dd8	x86/cpu: Remove the printk format specifier in "CPU0: " We're using a literal, move it into the string. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161024173844.23038-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 11:48:49 +02:00
Arnd Bergmann	d320b9a5bd	x86/quirks: Hide maybe-uninitialized warning gcc -Wmaybe-uninitialized detects that quirk_intel_brickland_xeon_ras_cap uses uninitialized data when CONFIG_PCI is not set: arch/x86/kernel/quirks.c: In function ‘quirk_intel_brickland_xeon_ras_cap’: arch/x86/kernel/quirks.c:641:13: error: ‘capid0’ is used uninitialized in this function [-Werror=uninitialized] However, the function is also not called in this configuration, so we can avoid the warning by moving the existing #ifdef to cover it as well. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-pci@vger.kernel.org Link: http://lkml.kernel.org/r/20161024153325.2752428-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 11:45:13 +02:00
Josh Poimboeuf	7fbe6ac024	x86/unwind: Fix empty stack dereference in guess unwinder Vince Waver reported the following bug: WARNING: CPU: 0 PID: 21338 at arch/x86/mm/fault.c:435 vmalloc_fault+0x58/0x1f0 CPU: 0 PID: 21338 Comm: perf_fuzzer Not tainted 4.8.0+ #37 Hardware name: Hewlett-Packard HP Compaq Pro 6305 SFF/1850, BIOS K06 v02.57 08/16/2013 Call Trace: <NMI> ? dump_stack+0x46/0x59 ? __warn+0xd5/0xee ? vmalloc_fault+0x58/0x1f0 ? __do_page_fault+0x6d/0x48e ? perf_log_throttle+0xa4/0xf4 ? trace_page_fault+0x22/0x30 ? __unwind_start+0x28/0x42 ? perf_callchain_kernel+0x75/0xac ? get_perf_callchain+0x13a/0x1f0 ? perf_callchain+0x6a/0x6c ? perf_prepare_sample+0x71/0x2eb ? perf_event_output_forward+0x1a/0x54 ? __default_send_IPI_shortcut+0x10/0x2d ? __perf_event_overflow+0xfb/0x167 ? x86_pmu_handle_irq+0x113/0x150 ? native_read_msr+0x6/0x34 ? perf_event_nmi_handler+0x22/0x39 ? perf_ibs_nmi_handler+0x4a/0x51 ? perf_event_nmi_handler+0x22/0x39 ? nmi_handle+0x4d/0xf0 ? perf_ibs_handle_irq+0x3d1/0x3d1 ? default_do_nmi+0x3c/0xd5 ? do_nmi+0x92/0x102 ? end_repeat_nmi+0x1a/0x1e ? entry_SYSCALL_64_after_swapgs+0x12/0x4a ? entry_SYSCALL_64_after_swapgs+0x12/0x4a ? entry_SYSCALL_64_after_swapgs+0x12/0x4a <EOE> ^A4---[ end trace 632723104d47d31a ]--- BUG: stack guard page was hit at ffffc90008500000 (stack is ffffc900084fc000..ffffc900084fffff) kernel stack overflow (page fault): 0000 [#1] SMP ... The NMI hit in the entry code right after setting up the stack pointer from 'cpu_current_top_of_stack', so the kernel stack was empty. The 'guess' version of __unwind_start() attempted to dereference the "top of stack" pointer, which is not actually on the stack. Add a check in the guess unwinder to deal with an empty stack. (The frame pointer unwinder already has such a check.) Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `7c7900f897` ("x86/unwind: Add new unwind interface and implementations") Link: http://lkml.kernel.org/r/20161024133127.e5evgeebdbohnmpb@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-25 11:36:43 +02:00
Sinan Kaya	f1caa61df2	ACPI/PCI: pci_link: penalize SCI correctly Ondrej reported that IRQs stopped working in v4.7 on several platforms. A typical scenario, from Ondrej's VT82C694X/694X, is: ACPI: Using PIC for interrupt routing ACPI: PCI Interrupt Link [LNKA] (IRQs 1 3 4 5 6 7 10 *11 12 14 15) ACPI: No IRQ available for PCI Interrupt Link [LNKA] 8139too 0000:00:0f.0: PCI INT A: no GSI We're using PIC routing, so acpi_irq_balance == 0, and LNKA is already active at IRQ 11. In that case, acpi_pci_link_allocate() only tries to use the active IRQ (IRQ 11) which also happens to be the SCI. We should penalize the SCI by PIRQ_PENALTY_PCI_USING, but irq_get_trigger_type(11) returns something other than IRQ_TYPE_LEVEL_LOW, so we penalize it by PIRQ_PENALTY_ISA_ALWAYS instead, which makes acpi_pci_link_allocate() assume the IRQ isn't available and give up. Add acpi_penalize_sci_irq() so platforms can tell us the SCI IRQ, trigger, and polarity directly and we don't have to depend on irq_get_trigger_type(). Fixes: `103544d869` (ACPI,PCI,IRQ: reduce resource requirements) Link: http://lkml.kernel.org/r/201609251512.05657.linux@rainbow-software.org Reported-by: Ondrej Zary <linux@rainbow-software.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Tested-by: Jonathan Liu <net147@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-10-24 14:18:14 +02:00
Linus Torvalds	3e9679a365	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Three fixes, a hw-enablement and a cross-arch fix/enablement change: - SGI/UV fix for older platforms - x32 signal handling fix - older x86 platform bootup APIC fix - AVX512-4VNNIW (Neural Network Instructions) and AVX512-4FMAPS (Multiply Accumulation Single precision instructions) enablement. - move thread_info back into x86 specific code, to make life easier for other architectures trying to make use of CONFIG_THREAD_INFO_IN_TASK_STRUCT=y" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/smp: Don't try to poke disabled/non-existent APIC sched/core, x86: Make struct thread_info arch specific again x86/signal: Remove bogus user_64bit_mode() check from sigaction_compat_abi() x86/platform/UV: Fix support for EFI_OLD_MEMMAP after BIOS callback updates x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features x86/vmware: Skip timer_irq_works() check on VMware	2016-10-22 09:58:49 -07:00
Ville Syrjälä	ff8560512b	x86/boot/smp: Don't try to poke disabled/non-existent APIC Apparently trying to poke a disabled or non-existent APIC leads to a box that doesn't even boot. Let's not do that. No real clue if this is the right fix, but at least my P3 machine boots again. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: dyoung@redhat.com Cc: kexec@lists.infradead.org Cc: stable@vger.kernel.org Fixes: `2a51fe083e` ("arch/x86: Handle non enumerated CPU after physical hotplug") Link: http://lkml.kernel.org/r/1477102684-5092-1-git-send-email-ville.syrjala@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-22 10:47:54 +02:00
Sebastian Andrzej Siewior	d2d9c4a3d0	x86/apic: Get rid of "warning: 'acpi_ioapic_lock' defined but not used" kbuild test robot reported this against the -RT tree: \|>> arch/x86/kernel/acpi/boot.c:90:21: warning: 'acpi_ioapic_lock' defined but not used [-Wunused-variable] \| static DEFINE_MUTEX(acpi_ioapic_lock); \| ^ \| include/linux/mutex_rt.h:27:15: note: in definition of macro 'DEFINE_MUTEX' \| struct mutex mutexname = __MUTEX_INITIALIZER(mutexname) ^~~~~~~~~ which is also true (as in non-used) for !RT but the compiler does not emit a warning. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161021084449.32523-1-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-21 11:09:14 +02:00
Alexey Makhalov	cf11372949	x86/vmware: Read tsc_khz only once at boot time Re-factor the vmware platform setup code to query the hypervisor for tsc frequency only once during boot. Since the VMware hypervisor guarantees constant TSC, calibrate_tsc now uses the saved value. Signed-off-by: Alexey Makhalov <amakhalov@vmware.com> Acked-by: Alok N Kataria <akataria@vmware.com> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20161020050211.GA25304@amakhalov-virtual-machine Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-21 10:12:11 +02:00
Josh Poimboeuf	6fa81a12b3	x86/dumpstack: Print orig_ax in __show_regs() The value of regs->orig_ax contains potentially useful debugging data: For syscalls it contains the syscall number. For interrupts it contains the (negated) vector number. To reduce noise, print it only if it has a useful value (i.e., something other than -1). Here's what it looks like for a write syscall: RIP: 0033:[<00007f53ad7b1940>] 0x7f53ad7b1940 RSP: 002b:00007fff8de66558 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f53ad7b1940 RDX: 0000000000000002 RSI: 00007f53ae0ca000 RDI: 0000000000000001 ... Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/93f0fe0307a4af884d3fca00edabcc8cff236002.1476973742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-21 09:26:05 +02:00
Josh Poimboeuf	1141c3e39c	x86/dumpstack: Fix duplicate RIP address display in __show_regs() The RIP address is shown twice in __show_regs(). Before: RIP: 0010:[<ffffffff81070446>] [<ffffffff81070446>] native_write_msr+0x6/0x30 After: RIP: 0010:[<ffffffff81070446>] native_write_msr+0x6/0x30 Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/b3fda66f36761759b000883b059cdd9a7649dcc1.1476973742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-21 09:26:04 +02:00
Josh Poimboeuf	3b3fa11bc7	x86/dumpstack: Print any pt_regs found on the stack Now that we can find pt_regs registers on the stack, print them. Here's an example of what it looks like: Call Trace: <IRQ> [<ffffffff8144b793>] dump_stack+0x86/0xc3 [<ffffffff81142c73>] hrtimer_interrupt+0xb3/0x1c0 [<ffffffff8105eb86>] local_apic_timer_interrupt+0x36/0x60 [<ffffffff818b27cd>] smp_apic_timer_interrupt+0x3d/0x50 [<ffffffff818b06ee>] apic_timer_interrupt+0x9e/0xb0 RIP: 0010:[<ffffffff818aef43>] [<ffffffff818aef43>] _raw_spin_unlock_irq+0x33/0x60 RSP: 0018:ffff880079c4f760 EFLAGS: 00000202 RAX: ffff880078738000 RBX: ffff88007d3da0c0 RCX: 0000000000000007 RDX: 0000000000006d78 RSI: ffff8800787388f0 RDI: ffff880078738000 RBP: ffff880079c4f768 R08: 0000002199088f38 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81e0d540 R13: ffff8800369fb700 R14: 0000000000000000 R15: ffff880078738000 <EOI> [<ffffffff810e1f14>] finish_task_switch+0xb4/0x250 [<ffffffff810e1ed6>] ? finish_task_switch+0x76/0x250 [<ffffffff818a7b61>] __schedule+0x3e1/0xb20 ... [<ffffffff810759c8>] trace_do_page_fault+0x58/0x2c0 [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0 [<ffffffff818b1dd8>] async_page_fault+0x28/0x30 RIP: 0010:[<ffffffff8145b062>] [<ffffffff8145b062>] __clear_user+0x42/0x70 RSP: 0018:ffff880079c4fd38 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138 RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640 RBP: ffff880079c4fd48 R08: 0000002198feefd7 R09: ffffffff82a40928 R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640 R13: 0000000000000000 R14: ffff880079c50000 R15: ffff8800791d7400 [<ffffffff8145b043>] ? __clear_user+0x23/0x70 [<ffffffff8145b0fb>] clear_user+0x2b/0x40 [<ffffffff812fbda2>] load_elf_binary+0x1472/0x1750 [<ffffffff8129a591>] search_binary_handler+0xa1/0x200 [<ffffffff8129b69b>] do_execveat_common.isra.36+0x6cb/0x9f0 [<ffffffff8129b5f3>] ? do_execveat_common.isra.36+0x623/0x9f0 [<ffffffff8129bcaa>] SyS_execve+0x3a/0x50 [<ffffffff81003f5c>] do_syscall_64+0x6c/0x1e0 [<ffffffff818afa3f>] entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:[<00007fd2e2f2e537>] [<00007fd2e2f2e537>] 0x7fd2e2f2e537 RSP: 002b:00007ffc449c5fc8 EFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007ffc449c8860 RCX: 00007fd2e2f2e537 RDX: 000000000127cc40 RSI: 00007ffc449c8860 RDI: 00007ffc449c6029 RBP: 00007ffc449c60b0 R08: 65726f632d667265 R09: 00007ffc449c5e20 R10: 00000000000005a7 R11: 0000000000000246 R12: 000000000127cc40 R13: 000000000127ce05 R14: 00007ffc449c6029 R15: 000000000127ce01 Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5cc2c512ec82cfba00dd22467644d4ed751a48c0.1476973742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-21 09:26:04 +02:00
Josh Poimboeuf	79439d8e15	x86/dumpstack: Print stack identifier on its own line show_trace_log_lvl() prints the stack id (e.g. "<IRQ>") without a newline so that any stack address printed after it will appear on the same line. That causes the first stack address to be vertically misaligned with the rest, making it visually cluttered and slightly confusing: Call Trace: <IRQ> [<ffffffff814431c3>] dump_stack+0x86/0xc3 [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160 [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0 ... <EOI> [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60 [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250 [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0 It will look worse once we start printing pt_regs registers found in the middle of the stack: <IRQ> RIP: 0010:[<ffffffff8189c6c3>] [<ffffffff8189c6c3>] _raw_spin_unlock_irq+0x33/0x60 RSP: 0018:ffff88007876f720 EFLAGS: 00000206 RAX: ffff8800786caa40 RBX: ffff88007d5da140 RCX: 0000000000000007 ... Improve readability by adding a newline to the stack name: Call Trace: <IRQ> [<ffffffff814431c3>] dump_stack+0x86/0xc3 [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160 [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0 ... <EOI> [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60 [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250 [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0 Now that "continued" lines are no longer needed, we can also remove the hack of using the empty string (aka KERN_CONT) and replace it with KERN_DEFAULT. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/9bdd6dee2c74555d45500939fcc155997dc7889e.1476973742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-21 09:26:04 +02:00
Josh Poimboeuf	acb4608ad1	x86/unwind: Create stack frames for saved syscall registers The entry code doesn't encode the pt_regs pointer for syscalls. But the pt_regs are always at the same location, so we can add a manual check for them. A later patch prints them as part of the oops stack dump. They could be useful, for example, to determine the arguments to a system call. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e176aa9272930cd3f51fda0b94e2eae356677da4.1476973742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-21 09:26:04 +02:00
Josh Poimboeuf	946c191161	x86/entry/unwind: Create stack frames for saved interrupt registers With frame pointers, when a task is interrupted, its stack is no longer completely reliable because the function could have been interrupted before it had a chance to save the previous frame pointer on the stack. So the caller of the interrupted function could get skipped by a stack trace. This is problematic for live patching, which needs to know whether a stack trace of a sleeping task can be relied upon. There's currently no way to detect if a sleeping task was interrupted by a page fault exception or preemption before it went to sleep. Another issue is that when dumping the stack of an interrupted task, the unwinder has no way of knowing where the saved pt_regs registers are, so it can't print them. This solves those issues by encoding the pt_regs pointer in the frame pointer on entry from an interrupt or an exception. This patch also updates the unwinder to be able to decode it, because otherwise the unwinder would be broken by this change. Note that this causes a change in the behavior of the unwinder: each instance of a pt_regs on the stack is now considered a "frame". So callers of unwind_get_return_address() will now get an occasional 'regs->ip' address that would have previously been skipped over. Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8b9f84a21e39d249049e0547b559ff8da0df0988.1476973742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-21 09:26:03 +02:00
Dmitry Safonov	ed1e7db33c	x86/signal: Remove bogus user_64bit_mode() check from sigaction_compat_abi() The recent introduction of SA_X32/IA32 sa_flags added a check for user_64bit_mode() into sigaction_compat_abi(). user_64bit_mode() is true for native 64-bit processes and x32 processes. Due to that the function returns w/o setting the SA_X32_ABI flag for X32 processes. In consequence the kernel attempts to deliver the signal to the X32 process in native 64-bit mode causing the process to segfault. Remove the check, so the actual check for X32 mode which sets the ABI flag can be reached. There is no side effect for native 64-bit mode. [ tglx: Rewrote changelog ] Fixes: `6846351052` ("x86/signal: Add SA_{X32,IA32}_ABI sa_flags") Reported-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Adam Borowski <kilobyte@angband.pl> Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: linux-mm@kvack.org Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Link: http://lkml.kernel.org/r/CAJwJo6Z8ZWPqNfT6t-i8GW1MKxQrKDUagQqnZ%2B0%2B697%3DMyVeGg@mail.gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-20 13:05:15 +02:00
Josh Poimboeuf	e728f61ce0	x86/boot: Move the _stext marker to before the boot code When core_kernel_text() is used to determine whether an address on a task's stack trace is a kernel text address, it incorrectly returns false for early text addresses for the head code between the _text and _stext markers. Among other things, this can cause the unwinder to behave incorrectly when unwinding to x86 head code. Head code is text code too, so mark it as such. This seems to match the intent of other users of the _stext symbol, and it also seems consistent with what other architectures are already doing. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/789cf978866420e72fa89df44aa2849426ac378d.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-20 09:15:24 +02:00
Josh Poimboeuf	22dc391865	x86/boot: Fix the end of the stack for idle tasks Thanks to all the recent x86 entry code refactoring, most tasks' kernel stacks start at the same offset right below their saved pt_regs, regardless of which syscall was used to enter the kernel. That creates a nice convention which makes it straightforward to identify the end of the stack, which can be useful for the unwinder to verify the stack is sane. However, the boot CPU's idle "swapper" task doesn't follow that convention. Fix that by starting its stack at a sizeof(pt_regs) offset from the end of the stack page. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/81aee3beb6ed88e44f1bea6986bb7b65c368f77a.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-20 09:15:23 +02:00
Josh Poimboeuf	595c1e645d	x86/boot/64: Put a real return address on the idle task stack The frame at the end of each idle task stack has a zeroed return address. This is inconsistent with real task stacks, which have a real return address at that spot. This inconsistency can be confusing for stack unwinders. It also hides useful information about what asm code was involved in calling into C. Make it a real address by using the side effect of a call instruction to push the instruction pointer on the stack. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f59593ae7b15d5126f872b0a23143173d28aa32d.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-20 09:15:23 +02:00
Josh Poimboeuf	a9468df5ad	x86/boot/64: Use a common function for starting CPUs There are two different pieces of code for starting a CPU: start_cpu0() and the end of secondary_startup_64(). They're identical except for the stack setup. Combine the common parts into a shared start_cpu() function. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1d692ffa62fcb3cc835a5b254e953f2d9bab3549.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-20 09:15:23 +02:00
Josh Poimboeuf	b9b1a9c363	x86/boot/smp/32: Fix initial idle stack location on 32-bit kernels On 32-bit kernels, the initial idle stack calculation doesn't take into account the TOP_OF_KERNEL_STACK_PADDING, making the stack end address inconsistent with other tasks on 32-bit. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/6cf569410bfa84cf923902fc4d628444cace94be.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-20 09:15:23 +02:00
Josh Poimboeuf	6616a147a7	x86/boot/32: Fix the end of the stack for idle tasks The frame at the end of each idle task stack is inconsistent with real task stacks, which have a stack frame header and a real return address before the pt_regs area. This inconsistency can be confusing for stack unwinders. It also hides useful information about what asm code was involved in calling into C. Fix that by changing the initial code jumps to calls. Also add infinite loops after the calls to make it clear that the calls don't return, and to hang if they do. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2588f34b6fbac4ae6f6f9ead2a78d7f8d58a6341.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-20 09:15:23 +02:00
Josh Poimboeuf	1b00255f32	x86/entry/32, x86/boot/32: Use local labels Add the local label prefix to all non-function named labels in head_32.S and entry_32.S. In addition to decluttering the symbol table, it also will help stack traces to be more sensible. For example, the last reported function in the idle task stack trace will be startup_32_smp() instead of is486(). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/14f9f7afd478b23a762f40734da1a57c0c273f6e.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-20 09:15:22 +02:00
Tejun Heo	8bc4a04455	Merge branch 'for-4.9' into for-4.10	2016-10-19 12:12:40 -04:00
Linus Torvalds	63ae602cea	Merge branch 'gup_flag-cleanups' Merge the gup_flags cleanups from Lorenzo Stoakes: "This patch series adjusts functions in the get_user_pages* family such that desired FOLL_* flags are passed as an argument rather than implied by flags. The purpose of this change is to make the use of FOLL_FORCE explicit so it is easier to grep for and clearer to callers that this flag is being used. The use of FOLL_FORCE is an issue as it overrides missing VM_READ/VM_WRITE flags for the VMA whose pages we are reading from/writing to, which can result in surprising behaviour. The patch series came out of the discussion around commit `38e0885465` ("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"), which addressed a BUG_ON() being triggered when a page was faulted in with PROT_NONE set but having been overridden by FOLL_FORCE. do_numa_page() was run on the assumption the page _must_ be one marked for NUMA node migration as an actual PROT_NONE page would have been dealt with prior to this code path, however FOLL_FORCE introduced a situation where this assumption did not hold. See https://marc.info/?l=linux-mm&m=147585445805166 for the patch proposal" Additionally, there's a fix for an ancient bug related to FOLL_FORCE and FOLL_WRITE by me. [ This branch was rebased recently to add a few more acked-by's and reviewed-by's ] * gup_flag-cleanups: mm: replace access_process_vm() write parameter with gup_flags mm: replace access_remote_vm() write parameter with gup_flags mm: replace __access_remote_vm() write parameter with gup_flags mm: replace get_user_pages_remote() write/force parameters with gup_flags mm: replace get_user_pages() write/force parameters with gup_flags mm: replace get_vaddr_frames() write/force parameters with gup_flags mm: replace get_user_pages_locked() write/force parameters with gup_flags mm: replace get_user_pages_unlocked() write/force parameters with gup_flags mm: remove write/force parameters from __get_user_pages_unlocked() mm: remove write/force parameters from __get_user_pages_locked() mm: remove gup_flags FOLL_WRITE games from __get_user_pages()	2016-10-19 08:39:47 -07:00
Piotr Luc	8214899342	x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features AVX512_4VNNIW - Vector instructions for deep learning enhanced word variable precision. AVX512_4FMAPS - Vector instructions for deep learning floating-point single precision. These new instructions are to be used in future Intel Xeon & Xeon Phi processors. The bits 2&3 of CPUID[level:0x07, EDX] inform that new instructions are supported by a processor. The spec can be found in the Intel Software Developer Manual (SDM) or in the Instruction Set Extensions Programming Reference (ISE). Define new feature flags to enumerate the new instructions in /proc/cpuinfo accordingly to CPUID bits and add the required xsave extensions which are required for proper operation. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20161018150111.29926-1-piotr.luc@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-19 17:37:13 +02:00
Renat Valiullin	854dd54245	x86/vmware: Skip timer_irq_works() check on VMware The timer_irq_works() boot check may sometimes fail in a VM, when the Host is overcommitted or when the Guest is running nested. Since the intended check is unnecessary on VMware's virtual hardware, by-pass it. Signed-off-by: Renat Valiullin <rvaliullin@vmware.com> Acked-by: Alok N Kataria <akataria@vmware.com> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20161013184539.GA11497@rvaliullin-vm Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-19 17:36:33 +02:00
Lorenzo Stoakes	f307ab6dce	mm: replace access_process_vm() write parameter with gup_flags This removes the 'write' argument from access_process_vm() and replaces it with 'gup_flags' as use of this function previously silently implied FOLL_FORCE, whereas after this patch callers explicitly pass this flag. We make this explicit as use of FOLL_FORCE can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-19 08:31:25 -07:00
Linus Torvalds	0832881425	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes, plus hw-enablement changes: - fix persistent RAM handling - remove pkeys warning - remove duplicate macro - fix debug warning in irq handler - add new 'Knights Mill' CPU related constants and enable the perf bits" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Add Knights Mill CPUID perf/x86/intel/rapl: Add Knights Mill CPUID perf/x86/intel: Add Knights Mill CPUID x86/cpu/intel: Add Knights Mill to Intel family x86/e820: Don't merge consecutive E820_PRAM ranges pkeys: Remove easily triggered WARN x86: Remove duplicate rtit status MSR macro x86/smp: Add irq_enter/exit() in smp_reschedule_interrupt()	2016-10-18 09:59:04 -07:00
Ingo Molnar	4d69f155d5	Linux 4.9-rc1 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYAoDuAAoJEHm+PkMAQRiGUeEH/03/cUjHeY5aJkcJ0JeHkoU5 GR5nRGcjfFF6cGujw2cSXBf5NzZTcrvBBFSgGNJ/rqm4EeDBsmf6T8qSfEKky/SY 3CNWSzayFU8Na3C8Z/a/xPTPicneX9zVnAi8XMAKXwWPmu21JCLR/hkKaxQ29qGr Nqe4kEdLEF80d5lFRfNjK3CX4bD6w6P7aTBaM6wuRe4u5AXKJlSF+j838o5+/tSQ Q1V7fyXlX+kwNmH4gViim8im0PLm7/7Li8e24pL3cAR2G6DHrUzcsYYoRMHpk5bv HdBeCgZL6TnIaJc0ui2FRqQsifaVfM5J+pK81wr/JhBP2hmuWIN7NMupfCYtCcM= =Mown -----END PGP SIGNATURE----- Merge tag 'v4.9-rc1' into x86/fpu, to resolve conflict Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-16 13:04:34 +02:00
Rik van Riel	c474e50711	x86/fpu: Split old_fpu & new_fpu handling into separate functions By moving all of the new_fpu state handling into switch_fpu_finish(), the code can be simplified some more. This gets rid of the prefetch, but given the size of the FPU register state on modern CPUs, and the amount of work done by __switch_to() inbetween both functions, the value of a single cache line prefetch seems somewhat dubious anyway. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1476447331-21566-3-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-16 11:38:41 +02:00
Rik van Riel	317b622cb2	x86/fpu: Remove 'cpu' argument from __cpu_invalidate_fpregs_state() The __{fpu,cpu}_invalidate_fpregs_state() functions can only be used to invalidate a resource they control. Document that, and change the API a little bit to reflect that. Go back to open coding the fpu_fpregs_owner_ctx write in the CPU hotplug code, which should be the exception, and move __kernel_fpu_begin() to this API. This patch has no functional changes to the current code. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1476447331-21566-2-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-16 11:38:31 +02:00
Ingo Molnar	1d33369db2	Linux 4.9-rc1 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYAoDuAAoJEHm+PkMAQRiGUeEH/03/cUjHeY5aJkcJ0JeHkoU5 GR5nRGcjfFF6cGujw2cSXBf5NzZTcrvBBFSgGNJ/rqm4EeDBsmf6T8qSfEKky/SY 3CNWSzayFU8Na3C8Z/a/xPTPicneX9zVnAi8XMAKXwWPmu21JCLR/hkKaxQ29qGr Nqe4kEdLEF80d5lFRfNjK3CX4bD6w6P7aTBaM6wuRe4u5AXKJlSF+j838o5+/tSQ Q1V7fyXlX+kwNmH4gViim8im0PLm7/7Li8e24pL3cAR2G6DHrUzcsYYoRMHpk5bv HdBeCgZL6TnIaJc0ui2FRqQsifaVfM5J+pK81wr/JhBP2hmuWIN7NMupfCYtCcM= =Mown -----END PGP SIGNATURE----- Merge tag 'v4.9-rc1' into x86/urgent, to pick up updates Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-16 11:31:39 +02:00
Dan Williams	23446cb66c	x86/e820: Don't merge consecutive E820_PRAM ranges Commit: `917db484dc` ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation") ... fixed up the broken manipulations of max_pfn in the presence of E820_PRAM ranges. However, it also broke the sanitize_e820_map() support for not merging E820_PRAM ranges. Re-introduce the enabling to keep resource boundaries between consecutive defined ranges. Otherwise, for example, an environment that boots with memmap=2G!8G,2G!10G will end up with a single 4G /dev/pmem0 device instead of a /dev/pmem0 and /dev/pmem1 device 2G in size. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhang Yi <yizhan@redhat.com> Cc: linux-nvdimm@lists.01.org Fixes: `917db484dc` ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation") Link: http://lkml.kernel.org/r/147629530854.10618.10383744751594021268.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-16 11:16:48 +02:00
Dmitry Vyukov	9f7d416c36	kprobes: Unpoison stack in jprobe_return() for KASAN I observed false KSAN positives in the sctp code, when sctp uses jprobe_return() in jsctp_sf_eat_sack(). The stray 0xf4 in shadow memory are stack redzones: [ ] ================================================================== [ ] BUG: KASAN: stack-out-of-bounds in memcmp+0xe9/0x150 at addr ffff88005e48f480 [ ] Read of size 1 by task syz-executor/18535 [ ] page:ffffea00017923c0 count:0 mapcount:0 mapping: (null) index:0x0 [ ] flags: 0x1fffc0000000000() [ ] page dumped because: kasan: bad access detected [ ] CPU: 1 PID: 18535 Comm: syz-executor Not tainted 4.8.0+ #28 [ ] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [ ] ffff88005e48f2d0 ffffffff82d2b849 ffffffff0bc91e90 fffffbfff10971e8 [ ] ffffed000bc91e90 ffffed000bc91e90 0000000000000001 0000000000000000 [ ] ffff88005e48f480 ffff88005e48f350 ffffffff817d3169 ffff88005e48f370 [ ] Call Trace: [ ] [<ffffffff82d2b849>] dump_stack+0x12e/0x185 [ ] [<ffffffff817d3169>] kasan_report+0x489/0x4b0 [ ] [<ffffffff817d31a9>] __asan_report_load1_noabort+0x19/0x20 [ ] [<ffffffff82d49529>] memcmp+0xe9/0x150 [ ] [<ffffffff82df7486>] depot_save_stack+0x176/0x5c0 [ ] [<ffffffff817d2031>] save_stack+0xb1/0xd0 [ ] [<ffffffff817d27f2>] kasan_slab_free+0x72/0xc0 [ ] [<ffffffff817d05b8>] kfree+0xc8/0x2a0 [ ] [<ffffffff85b03f19>] skb_free_head+0x79/0xb0 [ ] [<ffffffff85b0900a>] skb_release_data+0x37a/0x420 [ ] [<ffffffff85b090ff>] skb_release_all+0x4f/0x60 [ ] [<ffffffff85b11348>] consume_skb+0x138/0x370 [ ] [<ffffffff8676ad7b>] sctp_chunk_put+0xcb/0x180 [ ] [<ffffffff8676ae88>] sctp_chunk_free+0x58/0x70 [ ] [<ffffffff8677fa5f>] sctp_inq_pop+0x68f/0xef0 [ ] [<ffffffff8675ee36>] sctp_assoc_bh_rcv+0xd6/0x4b0 [ ] [<ffffffff8677f2c1>] sctp_inq_push+0x131/0x190 [ ] [<ffffffff867bad69>] sctp_backlog_rcv+0xe9/0xa20 [ ... ] [ ] Memory state around the buggy address: [ ] ffff88005e48f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ ] ffff88005e48f400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ ] >ffff88005e48f480: f4 f4 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ ] ^ [ ] ffff88005e48f500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ ] ffff88005e48f580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ ] ================================================================== KASAN stack instrumentation poisons stack redzones on function entry and unpoisons them on function exit. If a function exits abnormally (e.g. with a longjmp like jprobe_return()), stack redzones are left poisoned. Later this leads to random KASAN false reports. Unpoison stack redzones in the frames we are going to jump over before doing actual longjmp in jprobe_return(). Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: kasan-dev@googlegroups.com Cc: surovegin@google.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/1476454043-101898-1-git-send-email-dvyukov@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-16 11:02:31 +02:00
Dmitry Vyukov	9254139ad0	kprobes: Avoid false KASAN reports during stack copy Kprobes save and restore raw stack chunks with memcpy(). With KASAN these chunks can contain poisoned stack redzones, as the result memcpy() interceptor produces false stack out-of-bounds reports. Use __memcpy() instead of memcpy() for stack copying. __memcpy() is not instrumented by KASAN and does not lead to the false reports. Currently there is a spew of KASAN reports during boot if CONFIG_KPROBES_SANITY_TEST is enabled: [ ] Kprobe smoke test: started [ ] ================================================================== [ ] BUG: KASAN: stack-out-of-bounds in setjmp_pre_handler+0x17c/0x280 at addr ffff88085259fba8 [ ] Read of size 64 by task swapper/0/1 [ ] page:ffffea00214967c0 count:0 mapcount:0 mapping: (null) index:0x0 [ ] flags: 0x2fffff80000000() [ ] page dumped because: kasan: bad access detected [...] Reported-by: CAI Qian <caiqian@redhat.com> Tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kasan-dev@googlegroups.com [ Improved various details. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-16 10:58:59 +02:00
Linus Torvalds	84d69848c9	Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild updates from Michal Marek: - EXPORT_SYMBOL for asm source by Al Viro. This does bring a regression, because genksyms no longer generates checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is working on a patch to fix this. Plus, we are talking about functions like strcpy(), which rarely change prototypes. - Fixes for PPC fallout of the above by Stephen Rothwell and Nick Piggin - fixdep speedup by Alexey Dobriyan. - preparatory work by Nick Piggin to allow architectures to build with -ffunction-sections, -fdata-sections and --gc-sections - CONFIG_THIN_ARCHIVES support by Stephen Rothwell - fix for filenames with colons in the initramfs source by me. * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits) initramfs: Escape colons in depfile ppc: there is no clear_pages to export powerpc/64: whitelist unresolved modversions CRCs kbuild: -ffunction-sections fix for archs with conflicting sections kbuild: add arch specific post-link Makefile kbuild: allow archs to select link dead code/data elimination kbuild: allow architectures to use thin archives instead of ld -r kbuild: Regenerate genksyms lexer kbuild: genksyms fix for typeof handling fixdep: faster CONFIG_ search ia64: move exports to definitions sparc32: debride memcpy.S a bit [sparc] unify 32bit and 64bit string.h sparc: move exports to definitions ppc: move exports to definitions arm: move exports to definitions s390: move exports to definitions m68k: move exports to definitions alpha: move exports to actual definitions x86: move exports to actual definitions ...	2016-10-14 14:26:58 -07:00
Wanpeng Li	1ec6ec14a2	x86/smp: Add irq_enter/exit() in smp_reschedule_interrupt() =============================== [ INFO: suspicious RCU usage. ] 4.8.0+ #24 Not tainted ------------------------------- ./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! no locks held by swapper/1/0. [<ffffffff9d492b95>] do_trace_write_msr+0x135/0x140 [<ffffffff9d06f860>] native_write_msr+0x20/0x30 [<ffffffff9d065fad>] native_apic_msr_eoi_write+0x1d/0x30 [<ffffffff9d05bd1d>] smp_reschedule_interrupt+0x1d/0x30 [<ffffffff9d8daec6>] reschedule_interrupt+0x96/0xa0 Reschedule interrupt may be called in cpu idle state. This causes lockdep check warning above. Add irq_enter/exit() in smp_reschedule_interrupt(), irq_enter() tells the RCU subsystems to end the extended quiescent state, so the following trace call in ack_APIC_irq() works correctly. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Link: http://lkml.kernel.org/r/1476409733-5133-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-14 14:14:20 +02:00
Linus Torvalds	6b25e21fa6	Merge tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "Core: - Fence destaging work - DRIVER_LEGACY to split off legacy drm drivers - drm_mm refactoring - Splitting drm_crtc.c into chunks and documenting better - Display info fixes - rbtree support for prime buffer lookup - Simple VGA DAC driver Panel: - Add Nexus 7 panel - More simple panels i915: - Refactoring GEM naming - Refactored vma/active tracking - Lockless request lookups - Better stolen memory support - FBC fixes - SKL watermark fixes - VGPU improvements - dma-buf fencing support - Better DP dongle support amdgpu: - Powerplay for Iceland asics - Improved GPU reset support - UVD/VEC powergating support for CZ/ST - Preinitialised VRAM buffer support - Virtual display support - Initial SI support - GTT rework - PCI shutdown callback support - HPD IRQ storm fixes amdkfd: - bugfixes tilcdc: - Atomic modesetting support mediatek: - AAL + GAMMA engine support - Hook up gamma LUT - Temporal dithering support imx: - Pixel clock from devicetree - drm bridge support for LVDS bridges - active plane reconfiguration - VDIC deinterlacer support - Frame synchronisation unit support - Color space conversion support analogix: - PSR support - Better panel on/off support rockchip: - rk3399 vop/crtc support - PSR support vc4: - Interlaced vblank timing - 3D rendering CPU overhead reduction - HDMI output fixes tda998x: - HDMI audio ASoC support sunxi: - Allwinner A33 support - better TCON support msm: - DT binding cleanups - Explicit fence-fd support sti: - remove sti415/416 support etnaviv: - MMUv2 refactoring - GC3000 support exynos: - Refactoring HDMI DCC/PHY - G2D pm regression fix - Page fault issues with wait for vblank There is no nouveau work in this tree, as Ben didn't get a pull request in, and he was fighting moving to atomic and adding mst support, so maybe best it waits for a cycle" * tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux: (1412 commits) drm/crtc: constify drm_crtc_index parameter drm/i915: Fix conflict resolution from backmerge of v4.8-rc8 to drm-next drm/i915/guc: Unwind GuC workqueue reservation if request construction fails drm/i915: Reset the breadcrumbs IRQ more carefully drm/i915: Force relocations via cpu if we run out of idle aperture drm/i915: Distinguish last emitted request from last submitted request drm/i915: Allow DP to work w/o EDID drm/i915: Move long hpd handling into the hotplug work drm/i915/execlists: Reinitialise context image after GPU hang drm/i915: Use correct index for backtracking HUNG semaphores drm/i915: Unalias obj->phys_handle and obj->userptr drm/i915: Just clear the mmiodebug before a register access drm/i915/gen9: only add the planes actually affected by ddb changes drm/i915: Allow PCH DPLL sharing regardless of DPLL_SDVO_HIGH_SPEED drm/i915/bxt: Fix HDMI DPLL configuration drm/i915/gen9: fix the watermark res_blocks value drm/i915/gen9: fix plane_blocks_per_line on watermarks calculations drm/i915/gen9: minimum scanlines for Y tile is not always 4 drm/i915/gen9: fix the WaWmMemoryReadLatency implementation drm/i915/kbl: KBL also needs to run the SAGV code ...	2016-10-11 18:12:22 -07:00
Thomas Garnier	0549a3c02e	kdump, vmcoreinfo: report memory sections virtual addresses KASLR memory randomization can randomize the base of the physical memory mapping (PAGE_OFFSET), vmalloc (VMALLOC_START) and vmemmap (VMEMMAP_START). Adding these variables on VMCOREINFO so tools can easily identify the base of each memory section. Link: http://lkml.kernel.org/r/1471531632-23003-1-git-send-email-thgarnie@google.com Signed-off-by: Thomas Garnier <thgarnie@google.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Kees Cook <keescook@chromium.org> Cc: Eugene Surovegin <surovegin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:33 -07:00
Hidehiro Kawai	0ee59413c9	x86/panic: replace smp_send_stop() with kdump friendly version in panic path Daniel Walker reported problems which happens when crash_kexec_post_notifiers kernel option is enabled (https://lkml.org/lkml/2015/6/24/44). In that case, smp_send_stop() is called before entering kdump routines which assume other CPUs are still online. As the result, for x86, kdump routines fail to save other CPUs' registers and disable virtualization extensions. To fix this problem, call a new kdump friendly function, crash_smp_send_stop(), instead of the smp_send_stop() when crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak function, and it just call smp_send_stop(). Architecture codes should override it so that kdump can work appropriately. This patch only provides x86-specific version. For Xen's PV kernel, just keep the current behavior. NOTES: - Right solution would be to place crash_smp_send_stop() before __crash_kexec() invocation in all cases and remove smp_send_stop(), but we can't do that until all architectures implement own crash_smp_send_stop() - crash_smp_send_stop()-like work is still needed by machine_crash_shutdown() because crash_kexec() can be called without entering panic() Fixes: `f06e5153f4` (kernel/panic.c: add "crash_kexec_post_notifiers" option) Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jp Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Reported-by: Daniel Walker <dwalker@fifo99.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Daniel Walker <dwalker@fifo99.com> Cc: Xunlei Pang <xpang@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: "Steven J. Hill" <steven.hill@cavium.com> Cc: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:32 -07:00
Jason Cooper	9c6f0902a5	x86: use simpler API for random address requests Currently, all callers to randomize_range() set the length to 0 and calculate end by adding a constant to the start address. We can simplify the API to remove a bunch of needless checks and variables. Use the new randomize_addr(start, range) call to set the requested address. Link: http://lkml.kernel.org/r/20160803233913.32511-3-jason@lakedaemon.net Signed-off-by: Jason Cooper <jason@lakedaemon.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:32 -07:00
Linus Torvalds	93c26d7dc0	Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull protection keys syscall interface from Thomas Gleixner: "This is the final step of Protection Keys support which adds the syscalls so user space can actually allocate keys and protect memory areas with them. Details and usage examples can be found in the documentation. The mm side of this has been acked by Mel" * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pkeys: Update documentation x86/mm/pkeys: Do not skip PKRU register if debug registers are not used x86/pkeys: Fix pkeys build breakage for some non-x86 arches x86/pkeys: Add self-tests x86/pkeys: Allow configuration of init_pkru x86/pkeys: Default to a restrictive init PKRU pkeys: Add details of system call use to Documentation/ generic syscalls: Wire up memory protection keys syscalls x86: Wire up protection keys system calls x86/pkeys: Allocation/free syscalls x86/pkeys: Make mprotect_key() mask off additional vm_flags mm: Implement new pkey_mprotect() system call x86/pkeys: Add fault handling for PF_PK page fault bit	2016-10-10 11:01:51 -07:00
Linus Torvalds	5fa0eb0b4d	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 updates from Thomas Gleixner: "A pile of regression fixes and updates: - address the fallout of the patches which made the cpuid - nodeid relation permanent: Handling of invalid APIC ids and preventing pointless warning messages. - force eager FPU when protection keys are enabled. Protection keys are not generating FPU exceptions so they cannot work with the lazy FPU mechanism. - prevent force migration of interrupts which are not part of the CPU vector domain. - handle the fact that APIC ids are not updated in the ACPI/MADT tables on physical CPU hotplug - remove bash-isms from syscall table generator script - use the hypervisor supplied APIC frequency when running on VMware" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pkeys: Make protection keys an "eager" feature x86/apic: Prevent pointless warning messages x86/acpi: Prevent LAPIC id 0xff from being accounted arch/x86: Handle non enumerated CPU after physical hotplug x86/unwind: Fix oprofile module link error x86/vmware: Skip lapic calibration on VMware x86/syscalls: Remove bash-isms in syscall table generator x86/irq: Prevent force migration of irqs which are not in the vector domain	2016-10-10 10:59:07 -07:00
Thomas Gleixner	df610d6788	x86/apic: Prevent pointless warning messages Markus reported that he sees new warnings: APIC: NR_CPUS/possible_cpus limit of 4 reached. Processor 4/0x84 ignored. APIC: NR_CPUS/possible_cpus limit of 4 reached. Processor 5/0x85 ignored. This comes from the recent persistant cpuid - nodeid changes. The code which emits the warning has been called prior to these changes only for enabled processors. Now it's called for disabled processors as well to get the possible cpu accounting correct. So if the kernel is compiled for the number of actual available/enabled CPUs and the BIOS reports disabled CPUs as well then the above warnings are printed. That's a pointless exercise as it only makes sense if there are more CPUs enabled than the kernel supports. Nake the warning conditional on enabled processors so we are back to the state before these changes. Fixes: `8f54969dc8` ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: linux-acpi@vger.kernel.org Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610071549330.19804@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-08 12:18:36 +02:00
Thomas Gleixner	f3bf1dbe64	x86/acpi: Prevent LAPIC id 0xff from being accounted Yinghai reported that the recent changes to make the cpuid - nodeid relationship permanent causes a cpuid ordering regression on a system which has 2apic enabled.. The reason is that the ACPI local APIC parser has no sanity check for apicid 0xff, which is an invalid id. So a CPU id for this invalid local APIC id is allocated and therefor breaks the cpuid ordering. Add a sanity check to acpi_parse_lapic() which ignores the invalid id. Fixes: `8f54969dc8` ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") Reported-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>, Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: douly.fnst@cn.fujitsu.com, Cc: zhugh.fnst@cn.fujitsu.com Cc: Tony Luck <tony.luck@intel.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Lv Zheng <lv.zheng@intel.com>, Cc: robert.moore@intel.com Cc: linux-acpi@vger.kernel.org Link: https://lkml.kernel.org/r/CAE9FiQVQx6FRXT-RdR7Crz4dg5LeUWHcUSy1KacjR+JgU_vGJg@mail.gmail.com	2016-10-08 12:10:52 +02:00
Chris Metcalf	6727ad9e20	nmi_backtrace: generate one-line reports for idle cpus When doing an nmi backtrace of many cores, most of which are idle, the output is a little overwhelming and very uninformative. Suppress messages for cpus that are idling when they are interrupted and just emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN". We do this by grouping all the cpuidle code together into a new .cpuidle.text section, and then checking the address of the interrupted PC to see if it lies within that section. This commit suitably tags x86 and tile idle routines, and only adds in the minimal framework for other architectures. Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm] Tested-by: Petr Mladek <pmladek@suse.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-07 18:46:30 -07:00
Chris Metcalf	9a01c3ed5c	nmi_backtrace: add more trigger_*_cpu_backtrace() methods Patch series "improvements to the nmi_backtrace code" v9. This patch series modifies the trigger_xxx_backtrace() NMI-based remote backtracing code to make it more flexible, and makes a few small improvements along the way. The motivation comes from the task isolation code, where there are scenarios where we want to be able to diagnose a case where some cpu is about to interrupt a task-isolated cpu. It can be helpful to see both where the interrupting cpu is, and also an approximation of where the cpu that is being interrupted is. The nmi_backtrace framework allows us to discover the stack of the interrupted cpu. I've tested that the change works as desired on tile, and build-tested x86, arm, mips, and sparc64. For x86 I confirmed that the generic cpuidle stuff as well as the architecture-specific routines are in the new cpuidle section. For arm, mips, and sparc I just build-tested it and made sure the generic cpuidle routines were in the new cpuidle section, but I didn't attempt to figure out which the platform-specific idle routines might be. That might be more usefully done by someone with platform experience in follow-up patches. This patch (of 4): Currently you can only request a backtrace of either all cpus, or all cpus but yourself. It can also be helpful to request a remote backtrace of a single cpu, and since we want that, the logical extension is to support a cpumask as the underlying primitive. This change modifies the existing lib/nmi_backtrace.c code to take a cpumask as its basic primitive, and modifies the linux/nmi.h code to use the new "cpumask" method instead. The existing clients of nmi_backtrace (arm and x86) are converted to using the new cpumask approach in this change. The other users of the backtracing API (sparc64 and mips) are converted to use the cpumask approach rather than the all/allbutself approach. The mips code ignored the "include_self" boolean but with this change it will now also dump a local backtrace if requested. Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.com Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm] Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-07 18:46:30 -07:00
Linus Torvalds	ddc4e6d232	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching updates from Jiri Kosina: - fix for patching modules that contain .altinstructions or .parainstructions sections, from Jessica Yu - make TAINT_LIVEPATCH a per-module flag (so that it's immediately clear which module caused the taint), from Josh Poimboeuf * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch/module: make TAINT_LIVEPATCH module-specific Documentation: livepatch: add section about arch-specific code livepatch/x86: apply alternatives and paravirt patches after relocations livepatch: use arch_klp_init_object_loaded() to finish arch-specific tasks	2016-10-07 12:02:24 -07:00
Prarit Bhargava	2a51fe083e	arch/x86: Handle non enumerated CPU after physical hotplug When a CPU is physically added to a system then the MADT table is not updated. If subsequently a kdump kernel is started on that physically added CPU then the ACPI enumeration fails to provide the information for this CPU which is now the boot CPU of the kdump kernel. As a consequence, generic_processor_info() is not invoked for that CPU so the number of enumerated processors is 0 and none of the initializations, including the logical package id management, are performed. We have code which relies on the correctness of the logical package map and other information which is initialized via generic_processor_info(). Executing such code will result in undefined behaviour or kernel crashes. This problem applies only to the kdump kernel because a normal kexec will switch to the original boot CPU, which is enumerated in MADT, before jumping into the kexec kernel. The boot code already has a check for num_processors equal 0 in prefill_possible_map(). We can use that check as an indicator that the enumeration of the boot CPU did not happen and invoke generic_processor_info() for it. That initializes the relevant data for the boot CPU and therefore prevents subsequent failure. [ tglx: Refined the code and rewrote the changelog ] Signed-off-by: Prarit Bhargava <prarit@redhat.com> Fixes: `1f12e32f4c` ("x86/topology: Create logical package id") Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: dyoung@redhat.com Cc: Eric Biederman <ebiederm@xmission.com> Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1475514432-27682-1-git-send-email-prarit@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-07 15:22:15 +02:00
Rik van Riel	25d83b531c	x86/fpu: Rename lazy restore functions to "register state valid" Name the functions after the state they track, rather than the function they currently enable. This should make it more obvious when we use the fpu_register_state_valid() function for something else in the future. Signed-off-by: Rik van Riel <riel@redhat.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: pbonzini@redhat.com Link: http://lkml.kernel.org/r/1475627678-20788-8-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-07 11:14:41 +02:00
Rik van Riel	3913cc3507	x86/fpu: Remove struct fpu::counter With the lazy FPU code gone, we no longer use the counter field in struct fpu for anything. Get rid it. Signed-off-by: Rik van Riel <riel@redhat.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: pbonzini@redhat.com Link: http://lkml.kernel.org/r/1475627678-20788-6-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-07 11:14:40 +02:00
Andy Lutomirski	c592b57347	x86/fpu: Remove use_eager_fpu() This removes all the obvious code paths that depend on lazy FPU mode. It shouldn't change the generated code at all. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: pbonzini@redhat.com Link: http://lkml.kernel.org/r/1475627678-20788-5-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-07 11:14:36 +02:00
Andy Lutomirski	ca6938a1cd	x86/fpu: Hard-disable lazy FPU mode Since commit: `58122bf1d8` ("x86/fpu: Default eagerfpu=on on all CPUs") ... in Linux 4.6, eager FPU mode has been the default on all x86 systems, and no one has reported any regressions. This patch removes the ability to enable lazy mode: use_eager_fpu() becomes "return true" and all of the FPU mode selection machinery is removed. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: pbonzini@redhat.com Link: http://lkml.kernel.org/r/1475627678-20788-3-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-07 11:14:17 +02:00
Linus Torvalds	6218590bcb	KVM updates for v4.9-rc1 All architectures: Move `make kvmconfig` stubs from x86; use 64 bits for debugfs stats. ARM: Important fixes for not using an in-kernel irqchip; handle SError exceptions and present them to guests if appropriate; proxying of GICV access at EL2 if guest mappings are unsafe; GICv3 on AArch32 on ARMv8; preparations for GICv3 save/restore, including ABI docs; cleanups and a bit of optimizations. MIPS: A couple of fixes in preparation for supporting MIPS EVA host kernels; MIPS SMP host & TLB invalidation fixes. PPC: Fix the bug which caused guests to falsely report lockups; other minor fixes; a small optimization. s390: Lazy enablement of runtime instrumentation; up to 255 CPUs for nested guests; rework of machine check deliver; cleanups and fixes. x86: IOMMU part of AMD's AVIC for vmexit-less interrupt delivery; Hyper-V TSC page; per-vcpu tsc_offset in debugfs; accelerated INS/OUTS in nVMX; cleanups and fixes. -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJX9iDrAAoJEED/6hsPKofoOPoIAIUlgojkb9l2l1XVDgsXdgQL sRVhYSVv7/c8sk9vFImrD5ElOPZd+CEAIqFOu45+NM3cNi7gxip9yftUVs7wI5aC eDZRWm1E4trDZLe54ZM9ThcqZzZZiELVGMfR1+ZndUycybwyWzafpXYsYyaXp3BW hyHM3qVkoWO3dxBWFwHIoO/AUJrWYkRHEByKyvlC6KPxSdBPSa5c1AQwMCoE0Mo4 K/xUj4gBn9eMelNhg4Oqu/uh49/q+dtdoP2C+sVM8bSdquD+PmIeOhPFIcuGbGFI B+oRpUhIuntN39gz8wInJ4/GRSeTuR2faNPxMn4E1i1u4LiuJvipcsOjPfe0a18= =fZRB -----END PGP SIGNATURE----- Merge tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Radim Krčmář: "All architectures: - move `make kvmconfig` stubs from x86 - use 64 bits for debugfs stats ARM: - Important fixes for not using an in-kernel irqchip - handle SError exceptions and present them to guests if appropriate - proxying of GICV access at EL2 if guest mappings are unsafe - GICv3 on AArch32 on ARMv8 - preparations for GICv3 save/restore, including ABI docs - cleanups and a bit of optimizations MIPS: - A couple of fixes in preparation for supporting MIPS EVA host kernels - MIPS SMP host & TLB invalidation fixes PPC: - Fix the bug which caused guests to falsely report lockups - other minor fixes - a small optimization s390: - Lazy enablement of runtime instrumentation - up to 255 CPUs for nested guests - rework of machine check deliver - cleanups and fixes x86: - IOMMU part of AMD's AVIC for vmexit-less interrupt delivery - Hyper-V TSC page - per-vcpu tsc_offset in debugfs - accelerated INS/OUTS in nVMX - cleanups and fixes" * tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (140 commits) KVM: MIPS: Drop dubious EntryHi optimisation KVM: MIPS: Invalidate TLB by regenerating ASIDs KVM: MIPS: Split kernel/user ASID regeneration KVM: MIPS: Drop other CPU ASIDs on guest MMU changes KVM: arm/arm64: vgic: Don't flush/sync without a working vgic KVM: arm64: Require in-kernel irqchip for PMU support KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register KVM: PPC: Book3S PR: Support 64kB page size on POWER8E and POWER8NVL KVM: PPC: Book3S: Remove duplicate setting of the B field in tlbie KVM: PPC: BookE: Fix a sanity check KVM: PPC: Book3S HV: Take out virtual core piggybacking code KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread ARM: gic-v3: Work around definition of gic_write_bpr1 KVM: nVMX: Fix the NMI IDT-vectoring handling KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactive KVM: nVMX: Fix reload apic access page warning kvmconfig: add virtio-gpu to config fragment config: move x86 kvm_guest.config to a common location arm64: KVM: Remove duplicating init code for setting VMID ARM: KVM: Support vgic-v3 ...	2016-10-06 10:49:01 -07:00
Josh Poimboeuf	cfee9eddcd	x86/unwind: Fix oprofile module link error When compiling on x86 with CONFIG_OPROFILE=m and CONFIG_FRAME_POINTER=n, the oprofile module fails to link: ERROR: ftrace_graph_ret_addr" [arch/x86/oprofile/oprofile.ko] undefined! The problem was introduced when oprofile was converted to use the new x86 unwinder. When frame pointers are disabled, the "guess" unwinder's unwind_get_return_address() is an inline function which calls ftrace_graph_ret_addr(), which is not exported. Fix it by converting the "guess" version of unwind_get_return_address() to an exported out-of-line function, just like its frame pointer counterpart. Reported-by: Karl Beldan <karl.beldan@gmail.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `ec2ad9ccf1` ("oprofile/x86: Convert x86_backtrace() to use the new unwinder") Link: http://lkml.kernel.org/r/be08d589f6474df78364e081c42777e382af9352.1475731632.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-06 09:52:20 +02:00
Renat Valiullin	b91688f528	x86/vmware: Skip lapic calibration on VMware In a virtualized environment the APIC timer calibration can go wrong when the host is overcommitted or the guest is running nested. This results in the APIC timers operating at an incorrect frequency. Since VMware supports a mechanism to retrieve the local APIC frequency we can ask the hypervisor for it and skip the APIC calibration loop. Signed-off-by: Renat Valiullin <rvaliullin@vmware.com> Acked-by: Alok N Kataria <akataria@vmware.com> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20161004201148.GA1421@uu64vm Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-05 11:43:30 +02:00
Mika Westerberg	db91aa793f	x86/irq: Prevent force migration of irqs which are not in the vector domain When a CPU is about to be offlined we call fixup_irqs() that resets IRQ affinities related to the CPU in question. The same thing is also done when the system is suspended to S-states like S3 (mem). For each IRQ we try to complete any on-going move regardless whether the IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch its chip_data, assume it is of type struct apic_chip_data and manipulate it by clearing old_domain mask etc. For irq_chips that are not part of the x86_vector_domain, like those created by various GPIO drivers, will find their chip_data being changed unexpectly. Below is an example where GPIO chip owned by pinctrl-sunrisepoint.c gets corrupted after resume: # cat /sys/kernel/debug/gpio gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00: gpio-511 ( \|sysfs ) in hi # rtcwake -s10 -mmem <10 seconds passes> # cat /sys/kernel/debug/gpio gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00: gpio-511 ( \|sysfs ) in ? Note '?' in the output. It means the struct gpio_chip ->get function is NULL whereas before suspend it was there. Fix this by first checking that the IRQ belongs to x86_vector_domain before we try to use the chip_data as struct apic_chip_data. Reported-and-tested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: stable@vger.kernel.org # 4.4+ Link: http://lkml.kernel.org/r/20161003101708.34795-1-mika.westerberg@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-04 13:13:47 +02:00
Linus Torvalds	597f03f9d1	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug updates from Thomas Gleixner: "Yet another batch of cpu hotplug core updates and conversions: - Provide core infrastructure for multi instance drivers so the drivers do not have to keep custom lists. - Convert custom lists to the new infrastructure. The block-mq custom list conversion comes through the block tree and makes the diffstat tip over to more lines removed than added. - Handle unbalanced hotplug enable/disable calls more gracefully. - Remove the obsolete CPU_STARTING/DYING notifier support. - Convert another batch of notifier users. The relayfs changes which conflicted with the conversion have been shipped to me by Andrew. The remaining lot is targeted for 4.10 so that we finally can remove the rest of the notifiers" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) cpufreq: Fix up conversion to hotplug state machine blk/mq: Reserve hotplug states for block multiqueue x86/apic/uv: Convert to hotplug state machine s390/mm/pfault: Convert to hotplug state machine mips/loongson/smp: Convert to hotplug state machine mips/octeon/smp: Convert to hotplug state machine fault-injection/cpu: Convert to hotplug state machine padata: Convert to hotplug state machine cpufreq: Convert to hotplug state machine ACPI/processor: Convert to hotplug state machine virtio scsi: Convert to hotplug state machine oprofile/timer: Convert to hotplug state machine block/softirq: Convert to hotplug state machine lib/irq_poll: Convert to hotplug state machine x86/microcode: Convert to hotplug state machine sh/SH-X3 SMP: Convert to hotplug state machine ia64/mca: Convert to hotplug state machine ARM/OMAP/wakeupgen: Convert to hotplug state machine ARM/shmobile: Convert to hotplug state machine arm64/FP/SIMD: Convert to hotplug state machine ...	2016-10-03 19:43:08 -07:00
Linus Torvalds	8e4ef63867	Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso updates from Ingo Molnar: "The main changes in this cycle centered around adding support for 32-bit compatible C/R of the vDSO on 64-bit kernels, by Dmitry Safonov" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Use CONFIG_X86_X32_ABI to enable vdso prctl x86/vdso: Only define map_vdso_randomized() if CONFIG_X86_64 x86/vdso: Only define prctl_map_vdso() if CONFIG_CHECKPOINT_RESTORE x86/signal: Add SA_{X32,IA32}_ABI sa_flags x86/ptrace: Down with test_thread_flag(TIF_IA32) x86/coredump: Use pr_reg size, rather that TIF_IA32 flag x86/arch_prctl/vdso: Add ARCH_MAP_VDSO_* x86/vdso: Replace calculate_addr in map_vdso() with addr x86/vdso: Unmap vdso blob on vvar mapping failure	2016-10-03 17:29:01 -07:00
Linus Torvalds	6aebe7f9e8	Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timer updates from Ingo Molnar: "This tree includes a HPET overhead micro-optimization plus new TSC frequencies for newer Intel CPUs" * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsc: Add additional Intel CPU models to the crystal quirk list x86/tsc: Use cpu id defines instead of hex constants x86/hpet: Reduce HPET counter read contention	2016-10-03 17:27:17 -07:00
Linus Torvalds	a8adc0f091	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Header file and a wrapper functions cleanup" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Migrate exception table users off module.h and onto extable.h x86: Clean up various simple wrapper functions	2016-10-03 17:18:52 -07:00
Linus Torvalds	3ef0a61a46	Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "The changes in this cycle were: - Save e820 table RAM footprint on larger kernel configurations. (Denys Vlasenko) - pmem related fixes (Dan Williams) - theoretical e820 boundary condition fix (Wei Yang)" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation x86/e820: Use much less memory for e820/e820_saved, save up to 120k x86/e820: Prepare e280 code for switch to dynamic storage x86/e820: Mark some static functions __init x86/e820: Fix very large 'size' handling boundary condition	2016-10-03 16:46:53 -07:00
Linus Torvalds	1a4a2bc460	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull low-level x86 updates from Ingo Molnar: "In this cycle this topic tree has become one of those 'super topics' that accumulated a lot of changes: - Add CONFIG_VMAP_STACK=y support to the core kernel and enable it on x86 - preceded by an array of changes. v4.8 saw preparatory changes in this area already - this is the rest of the work. Includes the thread stack caching performance optimization. (Andy Lutomirski) - switch_to() cleanups and all around enhancements. (Brian Gerst) - A large number of dumpstack infrastructure enhancements and an unwinder abstraction. The secret long term plan is safe(r) live patching plus maybe another attempt at debuginfo based unwinding - but all these current bits are standalone enhancements in a frame pointer based debug environment as well. (Josh Poimboeuf) - More __ro_after_init and const annotations. (Kees Cook) - Enable KASLR for the vmemmap memory region. (Thomas Garnier)" [ The virtually mapped stack changes are pretty fundamental, and not x86-specific per se, even if they are only used on x86 right now. ] * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) x86/asm: Get rid of __read_cr4_safe() thread_info: Use unsigned long for flags x86/alternatives: Add stack frame dependency to alternative_call_2() x86/dumpstack: Fix show_stack() task pointer regression x86/dumpstack: Remove dump_trace() and related callbacks x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder oprofile/x86: Convert x86_backtrace() to use the new unwinder x86/stacktrace: Convert save_stack_trace_*() to use the new unwinder perf/x86: Convert perf_callchain_kernel() to use the new unwinder x86/unwind: Add new unwind interface and implementations x86/dumpstack: Remove NULL task pointer convention fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK lib/syscall: Pin the task stack in collect_syscall() x86/process: Pin the target stack in get_wchan() x86/dumpstack: Pin the target stack when dumping it kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function sched/core: Add try_get_task_stack() and put_task_stack() x86/entry/64: Fix a minor comment rebase error iommu/amd: Don't put completion-wait semaphore on stack ...	2016-10-03 16:13:28 -07:00
Linus Torvalds	110a9e42b6	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Ingo Molnar: "The main changes are: - Persistent CPU/node numbering across CPU hotplug/unplug events. This is a pretty involved series of changes that first fetches all the information during bootup and then uses it for the various hotplug/unplug methods. (Gu Zheng, Dou Liyang) - IO-APIC hot-add/remove fixes and enhancements. (Rui Wang) - ... various fixes, cleanups and enhancements" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) x86/apic: Fix silent & fatal merge conflict in __generic_processor_info() acpi: Fix broken error check in map_processor() acpi: Validate processor id when mapping the processor acpi: Provide mechanism to validate processors in the ACPI tables x86/acpi: Set persistent cpuid <-> nodeid mapping when booting x86/acpi: Enable MADT APIs to return disabled apicids x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping x86/acpi: Enable acpi to register all possible cpus at boot time x86/numa: Online memory-less nodes at boot time x86/apic: Get rid of apic_version[] array x86/apic: Order irq_enter/exit() calls correctly vs. ack_APIC_irq() x86/ioapic: Ignore root bridges without a companion ACPI device x86/apic: Update comment about disabling processor focus x86/smpboot: Check APIC ID before setting up default routing x86/ioapic: Fix IOAPIC failing to request resource x86/ioapic: Fix lost IOAPIC resource after hot-removal and hotadd x86/ioapic: Fix setup_res() failing to get resource x86/ioapic: Support hot-removal of IOAPICs present during boot x86/ioapic: Change prototype of acpi_ioapic_add() x86/apic, ACPI: Fix incorrect assignment when handling apic/x2apic entries ...	2016-10-03 15:36:06 -07:00
Linus Torvalds	af79ad2b1f	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes are: - irqtime accounting cleanups and enhancements. (Frederic Weisbecker) - schedstat debugging enhancements, make it more broadly runtime available. (Josh Poimboeuf) - More work on asymmetric topology/capacity scheduling. (Morten Rasmussen) - sched/wait fixes and cleanups. (Oleg Nesterov) - PELT (per entity load tracking) improvements. (Peter Zijlstra) - Rewrite and enhance select_idle_siblings(). (Peter Zijlstra) - sched/numa enhancements/fixes (Rik van Riel) - sched/cputime scalability improvements (Stanislaw Gruszka) - Load calculation arithmetics fixes. (Dietmar Eggemann) - sched/deadline enhancements (Tommaso Cucinotta) - Fix utilization accounting when switching to the SCHED_NORMAL policy. (Vincent Guittot) - ... plus misc cleanups and enhancements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) sched/irqtime: Consolidate irqtime flushing code sched/irqtime: Consolidate accounting synchronization with u64_stats API u64_stats: Introduce IRQs disabled helpers sched/irqtime: Remove needless IRQs disablement on kcpustat update sched/irqtime: No need for preempt-safe accessors sched/fair: Fix min_vruntime tracking sched/debug: Add SCHED_WARN_ON() sched/core: Fix set_user_nice() sched/fair: Introduce set_curr_task() helper sched/core, ia64: Rename set_curr_task() sched/core: Fix incorrect utilization accounting when switching to fair class sched/core: Optimize SCHED_SMT sched/core: Rewrite and improve select_idle_siblings() sched/core: Replace sd_busy/nr_busy_cpus with sched_domain_shared sched/core: Introduce 'struct sched_domain_shared' sched/core: Restructure destroy_sched_domain() sched/core: Remove unused @cpu argument from destroy_sched_domain*() sched/wait: Introduce init_wait_entry() sched/wait: Avoid abort_exclusive_wait() in __wait_on_bit_lock() sched/wait: Avoid abort_exclusive_wait() in ___wait_event() ...	2016-10-03 13:39:00 -07:00
Linus Torvalds	e606d81d2d	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "The main changes were: - Lots of enhancements for AMD SMCA (Scalable MCA features/extensions) systems: extract, decode and print more hardware error information and add matching support on the injection/testing side as well. (Yazn Ghannam) - Various MCE handling improvements on modern Intel Xeons. (Tony Luck) - Plus misc fixes and enhancements" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/RAS/mce_amd_inj: Remove debugfs dir recursively on exit x86/RAS/mce_amd_inj: Fix signed wrap around when decrementing index 'i' x86/RAS/mce_amd_inj: Fix some W= warnings x86/MCE/AMD, EDAC: Handle reserved bank 4 on Fam17h properly x86/mce/AMD: Extract the error address on SMCA systems x86/mce, EDAC/mce_amd: Print MCA_SYND and MCA_IPID during MCE on SMCA systems x86/mce/AMD: Save MCA_IPID in MCE struct on SMCA systems x86/mce/AMD: Ensure the deferred error interrupt is of type APIC on SMCA systems x86/mce/AMD: Update sysfs bank names for SMCA systems x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types EDAC/mce_amd: Use SMCA prefix for error descriptions arrays EDAC/mce_amd: Add missing SMCA error descriptions x86/mce/AMD: Read MSRs on the CPU allocating the threshold blocks x86/RAS: Add syndrome support to mce_amd_inj EDAC/mce_amd: Print syndrome register value on SMCA systems x86/mce: Add support for new MCA_SYND register x86/mce/AMD: Use msr_ops.misc() in allocate_threshold_blocks() x86/mce: Drop X86_FEATURE_MCE_RECOVERY and the related model string test x86/mce: Improve memcpy_mcsafe() x86/mce: Add PCI quirks to identify Xeons with machine check recovery ...	2016-10-03 13:22:39 -07:00
Linus Torvalds	00bcf5cdd6	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - rwsem micro-optimizations (Davidlohr Bueso) - Improve the implementation and optimize the performance of percpu-rwsems. (Peter Zijlstra.) - Convert all lglock users to better facilities such as percpu-rwsems or percpu-spinlocks and remove lglocks. (Peter Zijlstra) - Remove the ticket (spin)lock implementation. (Peter Zijlstra) - Korean translation of memory-barriers.txt and related fixes to the English document. (SeongJae Park) - misc fixes and cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/cmpxchg, locking/atomics: Remove superfluous definitions x86, locking/spinlocks: Remove ticket (spin)lock implementation locking/lglock: Remove lglock implementation stop_machine: Remove stop_cpus_lock and lg_double_lock/unlock() fs/locks: Use percpu_down_read_preempt_disable() locking/percpu-rwsem: Add down_read_preempt_disable() fs/locks: Replace lg_local with a per-cpu spinlock fs/locks: Replace lg_global with a percpu-rwsem locking/percpu-rwsem: Add DEFINE_STATIC_PERCPU_RWSEMand percpu_rwsem_assert_held() locking/pv-qspinlock: Use cmpxchg_release() in __pv_queued_spin_unlock() locking/rwsem, x86: Drop a bogus cc clobber futex: Add some more function commentry locking/hung_task: Show all locks locking/rwsem: Scan the wait_list for readers only once locking/rwsem: Remove a few useless comments locking/rwsem: Return void in __rwsem_mark_wake() locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write() locking/Documentation: Add Korean translation locking/Documentation: Fix a typo of example result locking/Documentation: Fix wrong section reference ...	2016-10-03 12:15:00 -07:00
Linus Torvalds	de956b8f45	Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "Main changes in this cycle were: - Refactor the EFI memory map code into architecture neutral files and allow drivers to permanently reserve EFI boot services regions on x86, as well as ARM/arm64. (Matt Fleming) - Add ARM support for the EFI ESRT driver. (Ard Biesheuvel) - Make the EFI runtime services and efivar API interruptible by swapping spinlocks for semaphores. (Sylvain Chouleur) - Provide the EFI identity mapping for kexec which allows kexec to work on SGI/UV platforms with requiring the "noefi" kernel command line parameter. (Alex Thorlton) - Add debugfs node to dump EFI page tables on arm64. (Ard Biesheuvel) - Merge the EFI test driver being carried out of tree until now in the FWTS project. (Ivan Hu) - Expand the list of flags for classifying EFI regions as "RAM" on arm64 so we align with the UEFI spec. (Ard Biesheuvel) - Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32) or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot services function table for direct calls, alleviating us from having to maintain the custom function table. (Lukas Wunner) - Miscellaneous cleanups and fixes" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE x86/efi: Allow invocation of arbitrary boot services x86/efi: Optimize away setup_gop32/64 if unused x86/efi: Use kmalloc_array() in efi_call_phys_prolog() efi/arm64: Treat regions with WT/WC set but WB cleared as memory efi: Add efi_test driver for exporting UEFI runtime service interfaces x86/efi: Defer efi_esrt_init until after memblock_x86_fill efi/arm64: Add debugfs node to dump UEFI runtime page tables x86/efi: Remove unused find_bits() function fs/efivarfs: Fix double kfree() in error path x86/efi: Map in physical addresses in efi_map_region_fixed lib/ucs2_string: Speed up ucs2_utf8size() firmware-gsmi: Delete an unnecessary check before the function call "dma_pool_destroy" x86/efi: Initialize status to ensure garbage is not returned on small size efi: Replace runtime services spinlock with semaphore efi: Don't use spinlocks for efi vars efi: Use a file local lock for efivars efi/arm*: esrt: Add missing call to efi_esrt_init() efi/esrt: Use memremap not ioremap to access ESRT table in memory x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data ...	2016-10-03 11:33:18 -07:00
Linus Torvalds	d7a0dab82f	Merge branch 'core-smp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core SMP updates from Ingo Molnar: "Two main change is generic vCPU pinning and physical CPU SMP-call support, for Xen to be able to perform certain calls on specific physical CPUs - by Juergen Gross" * 'core-smp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Allocate smp_call_on_cpu() workqueue on stack too hwmon: Use smp_call_on_cpu() for dell-smm i8k dcdbas: Make use of smp_call_on_cpu() xen: Add xen_pin_vcpu() to support calling functions on a dedicated pCPU smp: Add function to execute a function synchronously on a CPU virt, sched: Add generic vCPU pinning support xen: Sync xen header	2016-10-03 11:02:39 -07:00
Linus Torvalds	72d39926f0	ACPI material for v4.9-rc1 - Update of the ACPICA code in the kernel to upstream revision 20160831 with the following major changes: * New mechanism for GPE masking. * Fixes for issues related to the LoadTable operator and table loading. * Fixes for issues related to so-called module-level code (MLC), that is AML that doesn't belong to any methods. * Change of the return value of the _OSI method to reflect the Windows behavior. * GAS (Generic Address Structure) support fix related to 32-bit FADT addresses. * Elimination of unnecessary FADT version 2 support. * ACPI tools fixes and cleanups. From Bob Moore, Lv Zheng, and Jung-uk Kim. - ACPI sysfs interface updates to fix GPE handling (on top of the new GPE masking mechanism in ACPICA) and issues related to table loading (Lv Zheng). - New watchdog driver based on the ACPI WDAT (ACPI Watchdog Action Table), needed on some platforms to replace the iTCO watchdog that doesn't work there and related updates of the intel_pmc_ipc, i2c/i801 and MFD/lcp_ich drivers (Mika Westerberg). - Driver core fix to prevent it from leaking secondary fwnode objects during device removal (Lukas Wunner). - New definitions of built-in properties for UART in ACPI-based x86 SoC drivers and a 8250_dw driver quirk for the APM X-Gene SoC (Heikki Krogerus). - New device ID for the Vulcan SPI controller and constification of local strucures in the AMD SoC (APD) ACPI driver (Kamlakant Patel, Julia Lawall). - Fix for a bug causing the allocation of PCI resorces to fail if ACPI-enumerated child platform devices are registered below the PCI devices in question (Mika Westerberg). - Change of the default polarity for PCI legacy IRQs to high on systems booting wth ACPI on platforms with a GIC interrupt controller model fixing the discrepancy between the specification and HW behavior (Lorenzo Pieralisi). - Fixes for the handling of system suspend/resume in the ACPI EC driver and update of that driver to make it cope with the cases when the EC device defined in the ECDT has to be used throughout the entire system life cycle (Lv Zheng). - Update of the ACPI CPPC library to allow it to batch requests sent over the PCC channel (to reduce overhead), to support the fixed functional hardware (FFH) CPPC registers access type, to notify the mailbox framework about TX completions when the interrupt flag is set for the PCC mailbox, and to support HW-Reduced Communication Subspace type 2 (Ashwin Chaugule, Prashanth Prakash, Srinivas Pandruvada, Hoan Tran). - ACPI button driver fix and documentation update related to the handling of laptop lids (Lv Zheng). - ACPI battery driver initialization fix (Carlos Garnacho). - ACPI GPIO enumeration documentation update (Mika Westerberg). - Assorted updates of the core ACPI bus type code (Lukas Wunner, Lv Zheng). - Assorted cleanups of the ACPI table parsing code and the x86-specific ACPI code (Al Stone). - Fixes for assorted ACPI-related issues found in linux-next (Wei Yongjun). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJX8Y5+AAoJEILEb/54YlRx73oP/RiAi86NKjOj+GfYceVe37jn 6lSqoMugjgTQHRYvYiQCjJ/BR0GzQZqUkz9TAu1Op14+rhTH3OhSfPizzJWCpVfA G9l9ZRQNnsKNs14bbYmWtmWduh46dFLVFJqo+M/0H3ZMFZu6Adcb+1SBtXHUoQ6L z69ngFxTu3yRvqS4cmm5h7SOx5W2uZZl8zViJW8jgyGhUBStG87gzR6wsYBldGCk XFxcaGWBXRccWGAQLSwfs0psQccEooCqbpsDqaUdrK/mI0rsQr88f25ZxEE7Zw7H bv3py1cgJBZRq36L7eBGQXjIE7YQey6qG2lug2zsUJWe+vzy2vHjHVJHuBXKKgv3 txOA6QZx63UgEyN3zFT7K5ek6uOnkKdeE+s+Laj+K/x4V2R6gbtgO011EVcXy+bI NvqsO76tfPHpwrn5s1VVc5lcEBEPHKHb+WulHrqhSSU4ivk0gtJDeSI+c8xta6YT XwSry5tozDLkG1uEZqkyY1XTlOUAHO8E6YcrlOv2z1+mG7L8OH/vCp1apzgexsZA 1683AH5cwKc3KaP+4QdKGdxY2BDxb7OTVh3cGy4kAYb6tqQ/vj7vlRiJvtaMBtFw xJn3buuagwJzKtgebpA565opvyFAfUX/RNFlTP63aXAefSAgq6KLq70vKFxkIZto H1LpUbmiEbuBml8CBGb1 =xDOQ -----END PGP SIGNATURE----- Merge tag 'acpi-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "First off, the ACPICA code in the kernel is updated to upstream revision 20160831 that brings in a few bug fixes and cleanups. In particular, it is possible to mask GPEs now (and the sysfs interface for GPE control is fixed on top of that), problems related to the table loading mechanism are fixed and all code related to FADT version 2 (which has never been part of the ACPI specification) is dropped. On the new features front, there is a new watchdog driver based on the ACPI WDAT (ACPI Watchdog Action Table), needed on some platforms to replace the iTCO watchdog that doesn't work there, and some UART devices get new definitions of built-in properties (to be accessed via the generic device properties API). Also, included is a fix for an ACPI-related PCI resorces allocation issue and a few problems in the EC driver and in the button and battery drivers are fixed. In addition to that, the ACPI CPPC library is updated to make batching of requests sent over the PCC channel possible (which reduces the PCC usage overhead substantially in some cases) and to support functional fixed hardware (FFH) type of CPPC registers access (which will allow CPPC to be used on x86 too in the future). As usual, there are some assorted fixes and cleanups too. Specifics: - Update of the ACPICA code in the kernel to upstream revision 20160831 with the following major changes: * New mechanism for GPE masking. * Fixes for issues related to the LoadTable operator and table loading. * Fixes for issues related to so-called module-level code (MLC), that is AML that doesn't belong to any methods. * Change of the return value of the _OSI method to reflect the Windows behavior. * GAS (Generic Address Structure) support fix related to 32-bit FADT addresses. * Elimination of unnecessary FADT version 2 support. * ACPI tools fixes and cleanups. From Bob Moore, Lv Zheng, and Jung-uk Kim. - ACPI sysfs interface updates to fix GPE handling (on top of the new GPE masking mechanism in ACPICA) and issues related to table loading (Lv Zheng). - New watchdog driver based on the ACPI WDAT (ACPI Watchdog Action Table), needed on some platforms to replace the iTCO watchdog that doesn't work there and related updates of the intel_pmc_ipc, i2c/i801 and MFD/lcp_ich drivers (Mika Westerberg). - Driver core fix to prevent it from leaking secondary fwnode objects during device removal (Lukas Wunner). - New definitions of built-in properties for UART in ACPI-based x86 SoC drivers and a 8250_dw driver quirk for the APM X-Gene SoC (Heikki Krogerus). - New device ID for the Vulcan SPI controller and constification of local strucures in the AMD SoC (APD) ACPI driver (Kamlakant Patel, Julia Lawall). - Fix for a bug causing the allocation of PCI resorces to fail if ACPI-enumerated child platform devices are registered below the PCI devices in question (Mika Westerberg). - Change of the default polarity for PCI legacy IRQs to high on systems booting wth ACPI on platforms with a GIC interrupt controller model fixing the discrepancy between the specification and HW behavior (Lorenzo Pieralisi). - Fixes for the handling of system suspend/resume in the ACPI EC driver and update of that driver to make it cope with the cases when the EC device defined in the ECDT has to be used throughout the entire system life cycle (Lv Zheng). - Update of the ACPI CPPC library to allow it to batch requests sent over the PCC channel (to reduce overhead), to support the fixed functional hardware (FFH) CPPC registers access type, to notify the mailbox framework about TX completions when the interrupt flag is set for the PCC mailbox, and to support HW-Reduced Communication Subspace type 2 (Ashwin Chaugule, Prashanth Prakash, Srinivas Pandruvada, Hoan Tran). - ACPI button driver fix and documentation update related to the handling of laptop lids (Lv Zheng). - ACPI battery driver initialization fix (Carlos Garnacho). - ACPI GPIO enumeration documentation update (Mika Westerberg). - Assorted updates of the core ACPI bus type code (Lukas Wunner, Lv Zheng). - Assorted cleanups of the ACPI table parsing code and the x86-specific ACPI code (Al Stone). - Fixes for assorted ACPI-related issues found in linux-next (Wei Yongjun)" * tag 'acpi-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (98 commits) ACPI / documentation: Use recommended name in GPIO property names watchdog: wdat_wdt: Fix warning for using 0 as NULL watchdog: wdat_wdt: fix return value check in wdat_wdt_probe() platform/x86: intel_pmc_ipc: Do not create iTCO watchdog when WDAT table exists i2c: i801: Do not create iTCO watchdog when WDAT table exists mfd: lpc_ich: Do not create iTCO watchdog when WDAT table exists ACPI / bus: Adjust ACPI subsystem initialization for new table loading mode ACPICA: Parser: Fix a regression in LoadTable support ACPICA: Tables: Fix "UNLOAD" code path lock issues ACPI / watchdog: Add support for WDAT hardware watchdog ACPI / platform: Pay attention to parent device's resources PCI: Add pci_find_resource() ACPI / CPPC: Support PCC with interrupt flag ACPI / sysfs: Update sysfs signature handling code ACPI / sysfs: Fix an issue for LoadTable opcode ACPICA: Tables: Fix a regression in acpi_tb_find_table() ACPI / tables: Remove duplicated include from tables.c ACPI / APD: constify local structures x86: ACPI: make variable names clearer in acpi_parse_madt_lapic_entries() x86: ACPI: remove extraneous white space after semicolon ...	2016-10-03 10:11:58 -07:00
Rafael J. Wysocki	0d573c6a01	Merge branches 'acpi-x86', 'acpi-cppc' and 'acpi-soc' * acpi-x86: x86: ACPI: make variable names clearer in acpi_parse_madt_lapic_entries() x86: ACPI: remove extraneous white space after semicolon * acpi-cppc: ACPI / CPPC: Support PCC with interrupt flag ACPI / CPPC: Add prefix cppc to cpudata structure name ACPI / CPPC: Add support for functional fixed hardware address ACPI / CPPC: Don't return on CPPC probe failure ACPI / CPPC: Allow build with ACPI_CPU_FREQ_PSS config ACPI / CPPC: check for error bit in PCC status field ACPI / CPPC: move all PCC related information into pcc_data ACPI / CPPC: add sysfs support to compute delivered performance ACPI / CPPC: set a non-zero value for transition_latency ACPI / CPPC: support for batching CPPC requests ACPI / CPPC: acquire pcc_lock only while accessing PCC subspace ACPI / CPPC: restructure read/writes for efficient sys mapped reg ops mailbox: pcc: Support HW-Reduced Communication Subspace type 2 * acpi-soc: ACPI / APD: constify local structures ACPI / APD: Add device HID for Vulcan SPI controller	2016-10-02 01:39:09 +02:00
Andy Lutomirski	05fb3c199b	x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID Otherwise arch_task_struct_size == 0 and we die. While we're at it, set X86_FEATURE_ALWAYS, too. Reported-by: David Saggiorato <david@saggiorato.net> Tested-by: David Saggiorato <david@saggiorato.net> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: aaeb5c01c5b ("x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86") Link: http://lkml.kernel.org/r/8de723afbf0811071185039f9088733188b606c9.1475103911.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-30 13:53:04 +02:00
Andy Lutomirski	1ef55be16e	x86/asm: Get rid of __read_cr4_safe() We use __read_cr4() vs __read_cr4_safe() inconsistently. On CR4-less CPUs, all CR4 bits are effectively clear, so we can make the code simpler and more robust by making __read_cr4() always fix up faults on 32-bit kernels. This may fix some bugs on old 486-like CPUs, but I don't have any easy way to test that. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: david@saggiorato.net Link: http://lkml.kernel.org/r/ea647033d357d9ce2ad2bbde5a631045f5052fb6.1475178370.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-30 12:40:12 +02:00
Thomas Gleixner	d7e25c66c9	Merge branch 'x86/urgent' into x86/asm Get the cr4 fixes so we can apply the final cleanup	2016-09-30 12:38:28 +02:00
Andy Lutomirski	192d1dccbf	x86/boot: Fix another __read_cr4() case on 486 The condition for reading CR4 was wrong: there are some CPUs with CPUID but not CR4. Rather than trying to make the condition exact, use __read_cr4_safe(). Fixes: `18bc7bd523` ("x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly") Reported-by: david@saggiorato.net Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Link: http://lkml.kernel.org/r/8c453a61c4f44ab6ff43c29780ba04835234d2e5.1475178369.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-30 12:37:40 +02:00
Peter Zijlstra	cfd8983f03	x86, locking/spinlocks: Remove ticket (spin)lock implementation We've unconditionally used the queued spinlock for many releases now. Its time to remove the old ticket lock code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <waiman.long@hpe.com> Cc: Waiman.Long@hpe.com Cc: david.vrabel@citrix.com Cc: dhowells@redhat.com Cc: pbonzini@redhat.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20160518184302.GO3193@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-30 10:56:00 +02:00
Tim Chen	8f37961cf2	sched/core, x86/topology: Fix NUMA in package topology bug Current code can call set_cpu_sibling_map() and invoke sched_set_topology() more than once (e.g. on CPU hot plug). When this happens after sched_init_smp() has been called, we lose the NUMA topology extension to sched_domain_topology in sched_init_numa(). This results in incorrect topology when the sched domain is rebuilt. This patch fixes the bug and issues warning if we call sched_set_topology() after sched_init_smp(). Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@suse.de Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/1474485552-141429-2-git-send-email-srinivas.pandruvada@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-30 10:53:18 +02:00
Dave Airlie	ca09fb9f60	Linux 4.8-rc8 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJX6H4uAAoJEHm+PkMAQRiG5sMH/3yzrMiUCSokdS+cvY+jgKAG JS58JmRvBPz2mRaU3MRPBGRDeCz/Nc9LggL2ZcgM+E1ZYirlYyQfIED3lkqk5R07 kIN1wmb+kQhXyU4IY3fEX7joqyKC6zOy4DUChPkBQU0/0+VUmdVmcJvsuPlnMZtf g95m0BdYTui+eDezASRqOEp3Lb5ONL4c3ao4yBP0LHF033ctj3VJQiyi5uERPZJ0 5e6Mo7Wxn78t9WqJLQAiEH46kTwT2plNlxf3XXqTenfIdbWhqE873HPGeSMa3VQV VywXTpCpSPQsA8BYg66qIbebdKOhs9MOviHVfqDtwQlvwhjlBDya0gNHfI5fSy4= =Y/L5 -----END PGP SIGNATURE----- Merge tag 'v4.8-rc8' into drm-next Linux 4.8-rc8 There was a lot of fallout in the imx/amdgpu/i915 drivers, so backmerge it now to avoid troubles. * tag 'v4.8-rc8': (1442 commits) Linux 4.8-rc8 fault_in_multipages_readable() throws set-but-unused error mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing radix tree: fix sibling entry handling in radix_tree_descend() radix tree test suite: Test radix_tree_replace_slot() for multiorder entries fix memory leaks in tracing_buffers_splice_read() tracing: Move mutex to protect against resetting of seq data MIPS: Fix delay slot emulation count in debugfs MIPS: SMP: Fix possibility of deadlock when bringing CPUs online mm: delete unnecessary and unsafe init_tlb_ubc() huge tmpfs: fix Committed_AS leak shmem: fix tmpfs to handle the huge= option properly blk-mq: skip unmapped queues in blk_mq_alloc_request_hctx MIPS: Fix pre-r6 emulation FPU initialisation arm64: kgdb: handle read-only text / modules arm64: Call numa_store_cpu_info() earlier. locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text nvme-rdma: only clear queue flags after successful connect i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended perf/core: Limit matching exclusive events to one PMU ...	2016-09-28 12:08:49 +10:00
Thomas Gleixner	eb6296dec1	x86/apic: Fix silent & fatal merge conflict in __generic_processor_info() Fix up the silent merge conflict between commit `c291b01515` in x86/urgent and commit `f7c28833c2` in x86/apic which both remove num_processors++ from the original location and then add it at two different locations. As a result num_processors is incremented twice which can cut the number of available cpus in half. Remove the one which is added by commit `c291b01515`. In hindsight I should have merged x86/urgent into x86/apic _before_ adding the nodeid bits, but in hindsight we are always smarter. Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Fixes: `1e1b37273c` ("Merge branch 'x86/urgent' into x86/apic") Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1609261350090.5483@nanos Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-26 15:51:22 -04:00
Thomas Gleixner	1e1b37273c	Merge branch 'x86/urgent' into x86/apic Bring in the upstream modifications so we can fixup the silent merge conflict which is introduced by this merge. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-26 15:47:03 -04:00
Ingo Molnar	6fae257f0b	Linux 4.8-rc8 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJX6H4uAAoJEHm+PkMAQRiG5sMH/3yzrMiUCSokdS+cvY+jgKAG JS58JmRvBPz2mRaU3MRPBGRDeCz/Nc9LggL2ZcgM+E1ZYirlYyQfIED3lkqk5R07 kIN1wmb+kQhXyU4IY3fEX7joqyKC6zOy4DUChPkBQU0/0+VUmdVmcJvsuPlnMZtf g95m0BdYTui+eDezASRqOEp3Lb5ONL4c3ao4yBP0LHF033ctj3VJQiyi5uERPZJ0 5e6Mo7Wxn78t9WqJLQAiEH46kTwT2plNlxf3XXqTenfIdbWhqE873HPGeSMa3VQV VywXTpCpSPQsA8BYg66qIbebdKOhs9MOviHVfqDtwQlvwhjlBDya0gNHfI5fSy4= =Y/L5 -----END PGP SIGNATURE----- Merge tag 'v4.8-rc8' into ras/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-26 11:12:45 +02:00
Dan Williams	917db484dc	x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation In commit: `ec776ef6bb` ("x86/mm: Add support for the non-standard protected e820 type") Christoph references the original patch I wrote implementing pmem support. The intent of the 'max_pfn' changes in that commit were to enable persistent memory ranges to be covered by the struct page memmap by default. However, that approach was abandoned when Christoph ported the patches [1], and that functionality has since been replaced by devm_memremap_pages(). In the meantime, this max_pfn manipulation is confusing kdump [2] that assumes that everything covered by the max_pfn is "System RAM". This results in kdump hanging or crashing. [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-March/000348.html [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1351098 So fix it. Reported-by: Zhang Yi <yizhan@redhat.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Zhang Yi <yizhan@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@vger.kernel.org> # v4.1 and later kernels Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-nvdimm@lists.01.org Fixes: `ec776ef6bb` ("x86/mm: Add support for the non-standard protected e820 type") Link: http://lkml.kernel.org/r/147448744538.34910.11287693517367139607.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-22 12:26:48 +02:00
Gu Zheng	dc6db24d24	x86/acpi: Set persistent cpuid <-> nodeid mapping when booting The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that, when node online/offline happens, cache based on cpuid <-> nodeid mapping such as wq_numa_possible_cpumask will not cause any problem. It contains 4 steps: 1. Enable apic registeration flow to handle both enabled and disabled cpus. 2. Introduce a new array storing all possible cpuid <-> apicid mapping. 3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid. 4. Establish all possible cpuid <-> nodeid mapping. This patch finishes step 4. This patch set the persistent cpuid <-> nodeid mapping for all enabled/disabled processors at boot time via an additional acpi namespace walk for processors. [ tglx: Remove the unneeded exports ] Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: mika.j.penttila@gmail.com Cc: len.brown@intel.com Cc: rafael@kernel.org Cc: rjw@rjwysocki.net Cc: yasu.isimatu@gmail.com Cc: linux-mm@kvack.org Cc: linux-acpi@vger.kernel.org Cc: isimatu.yasuaki@jp.fujitsu.com Cc: gongzhaogang@inspur.com Cc: tj@kernel.org Cc: izumi.taku@jp.fujitsu.com Cc: cl@linux.com Cc: chen.tang@easystack.cn Cc: akpm@linux-foundation.org Cc: kamezawa.hiroyu@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1472114120-3281-6-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-21 21:18:39 +02:00
Gu Zheng	8f54969dc8	x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that, when node online/offline happens, cache based on cpuid <-> nodeid mapping such as wq_numa_possible_cpumask will not cause any problem. It contains 4 steps: 1. Enable apic registeration flow to handle both enabled and disabled cpus. 2. Introduce a new array storing all possible cpuid <-> apicid mapping. 3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid. 4. Establish all possible cpuid <-> nodeid mapping. This patch finishes step 2. In this patch, we introduce a new static array named cpuid_to_apicid[], which is large enough to store info for all possible cpus. And then, we modify the cpuid calculation. In generic_processor_info(), it simply finds the next unused cpuid. And it is also why the cpuid <-> nodeid mapping changes with node hotplug. After this patch, we find the next unused cpuid, map it to an apicid, and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid mapping will be persistent. And finally we will use this array to make cpuid <-> nodeid persistent. cpuid <-> apicid mapping is established at local apic registeration time. But non-present or disabled cpus are ignored. In this patch, we establish all possible cpuid <-> apicid mapping when registering local apic. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: mika.j.penttila@gmail.com Cc: len.brown@intel.com Cc: rafael@kernel.org Cc: rjw@rjwysocki.net Cc: yasu.isimatu@gmail.com Cc: linux-mm@kvack.org Cc: linux-acpi@vger.kernel.org Cc: isimatu.yasuaki@jp.fujitsu.com Cc: gongzhaogang@inspur.com Cc: tj@kernel.org Cc: izumi.taku@jp.fujitsu.com Cc: cl@linux.com Cc: chen.tang@easystack.cn Cc: akpm@linux-foundation.org Cc: kamezawa.hiroyu@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1472114120-3281-4-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-21 21:18:38 +02:00
Gu Zheng	f7c28833c2	x86/acpi: Enable acpi to register all possible cpus at boot time cpuid <-> nodeid mapping is firstly established at boot time. And workqueue caches the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time. When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed, which means, cpuid <-> nodeid mapping will change if node hotplug happens. But workqueue does not update wq_numa_possible_cpumask. So here is the problem: Assume we have the following cpuid <-> nodeid in the beginning: Node \| CPU ------------------------ node 0 \| 0-14, 60-74 node 1 \| 15-29, 75-89 node 2 \| 30-44, 90-104 node 3 \| 45-59, 105-119 and we hot-remove node2 and node3, it becomes: Node \| CPU ------------------------ node 0 \| 0-14, 60-74 node 1 \| 15-29, 75-89 and we hot-add node4 and node5, it becomes: Node \| CPU ------------------------ node 0 \| 0-14, 60-74 node 1 \| 15-29, 75-89 node 4 \| 30-59 node 5 \| 90-119 But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like. When a pool workqueue is initialized, if its cpumask belongs to a node, its pool->node will be mapped to that node. And memory used by this workqueue will also be allocated on that node. static struct worker_pool get_unbound_pool(const struct workqueue_attrs attrs){ ... /* if cpumask is contained inside a NUMA node, we belong to that node / if (wq_numa_enabled) { for_each_node(node) { if (cpumask_subset(pool->attrs->cpumask, wq_numa_possible_cpumask[node])) { pool->node = node; break; } } } Since wq_numa_possible_cpumask is not updated, it could be mapped to an offline node, which will lead to memory allocation failure: SLUB: Unable to allocate memory on node 2 (gfp=0x80d0) cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0 node 0: slabs: 6172, objs: 259224, free: 245741 node 1: slabs: 3261, objs: 136962, free: 127656 It happens here: create_worker(struct worker_pool pool) \|--> worker = alloc_worker(pool->node); static struct worker alloc_worker(int node) { struct worker worker; worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, useing the wrong node. ...... return worker; } [Solution] There are four mappings in the kernel: 1. nodeid (logical node id) <-> pxm 2. apicid (physical cpu id) <-> nodeid 3. cpuid (logical cpu id) <-> apicid 4. cpuid (logical cpu id) <-> nodeid 1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm mapping is setup at boot time. This mapping is persistent, won't change. 2. apicid <-> nodeid mapping is setup using info in 1. The mapping is setup at boot time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also persistent. 3. cpuid <-> apicid mapping is setup at boot time and CPU hotadd time. cpuid is allocated, lower ids first, and released at CPU hotremove time, reused for other hotadded CPUs. So this mapping is not persistent. 4. cpuid <-> nodeid mapping is also setup at boot time and CPU hotadd time, and cleared at CPU hotremove time. As a result of 3, this mapping is not persistent. To fix this problem, we establish cpuid <-> nodeid mapping for all the possible cpus at boot time, and make it persistent. And according to init_cpu_to_node(), cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid mapping. So the key point is obtaining all cpus' apicid. apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in MADT (Multiple APIC Description Table). So we finish the job in the following steps: 1. Enable apic registeration flow to handle both enabled and disabled cpus. This is done by introducing an extra parameter to generic_processor_info to let the caller control if disabled cpus are ignored. 2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when registering local apic. Store the mapping in this array. 3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid. This is also done by introducing an extra parameter to these apis to let the caller control if disabled cpus are ignored. 4. Establish all possible cpuid <-> nodeid mapping. This is done via an additional acpi namespace walk for processors. This patch finished step 1. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: mika.j.penttila@gmail.com Cc: len.brown@intel.com Cc: rafael@kernel.org Cc: rjw@rjwysocki.net Cc: yasu.isimatu@gmail.com Cc: linux-mm@kvack.org Cc: linux-acpi@vger.kernel.org Cc: isimatu.yasuaki@jp.fujitsu.com Cc: gongzhaogang@inspur.com Cc: tj@kernel.org Cc: izumi.taku@jp.fujitsu.com Cc: cl@linux.com Cc: chen.tang@easystack.cn Cc: akpm@linux-foundation.org Cc: kamezawa.hiroyu@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1472114120-3281-3-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-21 21:18:38 +02:00
Denys Vlasenko	1827822902	x86/e820: Use much less memory for e820/e820_saved, save up to 120k The maximum size of e820 map array for EFI systems is defined as E820_X_MAX (E820MAX + 3 * MAX_NUMNODES). In x86_64 defconfig, this ends up with E820_X_MAX = 320, e820 and e820_saved are 6404 bytes each. With larger configs, for example Fedora kernels, E820_X_MAX = 3200, e820 and e820_saved are 64004 bytes each. Most of this space is wasted. Typical machines have some 20-30 e820 areas at most. After previous patch, e820 and e820_saved are pointers to e280 maps. Change them to initially point to maps which are __initdata. At the very end of kernel init, just before __init[data] sections are freed in free_initmem(), allocate smaller blocks, copy maps there, and change pointers. The late switch makes sure that all functions which can be used to change e820 maps are no longer accessible (they are all __init functions). Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160918182125.21000-1-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-21 15:02:12 +02:00
Denys Vlasenko	475339684e	x86/e820: Prepare e280 code for switch to dynamic storage This patch turns e820 and e820_saved into pointers to e820 tables, of the same size as before. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160917213927.1787-2-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-21 15:02:12 +02:00
Denys Vlasenko	8c2103f224	x86/e820: Mark some static functions __init They are all called only from other __init functions in e820.c Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160917213927.1787-1-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-21 15:02:11 +02:00
Ingo Molnar	580498a23b	Merge branch 'linus' into x86/boot, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-21 15:01:57 +02:00
Josh Poimboeuf	71f5443ebb	x86/dumpstack: Fix show_stack() task pointer regression With the following commit: `e18bcccd1a` ("x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder") The task pointer argument to show_stack_log_lvl() in show_stack() was inadvertently changed to 'current'. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: byungchul.park@lge.com Cc: fweisbec@gmail.com Cc: keescook@chromium.org Cc: linux-tip-commits@vger.kernel.org Cc: luto@amacapital.net Cc: nilayvaish@gmail.com Cc: rostedt@goodmis.org Cc: tip-bot for Josh Poimboeuf <tipbot@zytor.com> Fixes: `e18bcccd1a` ("x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder") Link: http://lkml.kernel.org/r/20160920155340.yhewlx7vmgmov5fb@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-20 23:36:37 +02:00
Ingo Molnar	41a66072c3	Merge branch 'efi/urgent' into efi/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-20 16:58:59 +02:00
Paolo Bonzini	108b249c45	KVM: x86: introduce get_kvmclock_ns Introduce a function that reads the exact nanoseconds value that is provided to the guest in kvmclock. This crystallizes the notion of kvmclock as a thin veneer over a stable TSC, that the guest will (hopefully) convert with NTP. In other words, kvmclock is not a paravirtualized host-to-guest NTP. Drop the get_kernel_ns() function, that was used both to get the base value of the master clock and to get the current value of kvmclock. The former use is replaced by ktime_get_boot_ns(), the latter is the purpose of get_kernel_ns(). This also allows KVM to provide a Hyper-V time reference counter that is synchronized with the time that is computed from the TSC page. Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-20 09:26:15 +02:00
Josh Poimboeuf	c8fe460982	x86/dumpstack: Remove dump_trace() and related callbacks All previous users of dump_trace() have been converted to use the new unwind interfaces, so we can remove it and the related print_context_stack() and print_context_stack_bp() callback functions. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5b97da3572b40b5a4d8e185cf2429308d0987a13.1474045023.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-20 08:29:34 +02:00
Josh Poimboeuf	e18bcccd1a	x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder Convert show_trace_log_lvl() to use the new unwinder. dump_trace() has been deprecated. show_trace_log_lvl() is special compared to other users of the unwinder. It's the only place where both reliable and unreliable addresses are needed. With frame pointers enabled, most callers of the unwinder don't want to know about unreliable addresses. But in this case, when we're dumping the stack to the console because something presumably went wrong, the unreliable addresses are useful: - They show stale data on the stack which can provide useful clues. - If something goes wrong with the unwinder, or if frame pointers are corrupt or missing, all the stack addresses still get shown. So in order to show all addresses on the stack, and at the same time figure out which addresses are reliable, we have to do the scanning and the unwinding in parallel. The scanning is done with the help of get_stack_info() to traverse the stacks. The unwinding is done separately by the new unwinder. In theory we could simplify show_trace_log_lvl() by instead pushing some of this logic into the unwind code. But then we would need some kind of "fake" frame logic in the unwinder which would add a lot of complexity and wouldn't be worth it in order to support only one user. Another benefit of this approach is that once we have a DWARF unwinder, we should be able to just plug it in with minimal impact to this code. Another change here is that callers of show_trace_log_lvl() don't need to provide the 'bp' argument. The unwinder already finds the relevant frame pointer by unwinding until it reaches the first frame after the provided stack pointer. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/703b5998604c712a1f801874b43f35d6dac52ede.1474045023.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-20 08:29:34 +02:00
Josh Poimboeuf	49a612c6b0	x86/stacktrace: Convert save_stack_trace_() to use the new unwinder Convert save_stack_trace_() to use the new unwinder. dump_trace() has been deprecated. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/815494c627d89887db0ce56ceffd58ad16ee6c21.1474045023.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-20 08:29:33 +02:00
Josh Poimboeuf	7c7900f897	x86/unwind: Add new unwind interface and implementations The x86 stack dump code is a bit of a mess. dump_trace() uses callbacks, and each user of it seems to have slightly different requirements, so there are several slightly different callbacks floating around. Also there are some upcoming features which will need more changes to the stack dump code, including the printing of stack pt_regs, reliable stack detection for live patching, and a DWARF unwinder. Each of those features would at least need more callbacks and/or callback interfaces, resulting in a much bigger mess than what we have today. Before doing all that, we should try to clean things up and replace dump_trace() with something cleaner and more flexible. The new unwinder is a simple state machine which was heavily inspired by a suggestion from Andy Lutomirski: https://lkml.kernel.org/r/CALCETrUbNTqaM2LRyXGRx=kVLRPeY5A3Pc6k4TtQxF320rUT=w@mail.gmail.com It's also similar to the libunwind API: http://www.nongnu.org/libunwind/man/libunwind(3).html Some if its advantages: - Simplicity: no more callback sprawl and less code duplication. - Flexibility: it allows the caller to stop and inspect the stack state at each step in the unwinding process. - Modularity: the unwinder code, console stack dump code, and stack metadata analysis code are all better separated so that changing one of them shouldn't have much of an impact on any of the others. Two implementations are added which conform to the new unwind interface: - The frame pointer unwinder which is used for CONFIG_FRAME_POINTER=y. - The "guess" unwinder which is used for CONFIG_FRAME_POINTER=n. This isn't an "unwinder" per se. All it does is scan the stack for kernel text addresses. But with no frame pointers, guesses are better than nothing in most cases. Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/6dc2f909c47533d213d0505f0a113e64585bec82.1474045023.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-20 08:29:33 +02:00
Paul Gortmaker	744c193eb9	x86: Migrate exception table users off module.h and onto extable.h These files were only including module.h for exception table related functions. We've now separated that content out into its own file "extable.h" so now move over to that and avoid all the extra header content in module.h that we don't really need to compile these files. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/20160919210418.30243-1-paul.gortmaker@windriver.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-20 01:05:43 +02:00
Prarit Bhargava	6baf3d6182	x86/tsc: Add additional Intel CPU models to the crystal quirk list commit `aa297292d7` ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID") added code to retrieve the crystal and TSC frequency from CPUID leaves. If the crystal freqency is enumerated as 0,the resulting TSC frequency is 0 as well. For CPUs with a known fixed crystal frequency a quirk list is available to set the frequency, Kabylake and SkylakeX CPUs are missing in the list of CPUs which need this quirk. Add them so the TSC frequency can be calculated correctly. [ tglx: Removed the silly default case as the switch() is only invoked when cpu_khz is 0. Massaged changelog. ] Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/1474289501-31717-3-git-send-email-prarit@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-20 01:00:32 +02:00
Prarit Bhargava	655e52d2b6	x86/tsc: Use cpu id defines instead of hex constants asm/intel-family.h contains defines for cpu ids which should be used instead of hex constants. Convert the switch case in native_calibrate_tsc() to use the defines before adding more cpu models. [ tglx: Massaged changelog ] Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/1474289501-31717-2-git-send-email-prarit@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-20 01:00:32 +02:00
Denys Vlasenko	cff9ab2b29	x86/apic: Get rid of apic_version[] array The array has a size of MAX_LOCAL_APIC, which can be as large as 32k, so it can consume up to 128k. The array has been there forever and was never used for anything useful other than a version mismatch check which was introduced in 2009. There is no reason to store the version in an array. The kernel is not prepared to handle different APIC versions anyway, so the real important part is to detect a version mismatch and warn about it, which can be done with a single variable as well. [ tglx: Massaged changelog ] Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Andy Lutomirski <luto@amacapital.net> CC: Borislav Petkov <bp@alien8.de> CC: Brian Gerst <brgerst@gmail.com> CC: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20160913181232.30815-1-dvlasenk@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-20 00:31:19 +02:00
Vinson Lee	6e68b08728	x86/vdso: Use CONFIG_X86_X32_ABI to enable vdso prctl The prctl code which references vdso_image_x32 is built when CONFIG_X86_X32 is set. This results in the following build failure: LD init/built-in.o arch/x86/built-in.o: In function `do_arch_prctl': (.text+0x27466): undefined reference to `vdso_image_x32' vdso_image_x32 depends on CONFIG_X86_X32_ABI. So we need to make the prctl depend on that as well. [ tglx: Massaged changelog ] Fixes: `2eefd87896` ("x86/arch_prctl/vdso: Add ARCH_MAP_VDSO_*") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/1474073513-6656-1-git-send-email-vlee@freedesktop.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-20 00:01:48 +02:00
Sebastian Andrzej Siewior	b067a7be41	x86/apic/uv: Convert to hotplug state machine Install the callbacks via the state machine. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160906170457.32393-19-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-19 21:44:33 +02:00
Sebastian Andrzej Siewior	29bd7fbc07	x86/microcode: Convert to hotplug state machine Install the callbacks via the state machine. CPU_UP_CANCELED_FROZEN() is not preserved: It is only there to free memory in an error case because it is assumed if the CPU does show up on resume it won't be seen ever again. As per Borislav: \|IOW, you don't need mc_cpu_dead(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160907164523.46a2xnffha4bv63g@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-19 21:44:27 +02:00
Tejun Heo	a2c2727d20	mce, workqueue: remove keventd_up() usage Now that workqueue can handle work item queueing from very early during boot, there is no need to gate schedule_work() with keventd_up(). Remove it. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: linux-edac@vger.kernel.org	2016-09-17 13:18:21 -04:00
Josh Poimboeuf	81539169f2	x86/dumpstack: Remove NULL task pointer convention show_stack_log_lvl() and friends allow a NULL pointer for the task_struct to indicate the current task. This creates confusion and can cause sneaky bugs. Instead require the caller to pass 'current' directly. This only changes the internal workings of the dumpstack code. The dump_trace() and show_stack() interfaces still allow a NULL task pointer. Those interfaces should also probably be fixed as well. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-16 16:21:39 +02:00
Andy Lutomirski	74327a3e88	x86/process: Pin the target stack in get_wchan() This will prevent a crash if get_wchan() runs after the task stack is freed. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jann Horn <jann@thejh.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/337aeca8614024aa4d8d9c81053bbf8fcffbe4ad.1474003868.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-16 09:18:53 +02:00
Andy Lutomirski	1959a60182	x86/dumpstack: Pin the target stack when dumping it Specifically, pin the stack in save_stack_trace_tsk() and show_trace_log_lvl(). This will prevent a crash if the target task dies before or while dumping its stack once we start freeing task stacks early. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jann Horn <jann@thejh.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/cf0082cde65d1941a996d026f2b2cdbfaca17bfa.1474003868.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-16 09:18:53 +02:00
Ingo Molnar	91b7bd39e6	x86/vdso: Only define prctl_map_vdso() if CONFIG_CHECKPOINT_RESTORE ... otherwise the compiler complains: arch/x86/kernel/process_64.c:528:13: warning: ‘prctl_map_vdso’ defined but not used [-Wunused-function] Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: gorcunov@openvz.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: oleg@redhat.com Cc: xemul@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-15 08:45:39 +02:00
Andy Lutomirski	15f4eae70d	x86: Move thread_info into task_struct Now that most of the thread_info users have been cleaned up, this is straightforward. Most of this code was written by Linus. Originally-from: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jann Horn <jann@thejh.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a50eab40abeaec9cb9a9e3cbdeafd32190206654.1473801993.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-15 08:25:13 +02:00
Andy Lutomirski	b9d989c721	x86/asm: Move the thread_info::status field to thread_struct Because sched.h and thread_info.h are a tangled mess, I turned in_compat_syscall() into a macro. If we had current_thread_struct() or similar and we could use it from thread_info.h, then this would be a bit cleaner. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jann Horn <jann@thejh.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/ccc8a1b2f41f9c264a41f771bb4a6539a642ad72.1473801993.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-15 08:25:12 +02:00
Ingo Molnar	d4b80afbba	Merge branch 'linus' into x86/asm, to pick up recent fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-15 08:24:53 +02:00
Josh Poimboeuf	fcd709ef20	x86/dumpstack: Add recursion checking for all stacks in_exception_stack() has some recursion checking which makes sure the stack trace code never traverses a given exception stack more than once. This prevents an infinite loop if corruption somehow causes a stack's "next stack" pointer to point to itself (directly or indirectly). The recursion checking can be useful for other stacks in addition to the exception stack, so extend it to work for all stacks. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/95de5db4cfe111754845a5cef04e20630d01423f.1473905218.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-15 08:13:15 +02:00
Josh Poimboeuf	5fe599e02e	x86/dumpstack: Add support for unwinding empty IRQ stacks When an interrupt happens in entry code while running on a software IRQ stack, and the IRQ stack was empty, regs->sp will contain the stack end address (e.g., irq_stack_ptr). If the regs are passed to dump_trace(), get_stack_info() will report STACK_TYPE_UNKNOWN, causing dump_trace() to return prematurely without trying to go to the next stack. Update the bounds checking for software interrupt stacks so that the ending address is now considered part of the stack. This means that it's now possible for the 'walk_stack' callbacks -- print_context_stack() and print_context_stack_bp() -- to be called with an empty stack. But that's fine; they're already prepared to deal with that due to their on_stack() checks. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5a5e5de92dcf11e8dc6b6e8e50ad7639d067830b.1473905218.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-15 08:13:15 +02:00
Josh Poimboeuf	cb76c93982	x86/dumpstack: Add get_stack_info() interface valid_stack_ptr() is buggy: it assumes that all stacks are of size THREAD_SIZE, which is not true for exception stacks. So the walk_stack() callbacks will need to know the location of the beginning of the stack as well as the end. Another issue is that in general the various features of a stack (type, size, next stack pointer, description string) are scattered around in various places throughout the stack dump code. Encapsulate all that information in a single place with a new stack_info struct and a get_stack_info() interface. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8164dd0db96b7e6a279fa17ae5e6dc375eecb4a9.1473905218.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-15 08:13:15 +02:00
Josh Poimboeuf	9c00390757	x86/dumpstack: Simplify in_exception_stack() in_exception_stack() does some bad, bad things just so the unwinder can print different values for different areas of the debug exception stack. There's no need to clarify where exactly on the stack it is. Just print "#DB" and be done with it. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e91cb410169dd576678dd427c35efb716fd0cee1.1473905218.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-15 08:13:14 +02:00
Dmitry Safonov	6846351052	x86/signal: Add SA_{X32,IA32}_ABI sa_flags Introduce new flags that defines which ABI to use on creating sigframe. Those flags kernel will set according to sigaction syscall ABI, which set handler for the signal being delivered. So that will drop the dependency on TIF_IA32/TIF_X32 flags on signal deliver. Those flags will be used only under CONFIG_COMPAT. Similar way ARM uses sa_flags to differ in which mode deliver signal for 26-bit applications (look at SA_THIRYTWO). Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: 0x7f454c46@gmail.com Cc: oleg@redhat.com Cc: linux-mm@kvack.org Cc: gorcunov@openvz.org Cc: xemul@virtuozzo.com Link: http://lkml.kernel.org/r/20160905133308.28234-7-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-14 21:28:11 +02:00
Dmitry Safonov	cc87324b3d	x86/ptrace: Down with test_thread_flag(TIF_IA32) As the task isn't executing at the moment of {GET,SET}REGS, return regset that corresponds to code selector, rather than value of TIF_IA32 flag. I.e. if we ptrace i386 elf binary that has just changed it's code selector to __USER_CS, than GET_REGS will return full x86_64 register set. Note, that this will work only if application has changed it's CS. If the application does 32-bit syscall with __USER_CS, ptrace will still return 64-bit register set. Which might be still confusing for tools that expect TS_COMPACT to be exposed [1, 2]. So this this change should make PTRACE_GETREGSET more reliable and this will be another step to drop TIF_{IA32,X32} flags. [1]: https://sourceforge.net/p/strace/mailman/message/30471411/ [2]: https://lkml.org/lkml/2012/1/18/320 Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: 0x7f454c46@gmail.com Cc: oleg@redhat.com Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: Pedro Alves <palves@redhat.com> Cc: gorcunov@openvz.org Cc: xemul@virtuozzo.com Link: http://lkml.kernel.org/r/20160905133308.28234-6-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-14 21:28:11 +02:00
Dmitry Safonov	2eefd87896	x86/arch_prctl/vdso: Add ARCH_MAP_VDSO_* Add API to change vdso blob type with arch_prctl. As this is usefull only by needs of CRIU, expose this interface under CONFIG_CHECKPOINT_RESTORE. Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: 0x7f454c46@gmail.com Cc: oleg@redhat.com Cc: linux-mm@kvack.org Cc: gorcunov@openvz.org Cc: xemul@virtuozzo.com Link: http://lkml.kernel.org/r/20160905133308.28234-4-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-14 21:28:09 +02:00
Josh Poimboeuf	cfeeed279d	x86/dumpstack: Allow preemption in show_stack_log_lvl() and dump_trace() show_stack_log_lvl() and dump_trace() are already preemption safe: - If they're running in irq or exception context, preemption is already disabled and the percpu stack pointers can be trusted. - If they're running with preemption enabled, they must be running on the task stack anyway, so it doesn't matter if they're comparing the stack pointer against a percpu stack pointer from this CPU or another one: either way it won't match. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a0ca0b1044eca97d4f0ec7c1619cf80b3b65560d.1473371307.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-14 17:23:30 +02:00
Linus Torvalds	5924bbecd0	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Three fixes: - AMD microcode loading fix with randomization - an lguest tooling fix - and an APIC enumeration boundary condition fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Fix num_processors value in case of failure tools/lguest: Don't bork the terminal in case of wrong args x86/microcode/AMD: Fix load of builtin microcode with randomized memory	2016-09-13 12:52:45 -07:00
Masahiro Yamada	f148b41e8b	x86: Clean up various simple wrapper functions Remove unneeded variables and assignments. While we are here, let's fix the following as well: - Remove unnecessary parentheses - Remove unnecessary unsigned-suffix 'U' from constant values - Reword the comment in set_apic_id() (suggested by Thomas Gleixner) Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andrew Banman <abanman@sgi.com> Cc: Borislav Petkov <bp@suse.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mike Travis <travis@sgi.com> Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steffen Persvold <sp@numascale.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Wei Jiangang <weijg.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/1473573502-27954-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-13 20:42:58 +02:00
Nicolas Iooss	ba6d018e3d	x86/mm/pkeys: Do not skip PKRU register if debug registers are not used __show_regs() fails to dump the PKRU state when the debug registers are in their default state because there is a return statement on the debug register state. Change the logic to report PKRU value even when debug registers are in their default state. Fixes:c0b17b5bd4b7 ("x86/mm/pkeys: Dump PKRU with other kernel registers") Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20160910183045.4618-1-nicolas.iooss_linux@m4x.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:52:28 +02:00
Yazen Ghannam	4f29b73bae	x86/mce/AMD: Extract the error address on SMCA systems The MCA_ADDR registers on Scalable MCA systems contain the ErrorAddr in bits [55:0] and the least significant bit of the address in bits [61:56]. We should extract the valid ErrorAddr bits from the MCA_ADDR register rather than saving the raw value to struct mce. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1473275643-1721-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:13 +02:00
Yazen Ghannam	4b711f92c9	x86/mce, EDAC/mce_amd: Print MCA_SYND and MCA_IPID during MCE on SMCA systems The MCA_SYND and MCA_IPID registers contain valuable information and should be included in MCE output. The MCA_SYND register contains syndrome and other error information, and the MCA_IPID register will uniquely identify the MCA bank's type without having to rely on system software. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1472680624-34221-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:13 +02:00
Yazen Ghannam	5828c46f2c	x86/mce/AMD: Save MCA_IPID in MCE struct on SMCA systems The MCA_IPID register uniquely identifies a bank's type and instance on Scalable MCA systems. We should save the value of this register in struct mce along with the other relevant error information. This ensures that we can decode errors without relying on system software to correlate the bank to the type. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1472680624-34221-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:12 +02:00
Yazen Ghannam	66ef269dbb	x86/mce/AMD: Ensure the deferred error interrupt is of type APIC on SMCA systems The Deferred Error Interrupt Type is set per bank on Scalable MCA systems. This is done in a bitfield in the MCA_CONFIG register of each bank. We should set its type to APIC-based interrupt and not assume BIOS has set it for us. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1472737486-1720-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:11 +02:00
Yazen Ghannam	87a6d4091b	x86/mce/AMD: Update sysfs bank names for SMCA systems Define a bank's sysfs filename based on its IP type and InstanceId. Credits go to Aravind for: * The general idea and proto- get_name(). * Defining smca_umc_block_names[] and buf_mcatype[]. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Link: http://lkml.kernel.org/r/1473193490-3291-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:11 +02:00
Yazen Ghannam	5896820e0a	x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types Scalable MCA defines a number of IP types. An MCA bank on an SMCA system is defined as one of these IP types. A bank's type is uniquely identified by the combination of the HWID and MCATYPE values read from its MCA_IPID register. Add the required tables in order to be able to lookup error descriptions based on a bank's type and the error's extended error code. [ bp: Align comments, simplify a bit. ] Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1472741832-1690-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:10 +02:00
Yazen Ghannam	cfee4f6f0b	x86/mce/AMD: Read MSRs on the CPU allocating the threshold blocks Scalable MCA systems allow non-core MCA banks to only be accessible by certain CPUs. The MSRs for these banks are Read-as-Zero on other CPUs. During allocate_threshold_blocks(), get_block_address() can be scheduled on CPUs other than the one allocating the block. This causes the MSRs to be read on the wrong CPU and results in incorrect behavior. Add a @cpu parameter to get_block_address() and pass this in to ensure that the MSRs are only read on the CPU that is allocating the block. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1472673994-12235-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:08 +02:00
Yazen Ghannam	db819d60f6	x86/mce: Add support for new MCA_SYND register Syndrome information is no longer contained in MCA_STATUS for SMCA systems but in a new register - MCA_SYND. Add a synd field to struct mce to hold MCA_SYND register value. Add it to the end of struct mce to maintain compatibility with old versions of mcelog. Also, add it to the respective tracepoint. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1467633035-32080-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:06 +02:00
Yazen Ghannam	74ab0e7a83	x86/mce/AMD: Use msr_ops.misc() in allocate_threshold_blocks() Change MSR_IA32_MCx_MISC() macro to msr_ops.misc() because SMCA machines define a different set of MSRs and msr_ops will give you the correct MISC register. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1468269447-8808-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:06 +02:00
Al Stone	c12f29a5d4	x86: ACPI: make variable names clearer in acpi_parse_madt_lapic_entries() This patch has no functional change; it is purely cosmetic, though it does make it a wee bit easier to understand the code. Before, the count of LAPICs was being stored in the variable 'x2count' and the count of X2APICs was being stored in the variable 'count'. This patch swaps that so that the routine acpi_parse_madt_lapic_entries() will now consistently use x2count to refer to X2APIC info, and count to refer to LAPIC info. Signed-off-by: Al Stone <ahs3@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-09-13 02:19:59 +02:00
Al Stone	0f61aaa413	x86: ACPI: remove extraneous white space after semicolon Signed-off-by: Al Stone <ahs3@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-09-13 02:19:59 +02:00
Linus Torvalds	ac059c4fa7	* s390: nested virt fixes (new 4.8 feature) * x86: fixes for 4.8 regressions * ARM: two small bugfixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJX1xaBAAoJEL/70l94x66Db/8H/AuKWs6QbvUNnA+tWITdmFbi ah7R2BoRVak4h6UGYC4OsY9BTuWVeaCx2+8yLIqSdfWoYUJzwiGFPwkuUK/hVSkK /9gZO8SsREAm/1gVck2X2vzATYbfCxKcRsSq0/ocmIQEpRYWKGoJWS6zeiwLZ9kq H/KsAvsGc83VdlPXrhLVQa7js/Tl/M5JlBEbm6i8nIsvhoqZkES4rgwHKBtkxTyU 8oepfNQnYTb2hZhJE0aW+9z3V0O+NKbGW75bKOzrbJBoEUGeHuPpLnP0mZIyEGPU grQkfuWMmfEaWrGahNA9ARpsNcy4Zp0BF5mgJ03OAmgfY+KmJroVk95IvcP9jF4= =+uuA -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: - s390: nested virt fixes (new 4.8 feature) - x86: fixes for 4.8 regressions - ARM: two small bugfixes * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm-arm: Unmap shadow pagetables properly x86, clock: Fix kvm guest tsc initialization arm: KVM: Fix idmap overlap detection when the kernel is idmap'ed KVM: lapic: adjust preemption timer correctly when goes TSC backward KVM: s390: vsie: fix riccbd KVM: s390: don't use current->thread.fpu.* when accessing registers	2016-09-12 14:30:14 -07:00
Ricardo Neri	3dad6f7f69	x86/efi: Defer efi_esrt_init until after memblock_x86_fill Commit 7b02d53e7852 ("efi: Allow drivers to reserve boot services forever") introduced a new efi_mem_reserve to reserve the boot services memory regions forever. This reservation involves allocating a new EFI memory range descriptor. However, allocation can only succeed if there is memory available for the allocation. Otherwise, error such as the following may occur: esrt: Reserving ESRT space from 0x000000003dd6a000 to 0x000000003dd6a010. Kernel panic - not syncing: ERROR: Failed to allocate 0x9f0 bytes below \ 0x0. CPU: 0 PID: 0 Comm: swapper Not tainted 4.7.0-rc5+ #503 0000000000000000 ffffffff81e03ce0 ffffffff8131dae8 ffffffff81bb6c50 ffffffff81e03d70 ffffffff81e03d60 ffffffff8111f4df 0000000000000018 ffffffff81e03d70 ffffffff81e03d08 00000000000009f0 00000000000009f0 Call Trace: [<ffffffff8131dae8>] dump_stack+0x4d/0x65 [<ffffffff8111f4df>] panic+0xc5/0x206 [<ffffffff81f7c6d3>] memblock_alloc_base+0x29/0x2e [<ffffffff81f7c6e3>] memblock_alloc+0xb/0xd [<ffffffff81f6c86d>] efi_arch_mem_reserve+0xbc/0x134 [<ffffffff81fa3280>] efi_mem_reserve+0x2c/0x31 [<ffffffff81fa3280>] ? efi_mem_reserve+0x2c/0x31 [<ffffffff81fa40d3>] efi_esrt_init+0x19e/0x1b4 [<ffffffff81f6d2dd>] efi_init+0x398/0x44a [<ffffffff81f5c782>] setup_arch+0x415/0xc30 [<ffffffff81f55af1>] start_kernel+0x5b/0x3ef [<ffffffff81f55434>] x86_64_start_reservations+0x2f/0x31 [<ffffffff81f55520>] x86_64_start_kernel+0xea/0xed ---[ end Kernel panic - not syncing: ERROR: Failed to allocate 0x9f0 bytes below 0x0. An inspection of the memblock configuration reveals that there is no memory available for the allocation: MEMBLOCK configuration: memory size = 0x0 reserved size = 0x4f339c0 memory.cnt = 0x1 memory[0x0] [0x00000000000000-0xffffffffffffffff], 0x0 bytes on node 0\ flags: 0x0 reserved.cnt = 0x4 reserved[0x0] [0x0000000008c000-0x0000000008c9bf], 0x9c0 bytes flags: 0x0 reserved[0x1] [0x0000000009f000-0x000000000fffff], 0x61000 bytes\ flags: 0x0 reserved[0x2] [0x00000002800000-0x0000000394bfff], 0x114c000 bytes\ flags: 0x0 reserved[0x3] [0x000000304e4000-0x00000034269fff], 0x3d86000 bytes\ flags: 0x0 This situation can be avoided if we call efi_esrt_init after memblock has memory regions for the allocation. Also, the EFI ESRT driver makes use of early_memremap'pings. Therfore, we do not want to defer efi_esrt_init for too long. We must call such function while calls to early_memremap are still valid. A good place to meet the two aforementioned conditions is right after memblock_x86_fill, grouped with other EFI-related functions. Reported-by: Scott Lawson <scott.lawson@intel.com> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Jones <pjones@redhat.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>	2016-09-09 16:08:52 +01:00
Matt Fleming	4971531af3	x86/efi: Test for EFI_MEMMAP functionality when iterating EFI memmap Both efi_find_mirror() and efi_fake_memmap() really want to know whether the EFI memory map is available, not just whether the machine was booted using EFI. efi_fake_memmap() even has a check for EFI_MEMMAP at the start of the function. Since we've already got other code that has this dependency, merge everything under one if() conditional, and remove the now superfluous check from efi_fake_memmap(). Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump] Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm] Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>	2016-09-09 16:06:34 +01:00
Waiman Long	f99fd22e4d	x86/hpet: Reduce HPET counter read contention On a large system with many CPUs, using HPET as the clock source can have a significant impact on the overall system performance because of the following reasons: 1) There is a single HPET counter shared by all the CPUs. 2) HPET counter reading is a very slow operation. Using HPET as the default clock source may happen when, for example, the TSC clock calibration exceeds the allowable tolerance. Something the performance slowdown can be so severe that the system may crash because of a NMI watchdog soft lockup, for example. During the TSC clock calibration process, the default clock source will be set temporarily to HPET. For systems with many CPUs, it is possible that NMI watchdog soft lockup may occur occasionally during that short time period where HPET clocking is active as is shown in the kernel log below: [ 71.646504] hpet0: 8 comparators, 64-bit 14.318180 MHz counter [ 71.655313] Switching to clocksource hpet [ 95.679135] BUG: soft lockup - CPU#144 stuck for 23s! [swapper/144:0] [ 95.693363] BUG: soft lockup - CPU#145 stuck for 23s! [swapper/145:0] [ 95.695580] BUG: soft lockup - CPU#582 stuck for 23s! [swapper/582:0] [ 95.698128] BUG: soft lockup - CPU#357 stuck for 23s! [swapper/357:0] This patch addresses the above issues by reducing HPET read contention using the fact that if more than one CPUs are trying to access HPET at the same time, it will be more efficient when only one CPU in the group reads the HPET counter and shares it with the rest of the group instead of each group member trying to read the HPET counter individually. This is done by using a combination quadword that contains a 32-bit stored HPET value and a 32-bit spinlock. The CPU that gets the lock will be responsible for reading the HPET counter and storing it in the quadword. The others will monitor the change in HPET value and lock status and grab the latest stored HPET value accordingly. This change is only enabled on 64-bit SMP configuration. On a 4-socket Haswell-EX box with 144 threads (HT on), running the AIM7 compute workload (1500 users) on a 4.8-rc1 kernel (HZ=1000) with and without the patch has the following performance numbers (with HPET or TSC as clock source): TSC = 1042431 jobs/min HPET w/o patch = 798068 jobs/min HPET with patch = 1029445 jobs/min The perf profile showed a reduction of the %CPU time consumed by read_hpet from 11.19% without patch to 1.24% with patch. [ tglx: It's really sad that we need to have such hacks just to deal with the fact that cpu vendors have not managed to fix the TSC wreckage within 15+ years. Were They Forgetting? ] Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Tested-by: Prarit Bhargava <prarit@redhat.com> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Douglas Hatch <doug.hatch@hpe.com> Cc: Randy Wright <rwright@hpe.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1473182530-29175-1-git-send-email-Waiman.Long@hpe.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-09 15:16:19 +02:00
Dave Hansen	acd547b298	x86/pkeys: Default to a restrictive init PKRU PKRU is the register that lets you disallow writes or all access to a given protection key. The XSAVE hardware defines an "init state" of 0 for PKRU: its most permissive state, allowing access/writes to everything. Since we start off all new processes with the init state, we start all processes off with the most permissive possible PKRU. This is unfortunate. If a thread is clone()'d [1] before a program has time to set PKRU to a restrictive value, that thread will be able to write to all data, no matter what pkey is set on it. This weakens any integrity guarantees that we want pkeys to provide. To fix this, we define a very restrictive PKRU to override the XSAVE-provided value when we create a new FPU context. We choose a value that only allows access to pkey 0, which is as restrictive as we can practically make it. This does not cause any practical problems with applications using protection keys because we require them to specify initial permissions for each key when it is allocated, which override the restrictive default. In the end, this ensures that threads which do not know how to manage their own pkey rights can not do damage to data which is pkey-protected. I would have thought this was a pretty contrived scenario, except that I heard a bug report from an MPX user who was creating threads in some very early code before main(). It may be crazy, but folks evidently _do_ it. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-arch@vger.kernel.org Cc: Dave Hansen <dave@sr71.net> Cc: mgorman@techsingularity.net Cc: arnd@arndb.de Cc: linux-api@vger.kernel.org Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20160729163021.F3C25D4A@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-09 13:02:28 +02:00
Dave Hansen	e8c24d3a23	x86/pkeys: Allocation/free syscalls This patch adds two new system calls: int pkey_alloc(unsigned long flags, unsigned long init_access_rights) int pkey_free(int pkey); These implement an "allocator" for the protection keys themselves, which can be thought of as analogous to the allocator that the kernel has for file descriptors. The kernel tracks which numbers are in use, and only allows operations on keys that are valid. A key which was not obtained by pkey_alloc() may not, for instance, be passed to pkey_mprotect(). These system calls are also very important given the kernel's use of pkeys to implement execute-only support. These help ensure that userspace can never assume that it has control of a key unless it first asks the kernel. The kernel does not promise to preserve PKRU (right register) contents except for allocated pkeys. The 'init_access_rights' argument to pkey_alloc() specifies the rights that will be established for the returned pkey. For instance: pkey = pkey_alloc(flags, PKEY_DENY_WRITE); will allocate 'pkey', but also sets the bits in PKRU[1] such that writing to 'pkey' is already denied. The kernel does not prevent pkey_free() from successfully freeing in-use pkeys (those still assigned to a memory range by pkey_mprotect()). It would be expensive to implement the checks for this, so we instead say, "Just don't do it" since sane software will never do it anyway. Any piece of userspace calling pkey_alloc() needs to be prepared for it to fail. Why? pkey_alloc() returns the same error code (ENOSPC) when there are no pkeys and when pkeys are unsupported. They can be unsupported for a whole host of reasons, so apps must be prepared for this. Also, libraries or LD_PRELOADs might steal keys before an application gets access to them. This allocation mechanism could be implemented in userspace. Even if we did it in userspace, we would still need additional user/kernel interfaces to tell userspace which keys are being used by the kernel internally (such as for execute-only mappings). Having the kernel provide this facility completely removes the need for these additional interfaces, or having an implementation of this in userspace at all. Note that we have to make changes to all of the architectures that do not use mman-common.h because we use the new PKEY_DENY_ACCESS/WRITE macros in arch-independent code. 1. PKRU is the Protection Key Rights User register. It is a usermode-accessible register that controls whether writes and/or access to each individual pkey is allowed or denied. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: linux-arch@vger.kernel.org Cc: Dave Hansen <dave@sr71.net> Cc: arnd@arndb.de Cc: linux-api@vger.kernel.org Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20160729163015.444FE75F@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-09 13:02:27 +02:00
Srinivas Pandruvada	a6cbcdd5ab	ACPI / CPPC: Add support for functional fixed hardware address The CPPC registers can also be accessed via functional fixed hardware addresse(FFH) in X86. Add support by modifying cpc_read and cpc_write to be able to read/write MSRs on x86 platform on per cpu basis. Also with this change, acpi_cppc_processor_probe doesn't bail out if address space id is not equal to PCC or memory address space and FFH is supported on the system. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-09-08 23:02:14 +02:00
Prarit Bhargava	a4497a86fb	x86, clock: Fix kvm guest tsc initialization When booting a kvm guest on AMD with the latest kernel the following messages are displayed in the boot log: tsc: Unable to calibrate against PIT tsc: HPET/PMTIMER calibration failed `aa297292d7` ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID") introduced a change to account for a difference in cpu and tsc frequencies for Intel SKL processors. Before this change the native tsc set x86_platform.calibrate_tsc to native_calibrate_tsc() which is a hardware calibration of the tsc, and in tsc_init() executed tsc_khz = x86_platform.calibrate_tsc(); cpu_khz = tsc_khz; The kvm code changed x86_platform.calibrate_tsc to kvm_get_tsc_khz() and executed the same tsc_init() function. This meant that KVM guests did not execute the native hardware calibration function. After `aa297292d7`, there are separate native calibrations for cpu_khz and tsc_khz. The code sets x86_platform.calibrate_tsc to native_calibrate_tsc() which is now an Intel specific calibration function, and x86_platform.calibrate_cpu to native_calibrate_cpu() which is the "old" native_calibrate_tsc() function (ie, the native hardware calibration function). tsc_init() now does cpu_khz = x86_platform.calibrate_cpu(); tsc_khz = x86_platform.calibrate_tsc(); if (tsc_khz == 0) tsc_khz = cpu_khz; else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz) cpu_khz = tsc_khz; The kvm code should not call the hardware initialization in native_calibrate_cpu(), as it isn't applicable for kvm and it didn't do that prior to `aa297292d7`. This patch resolves this issue by setting x86_platform.calibrate_cpu to kvm_get_tsc_khz(). v2: I had originally set x86_platform.calibrate_cpu to cpu_khz_from_cpuid(), however, pbonzini pointed out that the CPUID leaf in that function is not available in KVM. I have changed the function pointer to kvm_get_tsc_khz(). Fixes: `aa297292d7` ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID") Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Len Brown <len.brown@intel.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: "Christopher S. Hall" <christopher.s.hall@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-08 16:41:55 +02:00
Wei Yang	3ec979658e	x86/e820: Fix very large 'size' handling boundary condition The (start, size) tuple represents a range [start, start + size - 1], which means "start" and "start + size - 1" should be compared to see whether the range overflows. For example, a range with (start, size): (0xffffffff fffffff0, 0x00000000 00000010) represents [0xffffffff fffffff0, 0xffffffff ffffffff] ... would be judged overflow in the original code, while actually it is not. This patch fixes this and makes sure it still works when size is zero. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yinghai@kernel.org Link: http://lkml.kernel.org/r/1471657213-31817-1-git-send-email-richard.weiyang@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-08 09:11:14 +02:00
Josh Poimboeuf	5a8ff54c26	x86/dumpstack: Remove unnecessary stack pointer arguments When calling show_stack_log_lvl() or dump_trace() with a regs argument, providing a stack pointer or frame pointer is redundant. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>d Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1694e2e955e3b9a73a3c3d5ba2634344014dd550.1472057064.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-08 08:58:40 +02:00
Josh Poimboeuf	4b8afafbe7	x86/dumpstack: Add get_stack_pointer() and get_frame_pointer() The various functions involved in dumping the stack all do similar things with regard to getting the stack pointer and the frame pointer based on the regs and task arguments. Create helper functions to do that instead. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f448914885a35f333fe04da1b97a6c2cc1f80974.1472057064.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-08 08:58:40 +02:00
Josh Poimboeuf	d438f5fda3	x86/dumpstack: Make printk_stack_address() more generally useful Change printk_stack_address() to be useful when called by an unwinder outside the context of dump_trace(). Specifically: - printk_stack_address()'s 'data' argument is always used as the log level string. Make that explicit. - Call touch_nmi_watchdog(). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/9fbe0db05bacf66d337c162edbf61450d0cff1e2.1472057064.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-08 08:58:40 +02:00
Andy Lutomirski	6271cfdfc0	x86/mm: Improve stack-overflow #PF handling If we get a page fault indicating kernel stack overflow, invoke handle_stack_overflow(). To prevent us from overflowing the stack again while handling the overflow (because we are likely to have very little stack space left), call handle_stack_overflow() on the double-fault stack. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/6d6cf96b3fb9b4c9aa303817e1dc4de0c7c36487.1472603235.git.luto@kernel.org [ Minor edit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-08 08:47:20 +02:00
Ingo Molnar	2b3061c77c	Merge branch 'x86/mm' into x86/asm, to unify the two branches for simplicity Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-08 08:41:52 +02:00
Dou Liyang	c291b01515	x86/apic: Fix num_processors value in case of failure If the topology package map check of the APIC ID and the CPU is a failure, we don't generate the processor info for that APIC ID yet we increase disabled_cpus by one - which is buggy. Only increase num_processors once we are sure we don't fail. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1473214893-16481-1-git-send-email-douly.fnst@cn.fujitsu.com [ Rewrote the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-08 08:11:03 +02:00
Carlos Santa	8d9c20e1d1	drm/i915: Remove .is_mobile field from platform struct As recommended by Ville Syrjala removing .is_mobile field from the platform struct definition for vlv and hsw+ GPUs as there's no need to make the distinction in later hardware anymore. Keep it for older GPUs as it is still needed for ilk-ivb. Signed-off-by: Carlos Santa <carlos.santa@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2016-09-07 16:07:07 -07:00
Sebastian Andrzej Siewior	9a20ea4b4c	x86/kvm: Convert to hotplug state machine Install the callbacks via the state machine. The online & down callbacks are invoked on the target CPU so we can avoid using smp_call_function_single(). local_irq_disable() is used because smp_call_function_single() used to invoke the function with interrupts disabled. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160818125731.27256-15-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-06 18:30:25 +02:00
Juergen Gross	47ae4b05d0	virt, sched: Add generic vCPU pinning support Add generic virtualization support for pinning the current vCPU to a specified physical CPU. As this operation isn't performance critical (a very limited set of operations like BIOS calls and SMIs is expected to need this) just add a hypervisor specific indirection. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Douglas_Warzecha@dell.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Cc: boris.ostrovsky@oracle.com Cc: chrisw@sous-sol.org Cc: david.vrabel@citrix.com Cc: hpa@zytor.com Cc: jdelvare@suse.com Cc: jeremy@goop.org Cc: linux@roeck-us.net Cc: pali.rohar@gmail.com Cc: rusty@rustcorp.com.au Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1472453327-19050-3-git-send-email-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-05 13:52:38 +02:00
Tony Luck	ffb173e657	x86/mce: Drop X86_FEATURE_MCE_RECOVERY and the related model string test We now have a better way to determine if we are running on a cpu that supports machine check recovery. Free up this feature bit. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Boris Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/d5db39e08d46cf1012d94d3902275d08ba931926.1472754712.git.tony.luck@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-05 11:47:31 +02:00
Tony Luck	9a6fb28a35	x86/mce: Improve memcpy_mcsafe() Use the mcsafe_key defined in the previous patch to make decisions on which copy function to use. We can't use the FEATURE bit any more because PCI quirks run too late to affect the patching of code. So we use a static key. Turn memcpy_mcsafe() into an inline function to make life easier for callers. The assembly code that actually does the copy is now named memcpy_mcsafe_unrolled() Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Boris Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/bfde2fc774e94f53d91b70a4321c85a0d33e7118.1472754712.git.tony.luck@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-05 11:47:31 +02:00
Tony Luck	3637efb008	x86/mce: Add PCI quirks to identify Xeons with machine check recovery Each Xeon includes a number of capability registers in PCI space that describe some features not enumerated by CPUID. Use these to determine that we are running on a model that can recover from machine checks. Hooks for Ivybridge ... Skylake provided. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Boris Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/abf331dc4a3e2a2d17444129bc51127437bcf4ba.1472754711.git.tony.luck@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-05 11:47:31 +02:00
Borislav Petkov	cc2187a6e0	x86/microcode/AMD: Fix load of builtin microcode with randomized memory We do not need to add the randomization offset when the microcode is built in. Reported-and-tested-by: Emanuel Czirai <icanrealizeum@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20160904093736.GA11939@pd.tnic Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-05 10:38:56 +02:00
Linus Torvalds	9ca581b50d	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "A single fix for an AMD erratum so machines without a BIOS fix work" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/AMD: Apply erratum 665 on machines without a BIOS fix	2016-09-04 08:45:41 -07:00
Emanuel Czirai	d199299675	x86/AMD: Apply erratum 665 on machines without a BIOS fix AMD F12h machines have an erratum which can cause DIV/IDIV to behave unpredictably. The workaround is to set MSRC001_1029[31] but sometimes there is no BIOS update containing that workaround so let's do it ourselves unconditionally. It is simple enough. [ Borislav: Wrote commit message. ] Signed-off-by: Emanuel Czirai <icanrealizeum@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Yaowu Xu <yaowu@google.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20160902053550.18097-1-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-02 20:42:28 +02:00
Steven Rostedt	15301a5707	x86/paravirt: Do not trace _paravirt_ident_*() functions Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up after enabling function tracer. I asked him to bisect the functions within available_filter_functions, which he did and it came down to three: _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64() It was found that this is only an issue when noreplace-paravirt is added to the kernel command line. This means that those functions are most likely called within critical sections of the funtion tracer, and must not be traced. In newer kenels _paravirt_nop() is defined within gcc asm(), and is no longer an issue. But both _paravirt_ident_{32,64}() causes the following splat when they are traced: mm/pgtable-generic.c:33: bad pmd ffff8800d2435150(0000000001d00054) mm/pgtable-generic.c:33: bad pmd ffff8800d3624190(0000000001d00070) mm/pgtable-generic.c:33: bad pmd ffff8800d36a5110(0000000001d00054) mm/pgtable-generic.c:33: bad pmd ffff880118eb1450(0000000001d00054) NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469] Modules linked in: e1000e CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 task: ffff880118f740c0 ti: ffff8800d4aec000 task.ti: ffff8800d4aec000 RIP: 0010:[<ffffffff81134148>] [<ffffffff81134148>] queued_spin_lock_slowpath+0x118/0x1a0 RSP: 0018:ffff8800d4aefb90 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011eb16d40 RDX: ffffffff82485760 RSI: 000000001f288820 RDI: ffffea0000008030 RBP: ffff8800d4aefb90 R08: 00000000000c0000 R09: 0000000000000000 R10: ffffffff821c8e0e R11: 0000000000000000 R12: ffff880000200fb8 R13: 00007f7a4e3f7000 R14: ffffea000303f600 R15: ffff8800d4b562e0 FS: 00007f7a4e3d7840(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7a4e3f7000 CR3: 00000000d3e71000 CR4: 00000000001406e0 Call Trace: _raw_spin_lock+0x27/0x30 handle_pte_fault+0x13db/0x16b0 handle_mm_fault+0x312/0x670 __do_page_fault+0x1b1/0x4e0 do_page_fault+0x22/0x30 page_fault+0x28/0x30 __vfs_read+0x28/0xe0 vfs_read+0x86/0x130 SyS_read+0x46/0xa0 entry_SYSCALL_64_fastpath+0x1e/0xa8 Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b Reported-by: Łukasz Daniluk <lukasz.daniluk@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-09-02 09:40:47 -07:00
Thomas Gleixner	0cb7bf61b1	Merge branch 'linus' into smp/hotplug Apply upstream changes to avoid conflicts with pending patches.	2016-09-01 18:33:46 +02:00
Brian Gerst	ffcb043ba5	sched/x86: Fix thread_saved_pc() thread_saved_pc() was using a completely bogus method to get the return address. Since switch_to() was previously inlined, there was no sane way to know where on the stack the return address was stored. Now with the frame of a sleeping thread well defined, this can be implemented correctly. Signed-off-by: Brian Gerst <brgerst@gmail.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1471106302-10159-7-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-08-24 12:31:51 +02:00
Brian Gerst	616d24835e	sched/x86: Pass kernel thread parameters in 'struct fork_frame' Instead of setting up a fake pt_regs context, put the kernel thread function pointer and arg into the unused callee-restored registers of 'struct fork_frame'. Signed-off-by: Brian Gerst <brgerst@gmail.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1471106302-10159-6-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-08-24 12:31:50 +02:00
Brian Gerst	0100301bfd	sched/x86: Rewrite the switch_to() code Move the low-level context switch code to an out-of-line asm stub instead of using complex inline asm. This allows constructing a new stack frame for the child process to make it seamlessly flow to ret_from_fork without an extra test and branch in __switch_to(). It also improves code generation for __schedule() by using the C calling convention instead of clobbering all registers. Signed-off-by: Brian Gerst <brgerst@gmail.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1471106302-10159-5-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-08-24 12:31:41 +02:00
Brian Gerst	7b32aeadbc	sched/x86: Add 'struct inactive_task_frame' to better document the sleeping task stack frame Add 'struct inactive_task_frame', which defines the layout of the stack for a sleeping process. For now, the only defined field is the BP register (frame pointer). Signed-off-by: Brian Gerst <brgerst@gmail.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1471106302-10159-4-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-08-24 12:27:41 +02:00
Brian Gerst	163630191e	sched/x86/64, kgdb: Clear GDB_PS on 64-bit switch_to() no longer saves EFLAGS, so it's bogus to look for it on the stack. Set it to zero like 32-bit. Signed-off-by: Brian Gerst <brgerst@gmail.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1471106302-10159-3-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-08-24 12:27:40 +02:00
Brian Gerst	4e047aa7f2	sched/x86/32, kgdb: Don't use thread.ip in sleeping_thread_to_gdb_regs() Match 64-bit and set gdb_regs[GDB_PC] to zero. thread.ip is always the same point in the scheduler (except for newly forked processes), and will be removed in a future patch. Signed-off-by: Brian Gerst <brgerst@gmail.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1471106302-10159-2-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-08-24 12:27:40 +02:00
Josh Poimboeuf	13e25bab7e	x86/dumpstack/ftrace: Don't print unreliable addresses in print_context_stack_bp() When function graph tracing is enabled, print_context_stack_bp() can report return_to_handler() as an unreliable address, which is confusing and misleading: return_to_handler() is really only useful as a hint for debugging, whereas print_context_stack_bp() users only care about the actual 'reliable' call path. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/c51aef578d8027791b38d2ad9bac0c7f499fde91.1471607358.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-08-24 12:15:15 +02:00
Josh Poimboeuf	6f727b84e2	x86/dumpstack/ftrace: Mark function graph handler function as unreliable When function graph tracing is enabled for a function, its return address on the stack is replaced with the address of an ftrace handler (return_to_handler). Currently 'return_to_handler' can be reported as reliable. That's not ideal, and can actually be misleading. When saving or dumping the stack, you normally only care about what led up to that point (the call path), rather than what will happen in the future (the return path). That's especially true in the non-oops stack trace case, which isn't used for debugging. For example, in a perf profiling operation, reporting return_to_handler() in the trace would just be confusing. And in the oops case, where debugging is important, "unreliable" is also more appropriate there because it serves as a hint that graph tracing was involved, instead of trying to imply that return_to_handler() was the real caller. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f8af15749c7d632d3e7f815995831d5b7f82950d.1471607358.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-08-24 12:15:15 +02:00
Josh Poimboeuf	471bd10f5e	ftrace/x86: Implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR Use the more reliable version of ftrace_graph_ret_addr() so we no longer have to worry about the unwinder getting out of sync with the function graph ret_stack index, which can happen if the unwinder skips any frames before calling ftrace_graph_ret_addr(). This fixes this issue (and several others like it): $ cat /proc/self/stack [<ffffffff810489a2>] save_stack_trace_tsk+0x22/0x40 [<ffffffff81311a89>] proc_pid_stack+0xb9/0x110 [<ffffffff813127c4>] proc_single_show+0x54/0x80 [<ffffffff812be088>] seq_read+0x108/0x3e0 [<ffffffff812923d7>] __vfs_read+0x37/0x140 [<ffffffff812929d9>] vfs_read+0x99/0x140 [<ffffffff81293f28>] SyS_read+0x58/0xc0 [<ffffffff818af97c>] entry_SYSCALL_64_fastpath+0x1f/0xbd [<ffffffffffffffff>] 0xffffffffffffffff $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ cat /proc/self/stack [<ffffffff818b2428>] return_to_handler+0x0/0x27 [<ffffffff810394cc>] print_context_stack+0xfc/0x100 [<ffffffff818b2428>] return_to_handler+0x0/0x27 [<ffffffff8103891b>] dump_trace+0x12b/0x350 [<ffffffff818b2428>] return_to_handler+0x0/0x27 [<ffffffff810489a2>] save_stack_trace_tsk+0x22/0x40 [<ffffffff818b2428>] return_to_handler+0x0/0x27 [<ffffffff81311a89>] proc_pid_stack+0xb9/0x110 [<ffffffff818b2428>] return_to_handler+0x0/0x27 [<ffffffff813127c4>] proc_single_show+0x54/0x80 [<ffffffff818b2428>] return_to_handler+0x0/0x27 [<ffffffff812be088>] seq_read+0x108/0x3e0 [<ffffffff818b2428>] return_to_handler+0x0/0x27 [<ffffffff812923d7>] __vfs_read+0x37/0x140 [<ffffffff818b2428>] return_to_handler+0x0/0x27 [<ffffffff812929d9>] vfs_read+0x99/0x140 [<ffffffffffffffff>] 0xffffffffffffffff Enabling function graph tracing causes the stack trace to change in two ways: First, the real call addresses are confusingly interspersed with 'return_to_handler' addresses. This issue will be fixed by the next patch. Second, the stack trace is offset by two frames, because the unwinder skipped the first two frames and got out of sync with the ret_stack index. This patch fixes this issue. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a6d623e36f8d08f9a17bd74d804d201177a23afd.1471607358.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-08-24 12:15:15 +02:00
Josh Poimboeuf	408fe5de2f	x86/dumpstack/ftrace: Convert dump_trace() callbacks to use ftrace_graph_ret_addr() Convert print_context_stack() and print_context_stack_bp() to use the arch-independent ftrace_graph_ret_addr() helper. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/56ec97cafc1bf2e34d1119e6443d897db406da86.1471607358.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-08-24 12:15:14 +02:00
Josh Poimboeuf	9a7c348ba6	ftrace: Add return address pointer to ftrace_ret_stack Storing this value will help prevent unwinders from getting out of sync with the function graph tracer ret_stack. Now instead of needing a stateful iterator, they can compare the return address pointer to find the right ret_stack entry. Note that an array of 50 ftrace_ret_stack structs is allocated for every task. So when an arch implements this, it will add either 200 or 400 bytes of memory usage per task (depending on whether it's a 32-bit or 64-bit platform). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a95cfcc39e8f26b89a430c56926af0bb217bc0a1.1471607358.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-08-24 12:15:14 +02:00

... 22 23 24 25 26 ...

14712 Commits